
Re: Design session "grant v3"



On 22.09.22 20:43, Jan Beulich wrote:
On 22.09.2022 15:42, Marek Marczykowski-Górecki wrote:
Jürgen: today two grant formats, v1 supports only addresses up to 16TB
        v2 solves the 16TB issue, introduces several more features^Wbugs
        v2 is 16 bytes per entry, v1 is 8 bytes per entry (formats sketched
        below); v2 has a more complicated interface to the hypervisor
        virtio could use a per-device grant table, currently via the virtio
        iommu device, a slow interface
        v3 could be a grant tree (like IOMMU page tables), not a flat array,
        with separate trees for each grantee
        could support sharing large pages too
        easier to have more grants, contiguous grant numbers etc
        two options to distinguish trees (from the HV PoV):
        - the sharing guest ensures distinct grant ids between (multiple) trees
        - the HV tells the guest the index under which a tree got registered
        v3 can be an addition to v1/v2, the old formats kept for simpler cases
        where a tree would be overkill
        hypervisor needs extra memory to keep refcounts - resource allocation
        discussion
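
For reference, the v1 entry is (roughly, from Xen's public grant_table.h):

    /* Existing v1 entry, 8 bytes. */
    struct grant_entry_v1 {
        uint16_t flags;   /* GTF_* flags                                   */
        domid_t  domid;   /* domain allowed to map/access the frame        */
        uint32_t frame;   /* 32-bit GFN: 2^32 * 4 KiB = 16 TiB addressable */
    };

The tree-node sketch below is purely illustrative - none of these names
exist anywhere, it only tries to make the "like IOMMU page tables" idea
from the notes concrete:

    /*
     * Hypothetical v3 leaf entry (illustration only, not a proposal).
     * A per-grantee radix tree of such entries would give contiguous
     * grant numbers and room for a page order (large-page grants), at
     * the price of the hypervisor having to account for the tree's
     * memory - see the refcount/resource point above.
     */
    struct grant_v3_leaf {
        uint64_t gfn:52;    /* granted frame                         */
        uint64_t order:5;   /* page order, > 0 for large-page grants */
        uint64_t flags:7;   /* access/valid bits                     */
    };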

How would refcounts be different from today? Perhaps I don't have a clear
enough picture yet how you envision the tree-like structure(s) to be used.

What was meant here are the additional resources the hypervisor will need for
higher grant counts of a guest. With the tree approach the number of grant
frames will basically be controlled by the guest, and imposing a limit like
today's wouldn't work very well (especially with the current default of only
64 grant frames).
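
To put numbers on this (a back-of-the-envelope sketch, assuming 4 KiB
grant-table frames):

    #define GNT_FRAME_SIZE  4096u   /* assumption: 4 KiB grant frames    */
    #define DEF_MAX_FRAMES    64u   /* current default grant frame limit */
    /* v1: 512 entries/frame -> 32768 grants with the default limit */
    #define MAX_V1_GRANTS (DEF_MAX_FRAMES * (GNT_FRAME_SIZE / 8))
    /* v2: 256 entries/frame -> 16384 grants with the default limit */
    #define MAX_V2_GRANTS (DEF_MAX_FRAMES * (GNT_FRAME_SIZE / 16))

A guest-built tree has no comparable per-frame bound, which is why the
accounting question comes up at all.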


        HV could have a TLB to speed up mapping
        issue with v1/v2 - the granter cannot revoke pages from an
        uncooperative backend
        the tree could have a special page for revoking grants (redirect to
        that page)
        special domids, local to the guest; a toolstack restarting a backend
        could request to keep the same virtual domid
Marek:  that requires a stateless (or recoverable) protocol, reusing a domid
        currently causes issues
Andrei: how could revoking work?
Jürgen: there needs to be a hypercall, replacing and invalidating the mapping
        (scan page tables?), possibly adjusting the IOMMU etc; may fail,
        problematic for PV
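
A rough outline of what the hypervisor side of such a revoke might have to
do - everything below is invented for illustration, it is not existing Xen
code:

    /* Hypothetical flow only; the function and its steps are made up. */
    static int revoke_grant(struct domain *granter, grant_ref_t ref,
                            mfn_t scratch_mfn)
    {
        /* 1. Mark the grant entry invalid so no new mappings can appear. */
        /* 2. Find all existing mappings of the granted MFN in all mapping
         *    domains (for PV: scan page tables, as per the note above).  */
        /* 3. Replace each mapping with scratch_mfn and flush TLBs.       */
        /* 4. Adjust IOMMU mappings of the mapping domains, if any.       */
        /* 5. Drop the references the mappings held on the original page. */
        return 0;   /* may fail, e.g. when step 2 cannot be done cheaply  */
    }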

Why would this be problematic for PV only? In principle any
number of mappings of a grant are possible also for PVH/HVM. So
all of them would need finding and replacing. Because of the
multiple mappings, the M2P is of no use here.

It is an additional layer in the PV case: even when mapping a foreign
page to only a single local PFN there could be multiple PTEs referencing
it.

I didn't think of the problem of doing multiple mappings of the same grant.
I will look into that.

While thinking about this I started wondering in how far things
are actually working correctly right now for backends in PVH/HVM:
Any mapping of a grant is handed to p2m_add_page(), which insists
on there being exactly one mapping of any particular MFN, unless
the page is a foreign one. But how does that allow a domain to
map its own grants, e.g. when block-attaching a device locally in
Dom0? Afaict the grant-map would succeed, but the page would be
unmapped from its original GFN.

Yann:   can the backend refuse revoking?
Jürgen: it shouldn't be this way, but revoke could be controlled by a feature
        flag; a scratch page could be passed per revoke call (more flexible
        control)
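
To make the "scratch page per revoke call" idea concrete, a guest-facing
argument structure could look like the sketch below; the structure and its
field names are made up, only grant_ref_t and xen_pfn_t are existing types:

    /* Hypothetical hypercall argument - nothing like this exists today. */
    struct gnttab_revoke {
        /* IN */
        grant_ref_t ref;          /* grant reference to revoke           */
        xen_pfn_t   scratch_gfn;  /* granter-provided page that existing
                                     mappings get redirected to          */
        /* OUT */
        int16_t     status;       /* GNTST_*-style result                */
    };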

A single scratch page comes with the risk of data corruption, as all
I/O would be directed there. A sink page (for memory writes) would
likely be okay, but device writes (memory reads) can't be done from
a surrogate page.

I don't see that problem.

In case the grant is revoked due to a malicious/buggy backend, you can't
trust the I/O data anyway.

And in case the frontend is revoking the grant because the frontend is
malicious, this isn't an issue either.


Marek:  what about unmap notification?
Jürgen: revoke could even be async; a ring page for unmap notifications
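
If revoke is asynchronous, a ring entry for such notifications might carry
little more than the reference and a result; again purely an illustration,
not an existing interface:

    /* Hypothetical notification record - illustration only. */
    struct gnttab_unmap_notify {
        grant_ref_t ref;      /* grant that is now fully unmapped      */
        uint16_t    reason;   /* e.g. revoked vs. released by the peer */
        int16_t     status;   /* GNTST_*-style result of the operation */
    };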

Marek:  downgrading mappings (rw -> ro)
Jürgen: must be careful not to allow crashing the backend

Jürgen: we should consider an interface for mapping large pages ("map this area
        as a large page if the backend shared it as a large page")

s/backend/frontend/ I guess?

Yes.

But large pages have another downside: the backend needs to know it is a large
page, otherwise it might get confused. So while this sounds like a nice idea, it
is cumbersome in practice. But maybe someone will come up with a nice idea for
how to solve that.
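
One conceivable shape for such an interface, purely as an illustration (the
structure and its fields are invented; only the GNTMAP_*/GNTST_* naming style
is borrowed from the existing map operation): the frontend grants an aligned
2 MiB region as a single order-9 grant, and the backend asks for it to be
mapped as one large page.

    /* Hypothetical large-page map request - not existing Xen ABI. */
    struct gnttab_map_large {
        grant_ref_t ref;        /* grant covering the (aligned) region   */
        uint32_t    order;      /* e.g. 9 for a 2 MiB mapping            */
        uint32_t    flags;      /* GNTMAP_*-style flags                  */
        uint64_t    host_addr;  /* where the backend wants it mapped     */
        int16_t     status;     /* error if the frontend didn't actually
                                   share the region as a large page      */
    };

This is where the downside above bites: the backend has to know (or be told)
the order up front, it cannot simply map 512 individual grants and hope for
a large mapping.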


Edwin:  what happens when that large page gets shattered?
Jürgen: on live migration pages are rebuilt anyway, so large pages can be
        reconstructed

If only we actually rebuilt large pages already ...

Indeed. But OTOH shattering shouldn't be a problem at least for PVH/HVM guests,
as we are speaking of gfns here. And PV guests don't have large pages anyway.


Juergen
