
Re: Design session "grant v3"


  • To: Juergen Gross <jgross@xxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 22 Sep 2022 20:43:40 +0200
  • Cc: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 22 Sep 2022 18:43:53 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 22.09.2022 15:42, Marek Marczykowski-Górecki wrote:
> Jürgen: today two grant formats; v1 supports addresses only up to 16TB
>         v2 solves the 16TB issue, introduces several more features^Wbugs
>         v2 is 16 bytes per entry, v1 is 8 bytes per entry; v2 has a more
> complicated interface to the hypervisor
>         virtio could use per-device grant table, currently virtio iommu 
> device, slow interface
>         v3 could be a grant tree (like IOMMU page tables) instead of a flat
> array, with separate trees for each grantee
>         could support sharing large pages too
>         easier to have more grants, continuous grant numbers etc
>         two options to distinguish trees (from the HV PoV):
>         - the sharing guest ensures distinct grant ids across (multiple) trees
>         - hv tells the guest the index under which the tree got registered
>         v3 can be addition to v1/v2, old used for simpler cases where tree is 
> an overkill
>         hypervisor needs extra memory to keep refcounts - resource allocation 
> discussion

How would refcounts be different from today? Perhaps I don't have a clear
enough picture yet of how you envision the tree-like structure(s) being used.
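
Regarding the entry sizes quoted above, for reference: the difference comes
from the layouts in the public headers (reproduced roughly from memory here,
with the sub-page and transitive v2 variants omitted). v1's 32-bit frame
field is also where the 16TB limit comes from (2^32 4k frames).

    struct grant_entry_v1 {              /* 8 bytes */
        uint16_t flags;                  /* GTF_* */
        domid_t  domid;                  /* grantee */
        uint32_t frame;                  /* GFN - 32 bits, hence 16TB */
    };

    union grant_entry_v2 {               /* 16 bytes */
        struct grant_entry_header {
            uint16_t flags;
            domid_t  domid;
        } hdr;
        struct {
            struct grant_entry_header hdr;
            uint32_t pad0;
            uint64_t frame;              /* full 64-bit GFN */
        } full_page;
        /* sub_page / transitive variants omitted */
    };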

>         hv could have a TLB to speed up mapping
>         issue with v1/v2 - the granter cannot revoke pages from an
> uncooperative backend
>         the tree could have a special page for revoking grants (redirect to
> that page)
>         special domids, local to the guest; a toolstack restarting a backend
> could request to keep the same virtual domid
> Marek:  that requires a stateless (or recoverable) protocol; reusing a domid
> currently causes issues
> Andrei: how could revoking work?
> Jürgen: there needs to be a hypercall, replacing and invalidating the mapping
> (scan page tables?), possibly adjusting the IOMMU etc; may fail, problematic
> for PV

Why would this be problematic for PV only? In principle any number of
mappings of a grant is possible for PVH/HVM as well. So all of them
would need finding and replacing. Because of the multiple mappings, the
M2P is of no use here.
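
To make that a little more concrete, here is a purely hypothetical sketch
(no such operation exists today, names invented for illustration) of what a
revoke request might look like; the hypervisor would need to locate every
recorded mapping of the grant, replace it with the caller-supplied sink
frame, and flush CPU TLBs plus the IOMMU before reporting success:

    /* Hypothetical sketch only - not an existing grant-table op. */
    struct gnttab_revoke {
        /* IN */
        grant_ref_t ref;       /* grant to revoke */
        xen_pfn_t   sink_gfn;  /* frame to substitute for writable mappings */
        /* OUT */
        int16_t     status;    /* GNTST_* */
    };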

While thinking about this I started wondering to what extent things
actually work correctly right now for backends in PVH/HVM:
Any mapping of a grant is handed to p2m_add_page(), which insists
on there being exactly one mapping of any particular MFN, unless
the page is a foreign one. But how does that allow a domain to
map its own grants, e.g. when block-attaching a device locally in
Dom0? Afaict the grant-map would succeed, but the page would be
unmapped from its original GFN.

> Yann:   can the backend refuse revoking?
> Jürgen: it shouldn't be this way, but revoke could be controlled by a feature
> flag; revoke could pass a scratch page per revoke call (more flexible control)

A single scratch page comes with the risk of data corruption, as all
I/O would be directed there. A sink page (for memory writes) would
likely be okay, but device writes (memory reads) can't be done from
a surrogate page.
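
If the interface went the per-revoke scratch page route, it would therefore
presumably need to distinguish the two directions explicitly, along these
(again purely hypothetical) lines:

    /* Hypothetical flags for a revoke op: grantee writes can be redirected
     * into a sink frame, but mappings the grantee reads from (device reads,
     * i.e. DMA sources) can't be served by a surrogate page and would have
     * to be refused or forcibly torn down instead. */
    #define GNTREVOKE_sink_writes   (1u << 0)
    #define GNTREVOKE_refuse_reads  (1u << 1)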

> Marek:  what about unmap notification?
> Jürgen: revoke could even be async; ring page for unmap notifications
> 
> Marek:  downgrading mappings (rw -> ro)
> Jürgen: must be careful not to allow crashing the backend
> 
> Jürgen: we should consider an interface for mapping large pages ("map this
> area as a large page if backend shared it as a large page")

s/backend/frontend/ I guess?
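
Just to sketch what such an interface might look like (hypothetical, and
assuming a v3 tree where a single reference can describe a naturally aligned
2^order range of frames):

    /* Hypothetical v3 map request - not an existing hypercall. */
    struct gnttab_map_large {
        /* IN */
        uint64_t       host_addr;  /* where the mapping domain wants it */
        uint32_t       flags;      /* GNTMAP_* */
        grant_ref_t    ref;        /* reference covering the whole range */
        domid_t        dom;        /* granting domain */
        uint8_t        order;      /* log2 of the number of 4k frames */
        /* OUT */
        int16_t        status;     /* GNTST_* */
        grant_handle_t handle;
    };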

> Edwin:  what happens when shattering that large page?
> Jürgen: on live migration pages are rebuilt anyway, can reconstruct large 
> pages

If only we already rebuilt large pages ...

Jan