
Re: [PATCH 1/4] xen: Introduce non-broken hypercalls for the p2m pool size


  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Wed, 26 Oct 2022 15:42:15 +0200
  • Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Henry Wang <Henry.Wang@xxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Wed, 26 Oct 2022 13:42:25 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 26.10.2022 12:20, Andrew Cooper wrote:
> The existing XEN_DOMCTL_SHADOW_OP_{GET,SET}_ALLOCATION have problems:
> 
>  * All set_allocation() flavours have an overflow-before-widen bug when
>    calculating "sc->mb << (20 - PAGE_SHIFT)".
>  * All flavours have a granularity of 1M.  This was tolerable when the size
>    of the pool could only be set at the same granularity, but is broken now
>    that ARM has a 16-page stopgap allocation in use.
>  * All get_allocation() flavours round up, and in particular turn 0 into 1,
>    meaning the get op returns junk before a successful set op.
>  * The x86 flavours reject the hypercalls before the VM has vCPUs allocated,
>    despite the pool size being a domain property.

I guess this is merely a remnant and could easily be dropped there.

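As an aside, the overflow-before-widen issue in the first bullet is the usual
shift-evaluated-in-32-bits pattern; purely as an illustration (not the exact
code):

    /* sc->mb is a 32-bit MB count: the shift is done in 32-bit arithmetic
     * and only the result is widened, so large values wrap.  Widening the
     * operand first (on a 64-bit build) avoids that. */
    unsigned long bad  = sc->mb << (20 - PAGE_SHIFT);
    unsigned long good = (unsigned long)sc->mb << (20 - PAGE_SHIFT);
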
>  * Even the hypercall names are long-obsolete.
> 
> Implement an interface that doesn't suck, which can be first used to unit test
> the behaviour, and subsequently correct a broken implementation.  The old
> interface will be retired in due course.
> 
> This is part of XSA-409 / CVE-2022-33747.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> ---
> CC: Xen Security Team <security@xxxxxxx>
> CC: Jan Beulich <JBeulich@xxxxxxxx>
> CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> CC: Wei Liu <wl@xxxxxxx>
> CC: Stefano Stabellini <sstabellini@xxxxxxxxxx>
> CC: Julien Grall <julien@xxxxxxx>
> CC: Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>
> CC: Bertrand Marquis <bertrand.marquis@xxxxxxx>
> CC: Henry Wang <Henry.Wang@xxxxxxx>
> CC: Anthony PERARD <anthony.perard@xxxxxxxxxx>
> 
> Name subject to improvement.

paging_{get,set}_mempool_size() for the arch helpers (in particular
fitting better with them living in paging.c, as well as with their multi-purpose use
on x86) and XEN_DOMCTL_{get,set}_paging_mempool_size? Perhaps even the
"mem" could be dropped?

>  ABI not.

With the comment in the public header saying "Users of this interface are
required to identify the granularity by other means" I wonder why the
interface needs to be byte-granular. If the caller needs to know page size
by whatever means, it can just as well pass in a page count.
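
I.e. the public struct could then be, say (sketch only, struct / field names
made up):

    struct xen_domctl_p2m_mempool {
        /* IN/OUT: size of the pool, in pages of the platform's page size. */
        uint64_aligned_t pages;
    };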

> Future TODOs:
>  * x86 shadow still rounds up.  This is buggy as it's a simultaneous equation
>    with tot_pages which varies over time with ballooning.
>  * x86 PV is weird.  There is no toolstack interaction with the shadow pool
>    size, but the "shadow" pool does come into existence when logdirty (or
>    pv-l1tf) is first enabled.
>  * The shadow+hap logic is in desperate need of deduping.

I have a tiny step towards this queued as post-XSA-410 work, folding HAP's
and shadow's freelist, total_pages, free_pages, and p2m_pages. Here this
would mean {hap,shadow}_get_allocation_bytes() could be done away with,
having the logic exclusively in paging.c.
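
The direction being roughly (illustration only, not the queued patch):

    /* Pool bookkeeping shared by HAP and shadow. */
    struct paging_pool {
        struct page_list_head freelist;
        unsigned int          total_pages;
        unsigned int          free_pages;
        unsigned int          p2m_pages;
    };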

> --- a/xen/arch/arm/p2m.c
> +++ b/xen/arch/arm/p2m.c
> @@ -100,6 +100,14 @@ unsigned int p2m_get_allocation(struct domain *d)
>      return ROUNDUP(nr_pages, 1 << (20 - PAGE_SHIFT)) >> (20 - PAGE_SHIFT);
>  }
>  
> +/* Return the size of the pool, in bytes. */
> +int arch_get_p2m_mempool_size(struct domain *d, uint64_t *size)
> +{
> +    *size = ACCESS_ONCE(d->arch.paging.p2m_total_pages) << PAGE_SHIFT;

This may overflow for Arm32.
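
I.e. the shift wants doing in 64 bits, e.g.

    *size = (uint64_t)ACCESS_ONCE(d->arch.paging.p2m_total_pages) << PAGE_SHIFT;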

> @@ -157,6 +165,25 @@ int p2m_set_allocation(struct domain *d, unsigned long pages, bool *preempted)
>      return 0;
>  }
>  
> +int arch_set_p2m_mempool_size(struct domain *d, uint64_t size)
> +{
> +    unsigned long pages = size >> PAGE_SHIFT;
> +    bool preempted = false;
> +    int rc;
> +
> +    if ( (size & ~PAGE_MASK) ||          /* Non page-sized request? */
> +         pages != (size >> PAGE_SHIFT) ) /* 32-bit overflow? */
> +        return -EINVAL;

Simply "(pages << PAGE_SHIFT) != size"? And then move the check into
common code?
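
I.e., roughly, for the common code:

    unsigned long pages = size >> PAGE_SHIFT;

    /* Rejects both non-page-sized requests and values truncated by the
     * shift on 32-bit architectures. */
    if ( (pages << PAGE_SHIFT) != size )
        return -EINVAL;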

> --- a/xen/arch/x86/mm/hap/hap.c
> +++ b/xen/arch/x86/mm/hap/hap.c
> @@ -345,6 +345,16 @@ unsigned int hap_get_allocation(struct domain *d)
>              + ((pg & ((1 << (20 - PAGE_SHIFT)) - 1)) ? 1 : 0));
>  }
>  
> +int hap_get_allocation_bytes(struct domain *d, uint64_t *size)
> +{
> +    unsigned long pages = (d->arch.paging.hap.total_pages +
> +                           d->arch.paging.hap.p2m_pages);

Unlike for Arm no ACCESS_ONCE() here? Also the addition can in
principle overflow, because being done only in 32 bits.
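
I.e. presumably something like (sketch only - the point being to widen before
doing the addition and shift):

    *size = ((uint64_t)ACCESS_ONCE(d->arch.paging.hap.total_pages) +
             ACCESS_ONCE(d->arch.paging.hap.p2m_pages)) << PAGE_SHIFT;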

> --- a/xen/arch/x86/mm/paging.c
> +++ b/xen/arch/x86/mm/paging.c
> @@ -977,6 +977,45 @@ int __init paging_set_allocation(struct domain *d, unsigned int pages,
>  }
>  #endif
>  
> +int arch_get_p2m_mempool_size(struct domain *d, uint64_t *size)
> +{
> +    int rc;
> +
> +    if ( is_pv_domain(d) )
> +        return -EOPNOTSUPP;
> +
> +    if ( hap_enabled(d) )
> +        rc = hap_get_allocation_bytes(d, size);
> +    else
> +        rc = shadow_get_allocation_bytes(d, size);
> +
> +    return rc;
> +}
> +
> +int arch_set_p2m_mempool_size(struct domain *d, uint64_t size)
> +{
> +    unsigned long pages = size >> PAGE_SHIFT;
> +    bool preempted = false;
> +    int rc;
> +
> +    if ( is_pv_domain(d) )
> +        return -EOPNOTSUPP;

Why? You do say "PV is weird" in a post-commit-message remark, but why
do you want to retain this weirdness? Even if today the tool stack
doesn't set the size when enabling log-dirty mode, I'd view this as a
bug which could be addressed purely in the tool stack if this check
wasn't there.

> +    if ( size & ~PAGE_MASK )             /* Non page-sized request? */
> +        return -EINVAL;
> +
> +    ASSERT(paging_mode_enabled(d));

Not only with the PV aspect in mind - why? It looks reasonable to me
to set the pool size before enabling any paging mode.

> +    paging_lock(d);
> +    if ( hap_enabled(d) )
> +        rc = hap_set_allocation(d, pages, &preempted);
> +    else
> +        rc = shadow_set_allocation(d, pages, &preempted);

Potential truncation from the "unsigned long" -> "unsigned int"
conversions.
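
A range check ahead of the calls would seem to be needed, e.g. (sketch only):

    /* hap_set_allocation() / shadow_set_allocation() take an unsigned int. */
    if ( pages != (unsigned int)pages )
        return -EINVAL;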

Jan