
Re: [RFC PATCH] xen/memory: Introduce a hypercall to provide unallocated space

  • To: Oleksandr Tyshchenko <olekstysh@xxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Wed, 28 Jul 2021 18:19:51 +0100
  • Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@xxxxxxxx>, Daniel De Graaf <dgdegra@xxxxxxxxxxxxx>, "Daniel P. Smith" <dpsmith@xxxxxxxxxxxxxxxxxxxx>, Ian Jackson <iwj@xxxxxxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, "Volodymyr Babchuk" <Volodymyr_Babchuk@xxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Wei Chen <Wei.Chen@xxxxxxx>
  • Delivery-date: Wed, 28 Jul 2021 17:20:19 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 28/07/2021 17:18, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@xxxxxxxx>
> Add XENMEM_get_unallocated_space, a hypercall whose purpose is to
> query the hypervisor for regions of guest physical address space
> which are unused and can be used to create grant/foreign mappings,
> instead of wasting real pages from the domain memory for
> establishing these mappings. The problem with the current Linux
> on Xen on Arm behaviour is that if we want to map some guest memory
> regions in advance, or to perform cache mappings in the backend,
> we might run out of memory in the host (see XSA-300).
> This, of course, depends on both the host and guest memory sizes.
> The "unallocated space" can't be figured out precisely by
> the domain on Arm without hypervisor involvement:
> - not all device I/O regions are known by the time domain starts
>   creating grant/foreign mappings
> - Dom0 is not aware of the memory regions used for the identity
>   mappings needed for the PV drivers to work
> In both cases we might end up re-using these regions by
> mistake. So the hypervisor, which maintains the P2M for the domain,
> is in the best position to provide the "unallocated space".

I'm afraid this does not improve the situation.

If a guest follows the advice from XENMEM_get_unallocated_space, and
subsequently a new IO or identity region appears, everything will
explode, because the "safe area" wasn't actually safe.

The safe range *must* be chosen by the toolstack, because nothing else
can do it safely or correctly.

Once a safe range (or ranges) has been chosen, any subsequent action
which overlaps with the ranges must be rejected, as it will violate the
guarantees provided.

Furthermore, the ranges should be made available to the guest via normal
memory map means.  On x86, this is via the E820 table, and on ARM I
presume the DTB.  There is no need for a new hypercall.

> The arch code is in charge of finding these regions and filling
> in the corresponding array in the new helper arch_get_unallocated_space().
> This patch implements the common and Arm parts; the x86 specific bits
> are left unimplemented for now.

Leaving x86 for now is fine by me.

> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index 0e07335..8a70fe0 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -1635,6 +1635,24 @@ unsigned long get_upper_mfn_bound(void)
>      return max_page - 1;
>  }
> +#define GPFN_ALIGN_TO_GB(x) (((x) + (1UL << 18) - 1) & (~((1UL << 18) - 1)))
> +
> +int arch_get_unallocated_space(struct domain *d,
> +                               struct xen_unallocated_region *regions,
> +                               unsigned int *nr_regions)
> +{
> +    /*
> +     * Everything not mapped into the P2M can be treated as unused. Taking
> +     * into account that we have a lot of unallocated space above
> +     * max_mapped_gfn, and in order to simplify things, just provide a
> +     * single 8GB memory region aligned to a 1GB boundary for now.
> +     */
> +    regions->start_gpfn = GPFN_ALIGN_TO_GB(domain_get_maximum_gpfn(d) + 1);

To drive home my earlier point, this is racy and cannot be fixed in Xen.

You don't (and cannot) hold the p2m lock until the guest has acted upon
the information provided, so it is guaranteed stale by the time the
guest can inspect the results.  In the meantime, a wide range of
concurrent operations can change max_gpfn.

The only way to get a value the guest can use is to choose it
beforehand, and have Xen enforce that the nominated range(s) remain safe.



