
Re: [RFC PATCH] xen/memory: Introduce a hypercall to provide unallocated space

To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Oleksandr Tyshchenko <olekstysh@xxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
From: Julien Grall <julien@xxxxxxx>
Date: Wed, 28 Jul 2021 20:53:22 +0100
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@xxxxxxxx>, Daniel De Graaf <dgdegra@xxxxxxxxxxxxx>, "Daniel P. Smith" <dpsmith@xxxxxxxxxxxxxxxxxxxx>, Ian Jackson <iwj@xxxxxxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Wei Chen <Wei.Chen@xxxxxxx>
Delivery-date: Wed, 28 Jul 2021 19:53:50 +0000
List-id: Xen developer discussion <xen-devel.lists.xenproject.org>



On 28/07/2021 20:00, Andrew Cooper wrote:

On 28/07/2021 18:27, Julien Grall wrote:

Hi Andrew,

On 28/07/2021 18:19, Andrew Cooper wrote:

On 28/07/2021 17:18, Oleksandr Tyshchenko wrote:

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@xxxxxxxx>

Add XENMEM_get_unallocated_space hypercall whose purpose is to
query the hypervisor to find regions of guest physical address space
which are unused and can be used to create grant/foreign mappings
instead of wasting real pages from the domain memory for
establishing these mappings. The problem with the current Linux
on Xen on Arm behaviour is that if we want to map some guest memory
regions in advance or to perform cache mappings in the backend
we might run out of memory in the host (see XSA-300).
This, of course, depends on both the host and guest memory sizes.

The "unallocated space" can't be figured out precisely by
the domain on Arm without hypervisor involvement:
- not all device I/O regions are known by the time the domain starts
    creating grant/foreign mappings
- the Dom0 is not aware of memory regions used for the identity
    mappings needed for the PV drivers to work
In both cases we might end up re-using these regions by
mistake. So, the hypervisor, which maintains the P2M for the domain,
is in the best position to provide "unallocated space".


I'm afraid this does not improve the situation.

If a guest follows the advice from XENMEM_get_unallocated_space, and
subsequently a new IO or identity region appears, everything will
explode, because the "safe area" wasn't actually safe.

The safe range *must* be chosen by the toolstack, because nothing else
can do it safely or correctly.


The problem is how do you size it? In particular, a backend may map
the same page multiple times (for instance if the page is granted twice).


The number of mapped grants is limited by the size of the maptrack table
in Xen, which is a toolstack input to the domaincreate hypercall.
Therefore, the amount of space required is known and bounded.
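
To make the sizing argument concrete, here is a minimal sketch in C of the bound it implies. It assumes each maptrack entry can pin at most one guest page of mapping space. The function name and the `entries_per_frame` parameter are hypothetical and not taken from Xen; the real number of entries per maptrack frame depends on Xen's internal structure layout.

```c
#include <stdint.h>

#define PAGE_SIZE 4096ULL  /* assumed 4 KiB guest page size */

/*
 * Upper bound, in bytes, on the guest physical address space needed for
 * grant mappings: one page per maptrack entry (an assumption).
 * max_maptrack_frames is the toolstack-chosen limit; entries_per_frame
 * is a placeholder for the number of entries that fit in one frame.
 */
static uint64_t grant_map_space_bytes(uint64_t max_maptrack_frames,
                                      uint64_t entries_per_frame)
{
    return max_maptrack_frames * entries_per_frame * PAGE_SIZE;
}
```

With illustrative inputs of 1024 frames and 256 entries per frame, the bound is 1 GiB of mapping space.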

There are a handful of other frames required in the current ABI (shared
info, vcpu info, etc).

The area where things do become fuzzy is things like foreign mappings,
acquire_resource, etc for the control domain, which are effectively
unbounded from the domain's point of view.

For those, it's entirely fine to say "here is 128G of safe mapping space" or
so.  Even the quantity of mappings dom0 can make is limited by the shadow
memory pool and the number of pagetables Xen is willing to expend on the
second stage translation tables.


FWIW, on Arm, we don't have a shadow memory pool.

Anyway, it should be possible to give a 128GB "safe range" and let Xen deal with it.


Once a safe range (or ranges) has been chosen, any subsequent action
which overlaps with the ranges must be rejected, as it will violate the
guarantees provided.
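
The rejection rule described above could be sketched in C as follows. This is only an illustration under stated assumptions: `struct gpa_range`, `ranges_overlap` and `request_allowed` are hypothetical names, not existing Xen code, and ranges are half-open intervals that are assumed not to wrap around the top of the address space.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open guest physical range [start, start + size). Hypothetical type. */
struct gpa_range {
    uint64_t start;
    uint64_t size;
};

/* True if the two ranges share at least one byte (no wrap-around assumed). */
static bool ranges_overlap(const struct gpa_range *a, const struct gpa_range *b)
{
    return a->start < b->start + b->size && b->start < a->start + a->size;
}

/*
 * A later request (new I/O region, identity mapping, ...) is allowed only
 * if it touches none of the safe ranges chosen at domain creation.
 */
static bool request_allowed(const struct gpa_range *safe, size_t nr_safe,
                            const struct gpa_range *req)
{
    for (size_t i = 0; i < nr_safe; i++)
        if (ranges_overlap(&safe[i], req))
            return false;
    return true;
}
```

A request that merely abuts a safe range (ends exactly where the range starts) is still accepted, since half-open intervals that touch do not overlap.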

Furthermore, the ranges should be made available to the guest via normal
memory map means.  On x86, this is via the E820 table, and on ARM I
presume the DTB.  There is no need for a new hypercall.


Device-Tree only works if you have a guest using it. How about ACPI?


ACPI inherits E820 from x86 (it's a trivial format), and UEFI was also
based on it.

But whichever...  All firmware interfaces have a memory map.

This will be the UEFI memory map. However, I am a bit confused about how we can tell the OS the region will be used for grant/foreign mapping. Is it possible to reserve a new type?

To me the hypercall solution at least:
   1) Avoids having to define the region in every single firmware table


There is only ever one.

Why? I could foresee an interest in using the host memory map, and therefore we may need to find a few holes for safe regions to use.

   2) Allows the ranges to be extended easily after the guest has started running


The safe ranges can't be changed (safely).  This is the same problem as
needing to know things like your PCI apertures ahead of time, or where
the DIMM hotplug regions are.

Having the guest physmap be actually dynamic is the cause of so many
bugs (inc security) and misfeatures in Xen.  Guests cannot and do not
cope with things being fully dynamic, because that's not how real
hardware works.  What you get is layers and layers of breakage on top of
each other, rather than a working system.

I would not call it "fully dynamic". Xen can easily know whether a region has ever been allocated before. So long as the region has never been allocated, then it should be fine. In fact...


The size of mapping space is a limit, just like maxphysaddr, or the PCI
apertures, or MMCFG space, etc.  You can make it large by default (as it
doesn't consume resource when not being used), but any guest OS isn't
going to tolerate it morphing during runtime.

... I believe the OS may not be aware of the hotplug region until it is actually used.
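
As a rough illustration of the "never allocated before" check mentioned earlier in this reply, the sketch below keeps a sticky bitmap of guest frame numbers that have ever been populated. Everything here is hypothetical (the `ever_used` structure, the helper names, the tiny `NR_GFNS` limit); Xen's real P2M is a page-table structure and does not necessarily track this state in this form.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_GFNS       1024u  /* toy guest address space, in frames */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* One sticky bit per GFN: set on first population, never cleared. */
struct ever_used {
    unsigned long bits[(NR_GFNS + BITS_PER_WORD - 1) / BITS_PER_WORD];
};

/* Record that a frame has been populated at least once. */
static void mark_populated(struct ever_used *t, uint32_t gfn)
{
    if (gfn < NR_GFNS)
        t->bits[gfn / BITS_PER_WORD] |= 1UL << (gfn % BITS_PER_WORD);
}

/* True only if every frame in [start, start + nr) is in range and unused. */
static bool never_populated(const struct ever_used *t, uint32_t start,
                            uint32_t nr)
{
    uint64_t end = (uint64_t)start + nr;

    if (end > NR_GFNS)
        return false;
    for (uint64_t gfn = start; gfn < end; gfn++)
        if (t->bits[gfn / BITS_PER_WORD] & (1UL << (gfn % BITS_PER_WORD)))
            return false;
    return true;
}
```

Because the bits are never cleared, a region that was populated and later freed is still reported as used, which matches the "has never been allocated" condition rather than "is currently free".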

Anyway, I suggested this approach a few months ago to Oleksandr (see [1]), which BTW you were CCed on ;). My take was that Xen knows about the unallocated space and therefore can make an informed decision without having to allocate an insanely large region.

Now if you think that's fine (IIRC Stefano had a preference for that as well), then we can use the firmware table (assuming we can solve the ACPI question).

At the end of the day, this is not really the interesting bit of the problem. What matters is the OS part, where hopefully Linux will be able to use the RAM normally. We may even be able to fix XSA-300!


Cheers,

[1] <YJ3jlGSxs60Io+dp@xxxxxxxxxxxxxxxx>

--
Julien Grall
