On Thu, Jul 26, 2018 at 12:07 AM, Boris Ostrovsky
<boris.ostrovsky@xxxxxxxxxx> wrote:
My general memory of the situation is this:

* Balloon drivers are told to reach a "target" value for max_pages.
* max_pages includes all memory assigned to the guest, including video
RAM, "special" pages, iPXE ROMs, BIOS ROMs from passed-through
devices, and so on.
* Unfortunately, the balloon driver doesn't know what its max_pages
value is and can't read it.
* So what the balloon drivers do at the moment (as I understand it) is
look at the memory *reported as RAM*, and do a calculation:
  visible_ram - target_max_pages = pages_in_balloon

You can probably see why this won't work -- the result is that the
guest balloons down to (target_max_pages + non_ram_pages).  This is
kind of messy for normal guests, but when you have a
populate-on-demand guest, that leaves non_ram_pages amount of PoD ram
in the guest.  The hypervisor then spends a huge amount of work
swapping the PoD pages around under the guest's feet, until it can't
find any more zeroed guest pages to use, and it crashes the guest.
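
To put some numbers on that, here is a rough sketch of the arithmetic.
The page counts are made up, and the function and variable names are
mine for illustration, not anything taken from the actual balloon
driver:

```python
# Sketch of the guest-side calculation described above.
# All names and numbers are illustrative assumptions, not real driver code.

def guest_side_balloon(visible_ram: int, target_max_pages: int) -> int:
    """Balloon size as the guest computes it: it only sees pages reported as RAM."""
    return max(visible_ram - target_max_pages, 0)


def pages_left_after_ballooning(visible_ram: int, non_ram_pages: int,
                                target_max_pages: int) -> int:
    """Pages still assigned to the guest once the balloon is fully inflated."""
    total_assigned = visible_ram + non_ram_pages  # what max_pages really covers
    return total_assigned - guest_side_balloon(visible_ram, target_max_pages)


visible_ram = 262144       # pages reported as RAM (1 GiB of 4 KiB pages)
non_ram_pages = 4096       # video RAM, ROMs, "special" pages, ...
target_max_pages = 131072  # what the guest was told to reach

left = pages_left_after_ballooning(visible_ram, non_ram_pages, target_max_pages)
overshoot = left - target_max_pages

print(left)       # target_max_pages + non_ram_pages
print(overshoot)  # equals non_ram_pages
```

The guest ends up over the target by exactly non_ram_pages, whatever
the target is, because those pages never entered its calculation.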

The kludge we have right now is to make up a number for HVM guests
which is slightly larger than non_ram_pages, and tell the guest to aim
for *that* instead.

I think what we need is for the *toolstack* to calculate the size of
the balloon rather than the guest, and tell the balloon driver how big
to make its balloon, rather than the balloon driver trying to figure
that out on its own.
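
Something along these lines is what I have in mind, as a sketch only.
It assumes the toolstack knows the total number of pages assigned to
the guest (RAM plus non-RAM); the names are hypothetical, and how the
resulting number would actually be passed to the guest is left open:

```python
# Sketch of a toolstack-side calculation. Hypothetical names; this
# assumes the toolstack can see every page assigned to the guest.

def toolstack_balloon_size(total_assigned_pages: int,
                           target_max_pages: int) -> int:
    """Balloon size the toolstack would hand to the balloon driver."""
    return max(total_assigned_pages - target_max_pages, 0)


visible_ram = 262144       # pages the guest sees as RAM (made-up number)
non_ram_pages = 4096       # pages the guest cannot account for (made-up number)
target_max_pages = 131072

total_assigned = visible_ram + non_ram_pages
balloon = toolstack_balloon_size(total_assigned, target_max_pages)
remaining = total_assigned - balloon

print(balloon)    # pages the driver is told to put in its balloon
print(remaining)  # equals target_max_pages
```

The balloon driver then only has to inflate to the number it is given,
and never needs to know max_pages or the non-RAM page count.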

We also need to get a handle on the allocation and tracking of all the
random "non-RAM" pages allocated to a guest; but that's a slightly
different region of the swamp.


