
Re: [Xen-devel] [Memory Accounting] was: Re: PVH dom0 creation fails - the system freezes

On Thu, Jul 26, 2018 at 12:11 PM, Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:
> On Thu, Jul 26, 2018 at 10:45:08AM +0100, George Dunlap wrote:
>> On Thu, Jul 26, 2018 at 12:07 AM, Boris Ostrovsky
>> <boris.ostrovsky@xxxxxxxxxx> wrote:
>> > On 07/25/2018 02:56 PM, Andrew Cooper wrote:
>> >> On 25/07/18 17:29, Juergen Gross wrote:
>> >>> On 25/07/18 18:12, Roger Pau Monné wrote:
>> >>>> On Wed, Jul 25, 2018 at 05:05:35PM +0300, bercarug@xxxxxxxxxx wrote:
>> >>>>> On 07/25/2018 05:02 PM, Wei Liu wrote:
>> >>>>>> On Wed, Jul 25, 2018 at 03:41:11PM +0200, Juergen Gross wrote:
>> >>>>>>> On 25/07/18 15:35, Roger Pau Monné wrote:
>> >>>>>>>>> What could be causing the available memory loss problem?
>> >>>>>>>> That seems to be Linux aggressively ballooning out memory: you go
>> >>>>>>>> from 7129M total memory to 246M. Are you creating a lot of domains?
>> >>>>>>> This might be related to the tools thinking dom0 is a PV domain.
>> >>>>>> Good point.
>> >>>>>>
>> >>>>>> In that case, xenstore-ls -fp would also be useful. The output should
>> >>>>>> show the balloon target for Dom0.
>> >>>>>>
>> >>>>>> You can also try to set the autoballoon to off in /etc/xen/xl.cfg to
>> >>>>>> see if it makes any difference.
>> >>>>>>
>> >>>>>> Wei.
>> >>>>> Also tried setting autoballooning off, but it had no effect.
>> >>>> This is a Linux/libxl issue that I'm not sure what's the best way to
>> >>>> solve. Linux has the following 'workaround' in the balloon driver:
>> >>>>
>> >>>> err = xenbus_scanf(XBT_NIL, "memory", "static-max", "%llu",
>> >>>>                &static_max);
>> >>>> if (err != 1)
>> >>>>     static_max = new_target;
>> >>>> else
>> >>>>     static_max >>= PAGE_SHIFT - 10;
>> >>>> target_diff = xen_pv_domain() ? 0
>> >>>>             : static_max - balloon_stats.target_pages;
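For reference, the arithmetic of that quoted workaround can be restated as a small standalone sketch (a hypothetical, simplified re-statement; the function name and parameters are illustrative, not the driver's actual interface). Note that `static-max` arrives from xenstore in KiB, hence the shift by `PAGE_SHIFT - 10` to convert to pages:

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SHIFT 12  /* 4 KiB pages */

/* Illustrative restatement of the balloon-driver workaround quoted above:
 * if the xenstore read of "static-max" failed (err != 1), fall back to the
 * new target; otherwise convert static-max from KiB to pages.  PV domains
 * use a target_diff of 0; HVM (and, as discussed, PVH) get the gap between
 * static-max and the current target, i.e. the non-RAM overhead the driver
 * itself cannot observe. */
static uint64_t balloon_target_diff(bool is_pv, bool have_static_max,
                                    uint64_t static_max_kib,
                                    uint64_t new_target_pages,
                                    uint64_t target_pages)
{
    uint64_t static_max_pages;

    if (!have_static_max)
        static_max_pages = new_target_pages;          /* err != 1 fallback */
    else
        static_max_pages = static_max_kib >> (PAGE_SHIFT - 10);

    return is_pv ? 0 : static_max_pages - target_pages;
}
```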
>> >>> Hmm, shouldn't PVH behave the same way as PV here? I don't think
>> >>> there is memory missing for PVH, as opposed to HVM's firmware memory.
>> >>>
>> >>> Adding Boris for a second opinion.
>> >
>> > (Notwithstanding Andrew's rant below ;-))
>> >
>> > I am trying to remember --- what memory were we trying not to online for
>> > HVM here?
>> My general memory of the situation is this:
>> * Balloon drivers are told to reach a "target" value for max_pages.
>> * max_pages includes all memory assigned to the guest, including video
>> ram, "special" pages, ipxe ROMs, bios ROMs from passed-through
>> devices, and so on.
>> * Unfortunately, the balloon driver doesn't know what its max_pages
>> value is and can't read it.
>> * So what the balloon drivers do at the moment (as I understand it) is
>> look at the memory *reported as RAM*, and do a calculation:
>>   visible_ram - target_max_pages = pages_in_balloon
>> You can probably see why this won't work -- the result is that the
>> guest balloons down to (target_max_pages + non_ram_pages).  This is
>> kind of messy for normal guests, but when you have a
>> populate-on-demand guest, that leaves non_ram_pages amount of PoD ram
>> in the guest.  The hypervisor then spends a huge amount of work
>> swapping the PoD pages around under the guest's feet, until it can't
>> find any more zeroed guest pages to use, and it crashes the guest.
>> The kludge we have right now is to make up a number for HVM guests
>> which is slightly larger than non_ram_pages, and tell the guest to aim
>> for *that* instead.
>> I think what we need is for the *toolstack* to calculate the size of
>> the balloon rather than the guest, and tell the balloon driver how big
>> to make its balloon, rather than the balloon driver trying to figure
>> that out on its own.
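The guest-side arithmetic described above, and why it undershoots, can be sketched like this (a hypothetical illustration; function and parameter names are mine, not taken from any driver):

```c
#include <stdint.h>

/* Illustration of the flawed guest-side calculation described above: the
 * guest can only see pages reported as RAM, so it sizes its balloon as
 * visible_ram - target_max_pages. */
static uint64_t guest_balloon_pages(uint64_t visible_ram_pages,
                                    uint64_t target_max_pages)
{
    return visible_ram_pages - target_max_pages;
}

/* Because max_pages also covers non-RAM allocations (video RAM, "special"
 * pages, ROMs), the guest's real footprint ends up at
 * target_max_pages + non_ram_pages rather than target_max_pages. */
static uint64_t resulting_footprint(uint64_t visible_ram_pages,
                                    uint64_t non_ram_pages,
                                    uint64_t target_max_pages)
{
    uint64_t total = visible_ram_pages + non_ram_pages;

    return total - guest_balloon_pages(visible_ram_pages, target_max_pages);
}
```

With, say, 1000 pages of visible RAM, 50 non-RAM pages, and a target of 800, the guest lands on 850 pages, not 800; for a populate-on-demand guest those 50 extra pages are the PoD memory the hypervisor then thrashes over.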
> Maybe the best option would be for the toolstack to fetch the e820
> memory map and set the target based on the size of the RAM regions in
> there for PVH Dom0? That would certainly match the expectations of the
> guest.

Right; so:
* Expecting the guest to calculate its own balloon size was always
an architectural mistake.
* What we're tripping over now is that the hack we used to paper over
the architectural mistake for HVM doesn't apply for PVH.

We can either:
1. Extend the hack to paper it over for PVH as well
2. Fix it properly by addressing the underlying architectural mistake.

Given how long it's been that nobody's had the bandwidth to do #2, I
think at the moment we don't really have any option except to do #1.
But we shouldn't forget that we need to do #2 at some point.
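Roger's e820-based suggestion could look roughly like this on the toolstack side (a sketch only; the struct layout and names are illustrative, not libxl's actual types): walk the domain's memory map, sum the RAM regions, and derive the target from that, so the number handed to the guest matches what the guest itself can see.

```c
#include <stdint.h>
#include <stddef.h>

#define E820_RAM 1  /* type value for usable RAM in an e820 map */

/* Illustrative e820 entry; real toolstack code would use its own types. */
struct e820_entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Sum the RAM regions of an e820 map and convert to 4 KiB pages.  The
 * toolstack would base the xenstore 'target' on this figure instead of
 * expecting the guest to reverse-engineer its balloon size. */
static uint64_t e820_ram_pages(const struct e820_entry *map, size_t nr)
{
    uint64_t bytes = 0;

    for (size_t i = 0; i < nr; i++)
        if (map[i].type == E820_RAM)
            bytes += map[i].size;

    return bytes >> 12;  /* bytes to pages */
}
```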

> Note that for DomUs if hvmloader (or any other component) inside of
> the guest changes the memory map it would also have to adjust the
> value in the xenstore 'target' node.

Yes, this sort of thing is part of the reason the swamp hasn't been
drained yet; so far we've just put up fences in a few places to keep
the alligators at bay.


Xen-devel mailing list


