Re: [Xen-devel] initial ballooning amount on HVM+PoD



>>> On 17.01.14 at 17:13, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx> wrote:
> On 01/17/2014 11:03 AM, Jan Beulich wrote:
>>>>> On 17.01.14 at 16:54, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx> wrote:
>>> On 01/17/2014 09:33 AM, Jan Beulich wrote:
>>>> While looking into Jürgen's issue with PoD setup causing soft lockups
>>>> in Dom0 I realized that what I did in linux-2.6.18-xen.hg's c/s
>>>> 989:a7781c0a3b9a ("xen/balloon: fix balloon driver accounting for
>>>> HVM-with-PoD case") just doesn't work - the BUG_ON() added there
>>>> triggers as soon as there's a reasonable amount of excess memory.
>>>> And that is despite me knowing that I spent a significant amount of
>>>> time testing that change - I must have tested something other than
>>>> what finally got checked in, or must have screwed up in some other way.
>>>> Extremely embarrassing...
>>>>
>>>> In the course of finding a proper solution I soon stumbled across
>>>> upstream's c275a57f5e ("xen/balloon: Set balloon's initial state to
>>>> number of existing RAM pages"), and hence went ahead and
>>>> compared three different calculations for initial bs.current_pages:
>>>>
>>>> (a) upstream's (open coding get_num_physpages(), as I did this on
>>>>       an older kernel)
>>>> (b) plain old num_physpages (equaling the maximum RAM PFN)
>>>> (c) XENMEM_get_pod_target output (with the hypervisor altered
>>>>       to not refuse this for a domain doing it on itself)
>>>>
>>>> The fourth (original) method, using totalram_pages, was already
>>>> known to result in the driver not ballooning down enough, and
>>>> hence setting up the domain for an eventual crash when the PoD
>>>> cache runs empty.
>>>>
>>>> Interestingly, (a) too results in the driver not ballooning down
>>>> enough - there's a gap of exactly as many pages as are marked
>>>> reserved below the 1Mb boundary. Therefore the aforementioned
>>>> upstream commit is presumably broken.
>>>>
>>>> Short of a reliable (and ideally architecture independent) way of
>>>> knowing the necessary adjustment value, the next best solution
>>>> (not ballooning down too little, but also not ballooning down much
>>>> more than necessary) turns out to be using the minimum of (b)
>>>> and (c): When the domain only has memory below 4Gb, (b) is
>>>> more precise, whereas in the other cases (c) gets closest.
>>> I am not sure I understand why (b) would be the right answer for
>>> less-than-4G guests. The reason for the c275a57f5e patch was that max_pfn
>>> includes MMIO space (which is not RAM) and thus the driver will
>>> unnecessarily balloon down that much memory.
>> max_pfn/num_physpages isn't that far off for a guest with less than
>> 4Gb; the number calculated from the PoD data is a little worse.
> 
> For a 4G guest it's 65K pages that are ballooned down so it's not 
> insignificant.

I didn't say (in the original mail) a 4Gb guest - I said a guest with
memory only below 4Gb. So yes, for a 4Gb guest this is unacceptably
high, ...

> And if you are increasing the MMIO size (something that we had to do here)
> it gets progressively worse.

... and growing with MMIO size, hence the PoD data yields better
results in that case.
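
Expressed in code, the minimum-of-(b)-and-(c) approach would be roughly
the following (untested sketch only; pod_derived_pages() is just a
placeholder for whatever ends up supplying the (c) value, see further
down):

static unsigned long __init initial_balloon_pages(void)
{
        unsigned long from_pod;

        /* (c): placeholder - see the XENMEM_get_pod_target discussion below. */
        if (pod_derived_pages(&from_pod))
                return num_physpages;   /* query failed - fall back to (b) */

        /* (b): num_physpages, i.e. the maximum RAM PFN on this older kernel. */
        return min(num_physpages, from_pod);
}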

>>>> The question now is: Considering that (a) is broken (and hard to
>>>> fix) and (b) presumably leads to too much ballooning down in a
>>>> large share of practical cases, shouldn't we open up
>>>> XENMEM_get_pod_target for domains to query on themselves?
>>>> Alternatively, can anyone see another way to calculate a
>>>> reasonably precise value?
>>> I think a hypervisor query is a good thing, although I don't know whether
>>> exposing PoD-specific data (count and entry_count) to the guest is
>>> necessary. It's probably OK (or we can set these fields to zero for
>>> non-privileged domains).
>> That's pointless then - if no useful data is provided through the
>> call to non-privileged domains, we may as well keep it erroring for
>> them.
>>
> 
> I thought you are after d->tot_pages, no?

That can be obtained through another XENMEM_ operation. No,
what is needed is the difference between PoD entries and PoD
cache (which then needs to be added to tot_pages).
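
Something along these lines, as another untested sketch - the struct
layout and hypercall number are quoted from xen's public/memory.h from
memory (so please double-check), and the guest side would need these
definitions plus the hypervisor change allowing a domain to query
itself:

/* As in xen's public/memory.h; not currently in the guest headers. */
struct xen_pod_target {
        uint64_t target_pages;          /* IN */
        uint64_t tot_pages;             /* OUT */
        uint64_t pod_cache_pages;       /* OUT */
        uint64_t pod_entries;           /* OUT */
        domid_t domid;                  /* IN */
};
#define XENMEM_get_pod_target   17

/* Needs asm/xen/hypercall.h for HYPERVISOR_memory_op() and
 * xen/interface/xen.h for domid_t/DOMID_SELF. */
static int __init pod_derived_pages(unsigned long *pages)
{
        struct xen_pod_target pod = { .domid = DOMID_SELF };
        int rc;

        /* Only works with the hypervisor permitting a domain to issue
         * XENMEM_get_pod_target on itself. */
        rc = HYPERVISOR_memory_op(XENMEM_get_pod_target, &pod);
        if (rc)
                return rc;

        /* tot_pages plus the PoD entries not covered by the PoD cache. */
        *pages = pod.tot_pages + pod.pod_entries - pod.pod_cache_pages;
        return 0;
}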

Jan
