
[Xen-devel] Re: PoD issue

What seems likely to me is that Xen (setting the PoD target) and the balloon driver (allocating memory) have different ways of calculating the amount of guest memory. So the balloon driver thinks it's done handing memory back to Xen when there are still more outstanding PoD entries than there are pages in the PoD memory pool. What balloon driver are you using? Can you let me know max_mem, target, and what the balloon driver has reached before calling it quits? (Although 13,000 pages is an awful lot to be off by: 54 MB...)

Re what "B" means, below is a rather long-winded explanation that will, hopefully, be clear. :-)

Hmm, I'm not sure what the guest balloon driver's "Current allocation" means either. :-) Does it mean, "Size of the current balloon" (i.e., starts at 0 and grows as the balloon driver allocates guest pages and hands them back to Xen)? Or does it mean, "Amount of memory guest currently has allocated to it" (i.e., starts at static_max and goes down as the balloon driver allocates guest pages and hands them back to Xen)?

In the comment, B does *not* mean "the size of the balloon" (i.e., the number of pages allocated from the guest OS by the balloon driver). Rather, B means "Amount of memory the guest currently thinks it has allocated to it." B starts at M at boot. The balloon driver will try to make B=T by inflating the size of the balloon to M-T. Clear as mud?
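
To pin down the two readings of "current allocation", here is a minimal Python sketch. The function names are made up for illustration and do not come from Xen or any balloon driver; all quantities are in pages.

```python
def balloon_size(M, B):
    """Pages the balloon driver has taken from the guest OS and handed back to Xen."""
    return M - B


def balloon_goal(M, T):
    """Balloon size at which B == T."""
    return M - T


M, T = 100_000, 50_000
B = M                       # at boot the guest thinks it has all of static max
print(balloon_size(M, B))   # 0
print(balloon_goal(M, T))   # 50000
```

So a driver reporting "balloon size" counts up from 0 towards M-T, while B as used in the comment counts down from M towards T.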

Let's make a concrete example. Let's say static max is 400,000K (100,000 4K pages).
M=100,000 and doesn't change. Let's say that T is 50,000.

At boot:
B == M == 100,000.
P == 0
tot_pages == pod.count == 50,000
entry_count == 100,000

Thus the invariants from the comment hold:
* 0 <= P (0) <= T (50,000) <= B (100,000) <= M (100,000)
* entry_count (100,000) == B (100,000) - P (0)
* tot_pages (50,000) == P (0) + pod.count (50,000)

As the guest boots, pages will be populated from the cache; P increases, but entry_count and pod.count decrease. Let's say that 25,000 pages get allocated just before the balloon driver runs:

* 0 <= P (25,000) <= T (50,000) <= B (100,000) <= M (100,000)
* entry_count (75,000) == B (100,000) - P (25,000)
* tot_pages (50,000) == P (25,000) + pod.count (25,000)

Then the balloon driver runs. It should try to allocate 50,000 pages total (M - T). For simplicity, let's say that the balloon driver only allocates un-allocated pages. When it's halfway there, having allocated 25,000 pages, things look like this:

* 0 <= P (25,000) <= T (50,000) <= B (75,000) <= M (100,000)
* entry_count (50,000) == B (75,000) - P (25,000)
* tot_pages (50,000) == P (25,000) + pod.count (25,000)

Eventually the balloon driver should reach its new target of 50,000, having allocated 50,000 pages:

* 0 <= P (25,000) <= T (50,000) <= B (50,000) <= M (100,000)
* entry_count (25,000) == B (50,000) - P (25,000)
* tot_pages (50,000) == P (25,000) + pod.count (25,000)
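
The walk-through above can be replayed as a small Python model. This is only a sketch of the accounting, not Xen code: the class and method names are invented, and it keeps the same simplification that the balloon driver only hands back pages that were never populated.

```python
from dataclasses import dataclass


@dataclass
class PodState:
    M: int            # static max, in pages; fixed
    T: int            # target
    B: int            # memory the guest thinks it has
    P: int            # pages populated from the PoD cache
    tot_pages: int    # pages Xen has allocated to the domain
    pod_count: int    # pages sitting in the PoD cache (pod.count)
    entry_count: int  # outstanding PoD entries

    def check(self):
        """Raise AssertionError if any of the three invariants is broken."""
        assert 0 <= self.P <= self.T <= self.B <= self.M
        assert self.entry_count == self.B - self.P
        assert self.tot_pages == self.P + self.pod_count

    def populate(self, n):
        """Guest touches n PoD entries; each is backed by a cache page."""
        self.P += n
        self.pod_count -= n
        self.entry_count -= n
        self.check()

    def balloon(self, n):
        """Balloon driver hands n never-populated pages back to Xen."""
        self.B -= n
        self.entry_count -= n
        self.check()


s = PodState(M=100_000, T=50_000, B=100_000, P=0,
             tot_pages=50_000, pod_count=50_000, entry_count=100_000)
s.check()           # at boot
s.populate(25_000)  # guest boots and touches memory
s.balloon(25_000)   # balloon driver halfway: B == 75,000
s.balloon(25_000)   # balloon driver done: B == T == 50,000
print(s.entry_count, s.pod_count)  # 25000 25000
```

At the end every outstanding entry has a cache page behind it (entry_count == pod.count), which is the condition the guest needs in order not to be crashed by the PoD code.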

The reason for the logic is so that we can do the Right Thing if, after the balloon driver has ballooned half way (to 75,000 pages), the target is changed. If you're not changing the target before the balloon driver has reached its target,


Jan Beulich wrote:

before diving deeply into the PoD code, I hope you have some idea that
might ease the debugging that's apparently going to be needed.

Following the comment immediately before p2m_pod_set_mem_target(),
there's an apparent inconsistency with the accounting: While the guest
in question properly balloons down to its intended setting (1G, with a
maxmem setting of 2G), the combination of the equations

  d->arch.p2m->pod.entry_count == B - P
  d->tot_pages == P + d->arch.p2m->pod.count

doesn't hold (provided I interpreted the meaning of B correctly - I
took this from the guest balloon driver's "Current allocation" report,
converted to pages); there's a difference of over 13000 pages.
Obviously, as soon as the guest uses up enough of its memory, it
will get crashed by the PoD code.
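
A quick way to compare a dump against the two equations is sketched below in Python. The helper names are hypothetical, and the numbers in the example are made up for illustration; they are not taken from the attached log.

```python
def pod_discrepancy(entry_count, pod_count, tot_pages, B):
    """Return entry_count - (B - P), with P derived from tot_pages == P + pod_count.

    Zero means the dump is consistent with this reading of B.
    """
    P = tot_pages - pod_count
    return entry_count - (B - P)


def unbacked_entries(entry_count, pod_count):
    """Outstanding PoD entries that have no cache page left to back them."""
    return max(0, entry_count - pod_count)


# Consistent state: 25,000 entries, 25,000 cache pages, B == 50,000.
print(pod_discrepancy(25_000, 25_000, 50_000, 50_000))  # 0
print(unbacked_entries(25_000, 25_000))                 # 0

# Made-up inconsistent state: 13,000 more entries than the cache can back.
print(pod_discrepancy(38_000, 25_000, 50_000, 50_000))  # 13000
print(unbacked_entries(38_000, 25_000))                 # 13000
```

A non-zero discrepancy can mean either that the accounting is off or that B was read differently from how the comment defines it.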

In two runs I did, the difference (and hence the number of entries
reported in the eventual crash message) was identical, implying to
me that this is not a simple race, but rather a systematic problem.

Even on the initial dump taken (when the guest was sitting at the
boot manager screen), there already appears to be a difference of
800 pages (it's my understanding that at this point the difference
between entries and cache should equal the difference between
maxmem and mem).

Does this ring any bells? Any hints how to debug this? In any case
I'm attaching the full log in case you want to look at it.

