
[Xen-devel] PoD code killing domain before it really gets started



George,

in the hope that you might have some insight, or might remember
something like this having been reported before (and ideally
fixed), I'll try to describe a problem a customer of ours
reported. Unfortunately this is with Xen 4.0.x (plus numerous
backports), and it is not known whether the same issue exists
on 4.1.x or -unstable.

For a domain with maxmem=16000M and memory=3200M, what
gets logged is

(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory! tot_pages 480 pod_entries 221184
(XEN) domain_crash called from p2m.c:1150
(XEN) Domain 3 reported crashed by domain 0 on cpu#6:
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory! tot_pages 480 pod_entries 221184
(XEN) domain_crash called from p2m.c:1150

Translated to hex, the numbers are 1e0 and 36000. The latter
varies across the (rather infrequent) cases where this happens
(but has always been a multiple of hex 1000 - see below), and
immediate retries to create the affected domain have always
succeeded so far (i.e. the failure is definitely not due to a
lack of free memory).
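
The conversion can be double-checked with a few lines of Python. This
is only a sketch of the arithmetic; the 4 KiB page size used for the
MiB figures is an assumption, not something stated in the report.

```python
# Values copied from the (XEN) log lines above.
tot_pages = 480
pod_entries = 221184

# Assumption: 4 KiB pages (not stated in the report).
PAGE_KIB = 4


def pages_to_mib(pages):
    """Convert a page count to MiB under the assumed page size."""
    return pages * PAGE_KIB / 1024


# Hex forms of the two logged counters.
print(hex(tot_pages), hex(pod_entries))

# The same counters expressed in MiB.
print(pages_to_mib(tot_pages), pages_to_mib(pod_entries))

# Is pod_entries a multiple of hex 1000?
print(pod_entries % 0x1000 == 0)
```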

Given that the memory= target hadn't been reached yet, I would
conclude that this happens in the middle of (4.0.x file name used
here) tools/libxc/xc_hvm_build.c:setup_guest()'s main physmap
population code. However, the way I read the code there, I
would think that the sequence of population should be (using
hex GFNs) 0...9f, c0...7ff, 800...fff, 1000...17ff, etc. That,
however, appears to be inconsistent with the logged numbers
above - tot_pages should always be at least 7e0 (low 2MB less
the VGA hole), especially when pod_entries is divisible by hex 800
(the increment by which large page population happens).
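
To make the mismatch concrete, here is a small Python model of the
population order as described in this message. It is not taken from
libxc; the function names are hypothetical, and it simply assumes that
every populated GFN in the low 2MB counts towards tot_pages.

```python
# Hypothetical model of the population order described above (hex GFNs).
VGA_HOLE = range(0xA0, 0xC0)   # GFNs a0...bf are skipped
LOW_2MB_END = 0x800            # first GFN past the low 2MB
CHUNK = 0x800                  # large-page population increment


def low_2mb_pages():
    """Count the GFNs populated in the low 2MB, excluding the VGA hole."""
    return sum(1 for gfn in range(LOW_2MB_END) if gfn not in VGA_HOLE)


def population_ranges(n_chunks):
    """Return inclusive (start, end) GFN ranges in population order."""
    ranges = [(0x0, 0x9F), (0xC0, 0x7FF)]
    for i in range(1, n_chunks + 1):
        start = i * CHUNK
        ranges.append((start, start + CHUNK - 1))
    return ranges


expected_min = low_2mb_pages()
logged_tot_pages = 480
pod_entries = 221184

# Expected minimum, logged value, and the shortfall between them.
print(hex(expected_min), hex(logged_tot_pages),
      hex(expected_min - logged_tot_pages))

# pod_entries expressed in large-page increments.
print(pod_entries % CHUNK == 0, pod_entries // CHUNK)
```

Under these assumptions the logged tot_pages falls short of the
expected minimum by hex 600 pages.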

As a result of this apparent inconsistency I can't really
conclude anything from the logged numbers.

The main question, irrespective of any numbers, of course is:
How would p2m_pod_demand_populate() be invoked at all
during this early phase of domain construction? Nothing
should be touching any of the memory... If this is nevertheless
possible (even if just for a single page), then perhaps the
tools ought to make sure the pages put into the low 2MB actually
get zeroed, so the PoD code has a chance to find victim
pages.

Thanks for any thoughts or pointers,
Jan

