
Re: [Xen-devel] Linux 4.1 reports wrong number of pages to toolstack



On 04/09/15 09:28, Jan Beulich wrote:
On 04.09.15 at 05:38, <JGross@xxxxxxxx> wrote:
On 09/04/2015 02:40 AM, Wei Liu wrote:
This issue is exposed by the introduction of migration v2. The symptom
is that a guest with a 32-bit 4.1 kernel can't be restored because it's
asking for too many pages.

Note that all guests have 512MB of memory, which means they have 131072 pages.

Both 3.14 tests [2] [3] get the correct number of pages. For example:

     xc: detail: max_pfn 0x1ffff, p2m_frames 256
     ...
     xc: detail: Memory: 2048/131072    1%
     ...

However, in both 4.1 tests [0] [1] the number of pages is quite wrong.

4.1 32 bit:

     xc: detail: max_pfn 0xfffff, p2m_frames 1024
     ...
     xc: detail: Memory: 11264/1048576    1%
     ...

It thinks it has 4096MB of memory.

4.1 64 bit:

     xc: detail: max_pfn 0x3ffff, p2m_frames 512
     ...
     xc: detail: Memory: 3072/262144    1%
     ...

It thinks it has 1024MB of memory.
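
(For reference, the arithmetic tying these log lines together. This is
an illustrative snippet, not Xen code: it assumes 4KiB pages and one
p2m entry per pfn, 4 bytes wide for 32-bit guests and 8 bytes for
64-bit ones; the quoted 3.14 excerpt matches the 64-bit entry size.)

     #include <stdio.h>

     int main(void)
     {
         const struct {
             const char *kernel;
             unsigned long max_pfn;
             unsigned int entry_size;
         } t[] = {
             { "3.14 64-bit", 0x1ffff, 8 },
             { "4.1 32-bit",  0xfffff, 4 },
             { "4.1 64-bit",  0x3ffff, 8 },
         };

         for ( unsigned int i = 0; i < 3; i++ )
         {
             unsigned long pages = t[i].max_pfn + 1;

             /* 4KiB per page => 256 pages per MB. */
             printf("%s: %lu pages = %luMB, p2m_frames %lu\n",
                    t[i].kernel, pages, pages / 256,
                    pages * t[i].entry_size / 4096);
         }

         return 0;
     }

This reproduces exactly the figures in the logs above: 131072 pages /
512MB / 256 frames, 1048576 pages / 4096MB / 1024 frames, and 262144
pages / 1024MB / 512 frames.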

The total number of pages is determined in libxc by calling
xc_domain_nr_gpfns, which yanks shared_info->arch.max_pfn from the
hypervisor. And that value is clearly touched by Linux in some way.

Sure. shared_info->arch.max_pfn holds the number of pfns the p2m list
can handle. This is not the memory size of the domain.
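
(For reference, the lookup path being discussed is roughly the
following. This is a sketch paraphrased from the thread, not verbatim
libxc source; exact signatures vary between Xen versions.)

     /* libxc derives the pfn count from the highest gpfn reported by
      * the XENMEM_maximum_gpfn hypercall behind
      * xc_domain_maximum_gpfn().  For a PV guest that value is driven
      * by shared_info->arch.max_pfn, i.e. the reach of the p2m list,
      * not the number of populated pages. */
     int xc_domain_nr_gpfns(xc_interface *xch, domid_t domid,
                            xen_pfn_t *gpfns)
     {
         int rc = xc_domain_maximum_gpfn(xch, domid, gpfns);

         if ( rc >= 0 )
         {
             *gpfns += 1;   /* highest pfn -> number of pfns */
             rc = 0;
         }

         return rc;
     }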

I now think this is a bug in the Linux kernel. The biggest suspect is
the introduction of the linear P2M.  If you think this is a bug in the
toolstack, please let me know.

I absolutely think it is a toolstack bug. Even without the linear p2m,
things would go wrong if a ballooned-down guest were migrated, as
shared_info->arch.max_pfn would then hold the upper limit of the guest
and not its current size.

I don't think this necessarily is a toolstack bug, at least not in
the sense implied above. Since (afaik) migrating ballooned guests
(at least PV ones) has been working before, there ought to be
logic to skip ballooned pages (and I certainly recall having seen
migration slowly move up to e.g. 50% and then skip the other
half due to it being ballooned, albeit that recollection certainly is
from before v2). And pages above the highest populated one
ought to be considered ballooned just as much. With the
information provided by Wei I don't think we can judge
this, since it only shows the values the migration process starts
from, not when, why, or how it fails.

The max pfn reported by migration v2 is exactly that, the maximum pfn, not the number of pages of RAM in the guest.

It is used to size the bitmaps used by migration v2, including those for the logdirty op calls.

All frames between 0 and max pfn have their type queried and are acted upon appropriately, which includes doing nothing if the frame was ballooned out.
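
(For illustration, a simplified sketch of that scan; a hypothetical
helper, not the actual xc_sr_save.c code, although
xc_get_pfn_type_batch() and the XEN_DOMCTL_PFINFO_* constants are the
real libxc/Xen interfaces. The bitmaps mentioned above are likewise
sized to cover pfns 0 through max pfn.)

     #include <xenctrl.h>   /* xc_get_pfn_type_batch(), XEN_DOMCTL_PFINFO_* */

     #define BATCH_SIZE 1024

     /* Walk every pfn from 0 to max_pfn, query the types in batches,
      * and do nothing for frames without backing memory.  The real
      * saver additionally batches pages into records and handles page
      * tables and other special types. */
     static int scan_pfn_types(xc_interface *xch, uint32_t domid,
                               xen_pfn_t max_pfn)
     {
         xen_pfn_t types[BATCH_SIZE];
         xen_pfn_t base;
         unsigned int i, n;

         for ( base = 0; base <= max_pfn; base += BATCH_SIZE )
         {
             n = (max_pfn - base + 1 < BATCH_SIZE)
                 ? max_pfn - base + 1 : BATCH_SIZE;

             /* pfns go in; XEN_DOMCTL_PFINFO_* types come back in
              * place. */
             for ( i = 0; i < n; i++ )
                 types[i] = base + i;

             if ( xc_get_pfn_type_batch(xch, domid, n, types) )
                 return -1;

             for ( i = 0; i < n; i++ )
             {
                 if ( (types[i] & XEN_DOMCTL_PFINFO_LTAB_MASK) ==
                      XEN_DOMCTL_PFINFO_XTAB )
                     continue;   /* ballooned out: nothing to do */

                 /* ... otherwise queue pfn (base + i) for sending ... */
             }
         }

         return 0;
     }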

~Andrew
