
Re: [Xen-devel] Linux 4.1 reports wrong number of pages to toolstack



On Fri, Sep 04, 2015 at 02:28:41AM -0600, Jan Beulich wrote:
> >>> On 04.09.15 at 05:38, <JGross@xxxxxxxx> wrote:
> > On 09/04/2015 02:40 AM, Wei Liu wrote:
> >> This issue is exposed by the introduction of migration v2. The symptom
> >> is that a guest with a 32 bit 4.1 kernel can't be restored because it's
> >> asking for too many pages.
> >>
> >> Note that all guests have 512MB memory, which means they have 131072 pages.
> >>
> >> Both 3.14 tests [2] [3] report the correct number of pages.  For example:
> >>
> >>     xc: detail: max_pfn 0x1ffff, p2m_frames 256
> >>     ...
> >>     xc: detail: Memory: 2048/131072    1%
> >>     ...
> >>
> >> However, in both 4.1 tests [0] [1] the number of pages is quite wrong.
> >>
> >> 4.1 32 bit:
> >>
> >>     xc: detail: max_pfn 0xfffff, p2m_frames 1024
> >>     ...
> >>     xc: detail: Memory: 11264/1048576    1%
> >>     ...
> >>
> >> It thinks it has 4096MB of memory.
> >>
> >> 4.1 64 bit:
> >>
> >>     xc: detail: max_pfn 0x3ffff, p2m_frames 512
> >>     ...
> >>     xc: detail: Memory: 3072/262144    1%
> >>     ...
> >>
> >> It thinks it has 1024MB of memory.
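> >>
> >> For reference, the arithmetic behind these figures, assuming 4KiB
> >> pages:
> >>
> >>     512 MiB / 4 KiB       = 0x20000 = 131072 pages, max_pfn 0x1ffff
> >>     (0xfffff + 1) * 4 KiB = 4096 MiB   (what 32 bit 4.1 reports)
> >>     (0x3ffff + 1) * 4 KiB = 1024 MiB   (what 64 bit 4.1 reports)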
> >>
> >> The total number of pages is determined in libxc by calling
> >> xc_domain_nr_gpfns, which yanks shared_info->arch.max_pfn from the
> >> hypervisor. That value is clearly being modified by Linux in some way.
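> >>
> >> Roughly, and glossing over error handling, the libxc side looks like
> >> this (paraphrased sketch, not the verbatim code):
> >>
> >>     /* nr_gpfns = maximum gpfn + 1 */
> >>     int xc_domain_nr_gpfns(xc_interface *xch, domid_t domid,
> >>                            xen_pfn_t *gpfns)
> >>     {
> >>         /* XENMEM_maximum_gpfn: for a PV guest the hypervisor
> >>          * hands back shared_info->arch.max_pfn */
> >>         int rc = xc_domain_maximum_gpfn(xch, domid, gpfns);
> >>
> >>         if ( rc >= 0 )
> >>             *gpfns += 1;
> >>
> >>         return rc;
> >>     }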
> > 
> > Sure. shared_info->arch.max_pfn holds the number of pfns the p2m list
> > can handle. This is not the memory size of the domain.
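> > 
> > As an illustration (hypothetical sketch, not the actual Linux code):
> > the p2m list is allocated in whole frames, so its capacity can only
> > ever be >= the number of pages actually assigned to the domain:
> > 
> >     /* 64 bit PV: one p2m frame holds PAGE_SIZE / 8 = 512 entries */
> >     unsigned long per_frame = PAGE_SIZE / sizeof(unsigned long);
> >     unsigned long frames    = DIV_ROUND_UP(nr_pages, per_frame);
> > 
> >     /* arch.max_pfn advertises the capacity, not nr_pages */
> >     shared_info->arch.max_pfn = frames * per_frame;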
> > 
> >> I now think this is a bug in the Linux kernel. The biggest suspect is
> >> the introduction of the linear P2M. If you think this is a bug in the
> >> toolstack, please let me know.
> > 
> > I absolutely think it is a toolstack bug. Even without the linear p2m,
> > things would go wrong if a ballooned-down guest were migrated, as
> > shared_info->arch.max_pfn would then hold the upper limit of the guest
> > and not its current size.
> 
> I don't think this necessarily is a tool stack bug, at least not in
> the sense implied above - since (afaik) migrating ballooned guests
> (at least PV ones) has been working before, there ought to be
> logic to skip ballooned pages (and I certainly recall having seen

Yes, there is.

Migration v2 has logic to skip a gpfn when the underlying mfn is
INVALID_MFN. I'm not convinced the code that implements that logic is
working correctly. I need to take a closer look today.
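
Conceptually the skip should look like this on the sending side
(hypothetical helpers, not the actual xc_sr_save code):

    xen_pfn_t pfn, mfn;

    for ( pfn = 0; pfn <= max_pfn; pfn++ )
    {
        mfn = pfn_to_mfn(pfn);     /* look up the guest's p2m */

        if ( mfn == INVALID_MFN )  /* ballooned out: must not be sent
                                    * or populated on the far side */
            continue;

        send_page(pfn);            /* hypothetical helper */
    }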

> migration slowly move up to e.g. 50% and then skip the other
> half due to being ballooned, albeit that recollection certainly is
> from before v2). And pages above the highest populated one
> ought to be considered ballooned just as much. With the
> information provided by Wei I don't think we can judge
> this, since it only shows the values the migration process starts
> from, not when, why, or how it fails.
> 

It fails on the receiving end when the helper tries to populate more
pages than the guest can have. In the specific case above, the helper
tries to populate the 131073rd page (one beyond the 131072-page limit)
and fails.
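
Schematically (simplified; the real helper batches its populate calls):

    /* The domain was built with 131072 pages, but the stream claims
     * max_pfn 0xfffff.  As soon as the helper asks for one page too
     * many, the populate hypercall fails and the restore aborts. */
    rc = xc_domain_populate_physmap_exact(xch, domid, 1, 0, 0, &pfn);
    if ( rc )
        goto err;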

Wei.

> Jan
