Re: [Xen-devel] Linux 4.1 reports wrong number of pages to toolstack
On Fri, Sep 04, 2015 at 02:28:41AM -0600, Jan Beulich wrote:
> >>> On 04.09.15 at 05:38, <JGross@xxxxxxxx> wrote:
> > On 09/04/2015 02:40 AM, Wei Liu wrote:
> >> This issue is exposed by the introduction of migration v2. The symptom
> >> is that a guest with a 32-bit 4.1 kernel can't be restored because it's
> >> asking for too many pages.
> >>
> >> Note that all guests have 512MB memory, which means they have 131072
> >> pages.
> >>
> >> Both 3.14 tests [2] [3] get the correct number of pages. Like:
> >>
> >> xc: detail: max_pfn 0x1ffff, p2m_frames 256
> >> ...
> >> xc: detail: Memory: 2048/131072 1%
> >> ...
> >>
> >> However in both 4.1 tests [0] [1] the number of pages is quite wrong.
> >>
> >> 4.1 32 bit:
> >>
> >> xc: detail: max_pfn 0xfffff, p2m_frames 1024
> >> ...
> >> xc: detail: Memory: 11264/1048576 1%
> >> ...
> >>
> >> It thinks it has 4096MB memory.
> >>
> >> 4.1 64 bit:
> >>
> >> xc: detail: max_pfn 0x3ffff, p2m_frames 512
> >> ...
> >> xc: detail: Memory: 3072/262144 1%
> >> ...
> >>
> >> It thinks it has 1024MB memory.
> >>
> >> The total number of pages is determined in libxc by calling
> >> xc_domain_nr_gpfns, which yanks shared_info->arch.max_pfn from the
> >> hypervisor. And that value is clearly touched by Linux in some way.
> >
> > Sure. shared_info->arch.max_pfn holds the number of pfns the p2m list
> > can handle. This is not the memory size of the domain.
> >
> >> I now think this is a bug in the Linux kernel. The biggest suspect is
> >> the introduction of the linear P2M. If you think this is a bug in the
> >> toolstack, please let me know.
> >
> > I absolutely think it is a toolstack bug. Even without the linear p2m
> > things would go wrong if a ballooned-down guest were migrated, as
> > shared_info->arch.max_pfn would hold the upper limit of the guest in
> > this case and not the current size.
>
> I don't think this necessarily is a tool stack bug, at least not in
> the sense implied above - since (afaik) migrating ballooned guests
> (at least PV ones) has been working before, there ought to be
> logic to skip ballooned pages (and I certainly recall having seen

Yes, there is. Migration v2 has logic to skip a gpfn when the underlying
mfn is INVALID_MFN. I'm not too convinced the code that implements that
logic is working correctly. I need to have a closer look today.

> migration slowly move up to e.g. 50% and then skip the other
> half due to being ballooned, albeit that recollection certainly is
> from before v2). And pages above the highest populated one
> ought to be considered ballooned just as much. With the
> information provided by Wei I don't think we can judge about
> this, since it only shows the values the migration process starts
> from, not when, why, or how it fails.
>

It fails on the receiving end when the helper tries to populate more
pages than the guest can have. In the specific case above, the helper
tries to populate page number 131073 and fails.

Wei.

> Jan
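To make the distinction between shared_info->arch.max_pfn (p2m capacity)
and the domain's actual memory size concrete, here is a minimal standalone
C sketch of the skip-ballooned-pages logic discussed above. This is not
the actual libxc or migration v2 code: the flat p2m array,
count_populated() and the ~0UL value for INVALID_MFN are all illustrative
assumptions.

#include <stdio.h>

#define INVALID_MFN (~0UL)   /* assumed marker for an unpopulated pfn */

/* Count pfns that are actually backed by a machine frame. */
static unsigned long count_populated(const unsigned long *p2m,
                                     unsigned long max_pfn)
{
    unsigned long pfn, populated = 0;

    for (pfn = 0; pfn < max_pfn; pfn++)
        if (p2m[pfn] != INVALID_MFN)  /* skip ballooned/unpopulated pfns */
            populated++;

    return populated;
}

int main(void)
{
    /*
     * Toy numbers mirroring the 32-bit 4.1 case from the thread:
     * shared_info->arch.max_pfn reports 0x100000 pfns (4GB worth of
     * p2m capacity), but only 0x20000 pfns (512MB) are populated.
     */
    enum { MAX_PFN = 0x100000, POPULATED = 0x20000 };
    static unsigned long p2m[MAX_PFN];
    unsigned long pfn;

    for (pfn = 0; pfn < MAX_PFN; pfn++)
        p2m[pfn] = (pfn < POPULATED) ? pfn : INVALID_MFN;

    printf("max_pfn reports %lu pfns, %lu actually populated\n",
           (unsigned long)MAX_PFN, count_populated(p2m, MAX_PFN));
    return 0;
}

A sender that trusts max_pfn alone would stream 0x100000 pfns; one that
skips INVALID_MFN entries streams only the 0x20000 that are really there.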
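And a hypothetical sketch of the receiving-end check that trips in the
failing case; can_populate() and guest_max_pages are made-up names for
illustration, not the real restore helper's API.

#include <stdio.h>

/* Refuse to populate more pages than the guest may hold. */
static int can_populate(unsigned long populated_so_far,
                        unsigned long guest_max_pages)
{
    return populated_so_far < guest_max_pages;
}

int main(void)
{
    /* The failing case from the thread: a 512MB guest holds 131072
     * pages, and the restore helper tries to populate one more. */
    unsigned long guest_max_pages = 131072;

    printf("populating page number 131073: %s\n",
           can_populate(131072, guest_max_pages) ? "ok" : "fails");
    return 0;
}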