
Re: [Xen-devel] Linux 4.1 reports wrong number of pages to toolstack



>>> On 04.09.15 at 05:38, <JGross@xxxxxxxx> wrote:
> On 09/04/2015 02:40 AM, Wei Liu wrote:
>> This issue is exposed by the introduction of migration v2. The symptom is
>> that a guest with 32 bit 4.1 kernel can't be restored because it's asking
>> for too many pages.
>>
>> Note that all guests have 512MB memory, which means they have 131072 pages.
>>
>> Both 3.14 tests [2] [3] get the correct number of pages.  Like:
>>
>>     xc: detail: max_pfn 0x1ffff, p2m_frames 256
>>     ...
>>     xc: detail: Memory: 2048/131072    1%
>>     ...
>>
>> However in both 4.1 [0] [1] the number of pages is quite wrong.
>>
>> 4.1 32 bit:
>>
>>     xc: detail: max_pfn 0xfffff, p2m_frames 1024
>>     ...
>>     xc: detail: Memory: 11264/1048576    1%
>>     ...
>>
>> It thinks it has 4096MB memory.
>>
>> 4.1 64 bit:
>>
>>     xc: detail: max_pfn 0x3ffff, p2m_frames 512
>>     ...
>>     xc: detail: Memory: 3072/262144    1%
>>     ...
>>
>> It thinks it has 1024MB memory.
>>
>> The total number of pages is determined in libxc by calling
>> xc_domain_nr_gpfns, which yanks shared_info->arch.max_pfn from
>> hypervisor. And that value is clearly touched by Linux in some way.
> 
> Sure. shared_info->arch.max_pfn holds the number of pfns the p2m list
> can handle. This is not the memory size of the domain.
> 
>> I now think this is a bug in Linux kernel. The biggest suspect is the
>> introduction of linear P2M.  If you think this is a bug in toolstack,
>> please let me know.
> 
> I absolutely think it is a toolstack bug. Even without the linear p2m
> things would go wrong in case a ballooned down guest would be migrated,
> as shared_info->arch.max_pfn would hold the upper limit of the guest
> in this case and not the current size.

I don't think this necessarily is a tool stack bug, at least not in
the sense implied above - since (afaik) migrating ballooned guests
(at least PV ones) has been working before, there ought to be
logic to skip ballooned pages (and I certainly recall having seen
migration slowly move up to e.g. 50% and then skip the other
half due to being ballooned, albeit that recollection certainly is
from before v2). And pages above the highest populated one
ought to be considered ballooned just as much. With the
information provided by Wei I don't think we can judge
this, since it only shows the values the migration process starts
from, not when, why, or how it fails.
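
To make the numbers in the quoted output concrete, here is a minimal
sketch of the arithmetic, plus a toy illustration of treating unpopulated
p2m slots as ballooned. It is not libxc code: the helper names, the 4 KiB
page size, the use of None as an "unpopulated" marker, and taking the
total as max_pfn + 1 (which matches the totals printed above) are all
assumptions made for illustration.

```python
PAGE_SIZE = 4096  # bytes; 4 KiB x86 pages assumed

# Hypothetical marker for a p2m slot with no backing page (ballooned out
# or above the highest populated pfn). Real p2m lists use a different value.
UNPOPULATED = None


def pages_from_max_pfn(max_pfn):
    """Number of pfns covered when max_pfn is the highest pfn (assumption)."""
    return max_pfn + 1


def pages_to_mib(pages):
    """Convert a page count to MiB."""
    return pages * PAGE_SIZE // (1024 * 1024)


def count_populated(p2m):
    """Count only the slots that actually have a page behind them."""
    return sum(1 for mfn in p2m if mfn is not UNPOPULATED)


if __name__ == "__main__":
    # max_pfn values copied from the quoted "xc: detail" lines.
    for label, max_pfn in [("3.14", 0x1ffff),
                           ("4.1 32 bit", 0xfffff),
                           ("4.1 64 bit", 0x3ffff)]:
        pages = pages_from_max_pfn(max_pfn)
        print(f"{label}: {pages} pfns, {pages_to_mib(pages)} MiB")

    # Toy p2m: space for 1048576 pfns, but only the first 131072 populated.
    toy_p2m = [1] * 131072 + [UNPOPULATED] * (1048576 - 131072)
    print("populated:", count_populated(toy_p2m))
```

The first loop reproduces the 512MB, 4096MB and 1024MB figures from the
report. The toy p2m shows why the size of the pfn space alone need not
matter: if unpopulated slots are skipped, the count of pages to transfer
is still 131072 even though the p2m covers 1048576 pfns.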

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

