
Re: [Xen-devel] xen-4.7 regression when saving a pv guest



On 26/08/16 12:52, Stefan Bader wrote:
> On 25.08.2016 19:31, Juergen Gross wrote:
>> On 25/08/16 17:48, Stefan Bader wrote:
>>> When I try to save a PV guest with 4G of memory using xen-4.7 I get the
>>> following error:
>>>
>>> II: Guest memory 4096 MB
>>> II: Saving guest state to file...
>>> Saving to /tmp/pvguest.save new xl format (info 0x3/0x0/1131)
>>> xc: info: Saving domain 23, type x86 PV
>>> xc: error: Bad mfn in p2m_frame_list[0]: Internal error
>>
>> So the first mfn of the memory containing the p2m information is bogus.
>> Weird.
> 
> Hm, not sure how bogus. From below the first mfn is 0x4eb1c8 and points to
> pfn=0xff7c8 which is above the current max of 0xbffff. But then the dmesg
> inside the guest said: "last_pfn = 0x100000" which would be larger than the
> pfn causing the error.
> 
>>
>>> xc: error: mfn 0x4eb1c8, max 0x820000: Internal error
>>> xc: error:   m2p[0x4eb1c8] = 0xff7c8, max_pfn 0xbffff: Internal error
>>> xc: error: Save failed (34 = Numerical result out of range): Internal error
>>> libxl: error: libxl_stream_write.c:355:libxl__xc_domain_save_done: saving domain: domain did not respond to suspend request: Numerical result out of range
>>> Failed to save domain, resuming domain
>>> xc: error: Dom 23 not suspended: (shutdown 0, reason 255): Internal error
>>> libxl: error: libxl_dom_suspend.c:460:libxl__domain_resume: xc_domain_resume failed for domain 23: Invalid argument
>>> EE: Guest not off after save!
>>> FAIL
>>>
>>> From dmesg inside the guest:
>>> [    0.000000] e820: last_pfn = 0x100000 max_arch_pfn = 0x400000000
>>>
>>> Somehow I am slightly suspicious of
>>>
>>> commit 91e204d37f44913913776d0a89279721694f8b32
>>>   libxc: try to find last used pfn when migrating
>>>
>>> since that seems to potentially lower ctx->x86_pv.max_pfn, which is
>>> checked against in mfn_in_pseudophysmap(). Is that a known problem?
>>> With xen-4.6 and the same dom0/guest kernel version combination this
>>> does work.
>>
>> Can you please share some more information? Especially:
>>
>> - guest kernel version?
Hm, apparently 4.4 and 4.6 with stable updates. I just tried a much older
> guest kernel (3.2) environment and that works. So it is the combination of
> switching from xen-4.6 to 4.7 and guest kernels running 4.4 and later. And
> while the exact mfn/pfn which gets dumped varies a little, the offending
> mapping always points to 0xffxxx which would be below last_pfn.

Aah, okay. The problem seems to be specific to the linear p2m list
handling.
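
For reference, the consistency check that trips here is
mfn_in_pseudophysmap() in tools/libxc/xc_sr_save_x86_pv.c. Roughly (a
simplified sketch, not the verbatim source; mfn_to_pfn()/pfn_to_mfn()
below stand in for the real m2p/p2m lookup helpers):

/* Sketch: an mfn is only accepted if it is covered by the m2p, its pfn
 * is within max_pfn, and the p2m maps that pfn back to the same mfn. */
static bool mfn_in_pseudophysmap(struct xc_sr_context *ctx, xen_pfn_t mfn)
{
    return (mfn <= ctx->x86_pv.max_mfn) &&
           (mfn_to_pfn(ctx, mfn) <= ctx->x86_pv.max_pfn) &&
           (pfn_to_mfn(ctx, mfn_to_pfn(ctx, mfn)) == mfn);
}

So once map_p2m_list() underestimates ctx->x86_pv.max_pfn, a perfectly
valid mfn like 0x4eb1c8 (pfn 0xff7c8) fails the second condition.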

Trying on my system... Yep, seeing your problem, too.

Weird that nobody else has stumbled over it.
Ian, don't we have a test in OSSTEST that would catch this problem?
A 4GB 64-bit PV domain running Linux kernel 4.3 or newer currently
can't be saved.

The following patch against upstream fixes it for me:

diff --git a/tools/libxc/xc_sr_save_x86_pv.c b/tools/libxc/xc_sr_save_x86_pv.c
index 4a29460..7043409 100644
--- a/tools/libxc/xc_sr_save_x86_pv.c
+++ b/tools/libxc/xc_sr_save_x86_pv.c
@@ -430,6 +430,8 @@ static int map_p2m_list(struct xc_sr_context *ctx, uint64_t p2m_cr3)

         if ( level == 2 )
         {
+            if ( saved_idx == idx_end )
+                saved_idx++;
             max_pfn = ((xen_pfn_t)saved_idx << 9) * fpp - 1;
             if ( max_pfn < ctx->x86_pv.max_pfn )
             {
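
To see the off-by-one with Stefan's numbers: for a 64-bit guest fpp is
4096 / 8 = 512, and a guest with last_pfn = 0x100000 needs four level-2
entries, so the scan ends with idx_end == 3 (my reconstruction from the
log, not values printed by the tools). When the last entry is actually
in use, saved_idx is still idx_end, one short of the number of entries
to keep; the patch bumps it. A standalone demo of the arithmetic:

/* Hypothetical standalone demo; only the formula is from the code above. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint64_t fpp = 4096 / 8;      /* 512 p2m entries per page, 64-bit guest */
    uint64_t idx_end = 3;         /* four L2 entries for pfns 0..0xfffff */
    uint64_t saved_idx = idx_end; /* last entry in use, no trailing dups */

    printf("buggy max_pfn = %#llx\n",            /* prints 0xbffff */
           (unsigned long long)((saved_idx << 9) * fpp - 1));
    printf("fixed max_pfn = %#llx\n",            /* prints 0xfffff */
           (unsigned long long)(((saved_idx + 1) << 9) * fpp - 1));
    return 0;
}

0xbffff is exactly the bogus max_pfn in the error messages above, while
0xfffff covers every pfn below the guest's last_pfn, so pfn 0xff7c8
passes the check again.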


Juergen
