
Re: [Xen-devel] xen-4.7 regression when saving a pv guest



On 25.08.2016 19:31, Juergen Gross wrote:
> On 25/08/16 17:48, Stefan Bader wrote:
>> When I try to save a PV guest with 4G of memory using xen-4.7 I get the
>> following error:
>>
>> II: Guest memory 4096 MB
>> II: Saving guest state to file...
>> Saving to /tmp/pvguest.save new xl format (info 0x3/0x0/1131)
>> xc: info: Saving domain 23, type x86 PV
>> xc: error: Bad mfn in p2m_frame_list[0]: Internal error
> 
> So the first mfn of the memory containing the p2m information is bogus.
> Weird.

Hm, not sure how bogus. From the output below, the first mfn is 0x4eb1c8 and
maps to pfn 0xff7c8, which is above the current max_pfn of 0xbffff. But dmesg
inside the guest reports "last_pfn = 0x100000", which is larger than the pfn
causing the error.
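
To spell out the arithmetic, here is a minimal sketch in Python. It is a
simplified model of the range checks implied by the error output, not the
actual libxc code; the function names are hypothetical, and treating the
guest's last_pfn as an exclusive upper bound is an assumption.

```python
PAGE_SHIFT = 12  # 4 KiB pages on x86


def mfn_in_range(mfn, max_mfn, m2p, max_pfn):
    """Simplified model of the checks behind the error (hypothetical helper).

    The real mfn_in_pseudophysmap() in libxc may check more than this.
    """
    if mfn > max_mfn:
        return False
    pfn = m2p[mfn]  # machine-to-physical lookup
    return pfn <= max_pfn


def pages_to_mib(pages):
    """Convert a number of 4 KiB pages to MiB."""
    return (pages << PAGE_SHIFT) >> 20


# Values taken from the error output and the guest dmesg.
mfn, max_mfn = 0x4eb1c8, 0x820000
m2p = {mfn: 0xff7c8}
max_pfn_toolstack = 0xbffff   # max_pfn as reported by xc
last_pfn_guest = 0x100000     # last_pfn from guest dmesg (assumed exclusive)

fails_with_toolstack_limit = not mfn_in_range(mfn, max_mfn, m2p, max_pfn_toolstack)
passes_with_guest_limit = mfn_in_range(mfn, max_mfn, m2p, last_pfn_guest - 1)

print(fails_with_toolstack_limit, passes_with_guest_limit)
print(pages_to_mib(max_pfn_toolstack + 1), pages_to_mib(last_pfn_guest))
```

Under these assumptions the pfn is rejected only because of the lower
max_pfn: 0xbffff covers 3072 MB, while the guest's last_pfn covers the
full 4096 MB.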

> 
>> xc: error: mfn 0x4eb1c8, max 0x820000: Internal error
>> xc: error:   m2p[0x4eb1c8] = 0xff7c8, max_pfn 0xbffff: Internal error
>> xc: error: Save failed (34 = Numerical result out of range): Internal error
>> libxl: error: libxl_stream_write.c:355:libxl__xc_domain_save_done: saving
>> domain: domain did not respond to suspend request: Numerical result out of 
>> range
>> Failed to save domain, resuming domain
>> xc: error: Dom 23 not suspended: (shutdown 0, reason 255): Internal error
>> libxl: error: libxl_dom_suspend.c:460:libxl__domain_resume: xc_domain_resume
>> failed for domain 23: Invalid argument
>> EE: Guest not off after save!
>> FAIL
>>
>> From dmesg inside the guest:
>> [    0.000000] e820: last_pfn = 0x100000 max_arch_pfn = 0x400000000
>>
>> Somehow I am slightly suspicious about
>>
>> commit 91e204d37f44913913776d0a89279721694f8b32
>>   libxc: try to find last used pfn when migrating
>>
>> since that seems to potentially lower ctx->x86_pv.max_pfn which is checked
>> against in mfn_in_pseudophysmap(). Is that a known problem?
>> With xen-4.6 and the same dom0/guest kernel version combination this does 
>> work.
> 
> Can you please share some more information? Especially:
> 
> - guest kernel version?
Hm, apparently 4.4 and 4.6 with stable updates. I just tried a much older guest
kernel (3.2) environment and that works. So it is the combination of switching
from xen-4.6 to 4.7 and guest kernels running 4.4 and later. And while the exact
mfn/pfn which gets dumped varies a little, the offending mapping always points
to 0xffxxx which would be below last_pfn.

Guest kernel            Xen 4.6         Xen 4.7
3.13.x                  ok              ok
4.2.x                   ok              ok
4.4.15                  ok              fail
4.6.7                   ok              fail

I will try 4.7 and 4.8 based guest kernels with xen-4.7 in a bit, too.

> - any patches in kernel not being upstream, especially in Xen-specific
>   boot path?

None I know of. With affected kernels this happens with both direct kernel
load and pvgrub.

> - dmesg from guest with E820 map?

From 4.4.x kernel:
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[    0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
[    0.000000] e820: last_pfn = 0x100000 max_arch_pfn = 0x400000000
[    0.000000] e820: cannot find a gap in the 32bit address range
               e820: PCI devices with unassigned 32bit BARs may break!
[    0.000000] e820: [mem 0x100100000-0x1004fffff] available for PCI devices

Old 3.13 kernel (I see nothing different here):
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[    0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
[    0.000000] e820: last_pfn = 0x100000 max_arch_pfn = 0x400000000
[    0.000000] e820: cannot find a gap in the 32bit address range
[    0.000000] e820: PCI devices with unassigned 32bit BARs may break!
[    0.000000] e820: [mem 0x100100000-0x1004fffff] available for PCI devices

> - guest configuration?

Rather simple (some of it is there for historic reasons; I also tried an
externally supplied kernel and initrd):

name     = "testpv"
kernel   = "/root/boot/pv-grub-hd0--x86_64.gz"
memory   = 4096
vcpus    = 4
disk     = [
                'file:/root/img/testpv.img,xvda1,w'
]
vif      = [ 'mac=xx:xx:xx:xx:xx:xx, bridge=br0' ]
on_crash = "coredump-destroy"

> 
> The same error would occur when trying to live migrate the guest. And
> this has been tested a lot since above commit, so I suspect something
> is very special in your case.
> 
> 
> Juergen
> 

