
Re: [Xen-devel] Problems using xl migrate





On Mon, 24 Nov 2014, Wei Liu wrote:

On Mon, Nov 24, 2014 at 01:13:25PM +0000, Andrew Cooper wrote:
On 24/11/14 11:50, George Dunlap wrote:
On Mon, Nov 24, 2014 at 12:07 AM, M A Young <m.a.young@xxxxxxxxxxxx> wrote:
On Sat, 22 Nov 2014, M A Young wrote:

While investigating a bug reported on Red Hat Bugzilla
https://bugzilla.redhat.com/show_bug.cgi?id=1166461
I discovered the following

xl migrate --debug domid localhost does indeed fail for Xen 4.4 pv (the
bug report is for Xen 4.3 hvm), whereas xl migrate domid localhost works.
There are actually two issues here:

* the segfault in libxl-save-helper --restore-domain (as reported in the
bug above) occurs if the guest memory is 1024M (on my 4G box) and is
presumably because the allocated memory eventually runs out

I have found out a bit more about this. The segfault is at line 1378 of
tools/libxc/xc_domain_restore.c, which is
                DPRINTF("************** pfn=%lx type=%lx gotcs=%08lx "
                        "actualcs=%08lx\n", pfn, pagebuf->pfn_types[pfn],
                        csum_page(region_base + (i + curbatch)*PAGE_SIZE),
                        csum_page(buf));
and is because pfn in pagebuf->pfn_types[pfn] is beyond the end of the
array. This occurs in the verification phase.
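As an illustration of the failure mode described above (an index that runs past the end of an array inside a debug print), here is a minimal sketch of a bounds-checked lookup. The names struct pagebuf_sketch, nr_types and lookup_pfn_type are hypothetical and are not part of libxc; the real page buffer structure in xc_domain_restore.c has more fields, and this is not the fix that was applied upstream.

```c
#include <stddef.h>

/* Hypothetical stand-in for the restore code's page buffer.
 * Only the fields needed for the illustration are present. */
struct pagebuf_sketch {
    unsigned long *pfn_types;  /* per-entry type information */
    size_t nr_types;           /* number of valid entries in pfn_types */
};

/* Look up pfn_types[idx] only if idx is inside the array.
 * Returns 0 and stores the value in *type_out on success,
 * -1 (leaving *type_out untouched) if the index is out of range. */
int lookup_pfn_type(const struct pagebuf_sketch *buf, size_t idx,
                    unsigned long *type_out)
{
    if (buf == NULL || buf->pfn_types == NULL || type_out == NULL)
        return -1;
    if (idx >= buf->nr_types)
        return -1;
    *type_out = buf->pfn_types[idx];
    return 0;
}
```

A debug message would then call lookup_pfn_type() first and print the type only when it returns 0, instead of indexing the array directly inside the format arguments.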

* the segfault doesn't occur if the guest memory is 128M, but the
migration still fails. The first attached file contains the log from a run
with xl -v migrate --debug domid localhost (with mfn and duplicated lines
stripped out to make the size manageable).

The difference actually seems to be down to how active the VM is rather than
the memory size (my small memory test system was doing very little, my
larger system was a full OS install). In the non-segfault case the problem
was the printf and printf_info calls in the create_domain() routine in
tools/libxl/xl_cmdimpl.c. As xl migrate uses stdout to pass status messages
back from the restoring dom0, the output of these calls arrives as an
unexpected message. If you move that output onto stderr then the migration
completes in the non-segfault case.
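The stdout/stderr point can be sketched as follows. This is only an illustration, assuming a receiver that expects an exact status line on the protocol stream; the token "ready\n" and the helper names report_progress and status_line_ok are hypothetical, and the actual strings exchanged by xl migrate are not shown in this thread.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical status token; not the string xl really uses. */
#define STATUS_READY "ready\n"

/* Sender side: informational text goes to the diagnostic stream,
 * and only the status token goes to the protocol stream.
 * Returns 0 on success, -1 on a write error. */
int report_progress(FILE *protocol, FILE *diag, const char *info)
{
    if (fprintf(diag, "info: %s\n", info) < 0)
        return -1;
    if (fputs(STATUS_READY, protocol) == EOF)
        return -1;
    return fflush(protocol) == EOF ? -1 : 0;
}

/* Receiver side: accept only the exact status line, so any stray
 * informational text on the protocol stream is treated as an error. */
int status_line_ok(const char *line)
{
    return strcmp(line, STATUS_READY) == 0;
}
```

Calling report_progress(stdout, stderr, "domain config") keeps stdout clean for the peer. Had the informational line been written to stdout instead, the receiver would read it first and status_line_ok() would reject it.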
Good job tracking those down -- are there patches in the works?

The segfault for "--debug" has already been identified and a patch
posted by Wen Congyang.

The call to csum_page() incorrectly calculates the offset it is supposed
to checksum, and wanders beyond the mapping of guest space.

Patch in 1409908261-18682-3-git-send-email-wency@xxxxxxxxxxxxxx


And the said patch has been applied (3460eeb3fc2) so we're fine.

However, that doesn't fix my crash. I tried with the patch applied and
still saw the crash. I also tried 4.5-rc1 (without XSM to avoid my other
issue) and that crashed as well.

        Michael Young

