
Re: [Xen-devel] [PATCH v5 00/21] libxl: domain save/restore: run in a separate process



Shriram Rajagopalan writes ("Re: [PATCH v5 00/21] libxl: domain save/restore: run in a separate process"):
> Ian,
>  The code segfaults. Here are the system details and error traces from gdb.

Thanks.

> My setup:
> 
> dom0 : ubuntu 64bit, 2.6.32-39 (pvops kernel),
>            running latest xen-4.2-unstable (built from your repo)
>            tools stack also built from your repo (which I hope has all the 
> latest patches).
> 
> domU: ubuntu 32bit PV, xenolinux kernel (2.6.32.2 - Novell SUSE version)
>            with suspend event channel support
> 
> As a sanity check, I tested xl remus with latest tip from xen-unstable
> mercurial repo, c/s: 25496:e08cf97e76f0
> 
> Blackhole replication (to /dev/null) and localhost replication worked as 
> expected
> and the guest recovered properly without any issues.

Thanks for the test runes.  That didn't work entirely properly for
me, even with the xen-unstable baseline.

I did this:
   xl -vvvv remus -b -i 100 debian.guest.osstest dummy >remus.log 2>&1 &
The result was that the guest's networking broke.  The guest shows up
in xl list as
   debian.guest.osstest                      7   512     1     ---ss-       5.2
and is still responsive on its pv console.  After I killed the remus
process, the guest's networking was still broken.

At the start, the guest prints this on its console:
  [   36.017241] WARNING: g.e. still in use!
  [   36.021056] WARNING: g.e. still in use!
  [   36.024740] WARNING: g.e. still in use!
  [   36.024763] WARNING: g.e. still in use!

If I try the rune with "localhost" I would expect, surely, to see a
domain for the incoming migration ?  But I don't.  When I then killed
the `xl remus' process, the guest became wedged.
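The localhost rune was the same as the one above, just with
`localhost' in place of `dummy':
   xl -vvvv remus -b -i 100 debian.guest.osstest localhost >remus.log 2>&1 &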


However, when I apply my series, I can indeed produce an assertion
failure:

 xc: detail: All memory is saved
 xc: error: Could not get domain info (3 = No such process): Internal error
 libxl: error: libxl.c:388:libxl_domain_resume: xc_domain_resume failed for domain 3077579968: No such process
 xl: libxl_event.c:1426: libxl__ao_inprogress_gc: Assertion `ao->magic == 0xA0FACE00ul' failed.

So I have indeed made matters worse.
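(For reference, the failing check is presumably something of this
shape - a sketch only, not the actual libxl source; the magic value is
the one quoted in the assertion message:

  #include <assert.h>

  #define AO_MAGIC 0xA0FACE00ul        /* value from the assertion message */

  struct ao_sketch {
      unsigned long magic;             /* set when the ao is set up */
      /* ... rest of the ao state ... */
  };

  static void ao_check(struct ao_sketch *ao)
  {
      /* fires if the ao was never initialised, or has already been
       * torn down or overwritten by the time it is used again */
      assert(ao->magic == AO_MAGIC);
  }

So the assertion firing suggests the ao is being used after it has
been disposed of, or via a stale or corrupt pointer.)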


> Blackhole replication:
> ================
> xl error:
> ----------
> xc: error: Could not get domain info (3 = No such process): Internal error
> libxl: error: libxl.c:388:libxl_domain_resume: xc_domain_resume failed for domain 4154075147: No such process
> libxl: error: libxl_dom.c:1184:libxl__domain_save_device_model: unable to open qemu save file ?8b: No such file or directory

I don't see that at all.

NB that PV guests may have a qemu for certain disk backends, or
consoles, depending on the configuration.  Can you show me your domain
config ?  Mine is below.

> I also ran xl in GDB to get a stack trace and hopefully some useful debug 
> info.
> gdb traces: http://pastebin.com/7zFwFjW4

I get a different crash - see above.

> Localhost replication: Partial success, but xl still segfaults
>  dmesg shows
>  [ 1399.254849] xl[4716]: segfault at 0 ip 00007f979483a417 sp 00007fffe06043e0 error 6 in libxenlight.so.2.0.0[7f9794807000+4d000]

I see exactly the same thing with `localhost' instead of `dummy'.  And
I see no incoming domain.
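In case it helps narrow that down: the faulting instruction can
presumably be resolved to a symbol with addr2line, using its offset
within libxenlight.  From the dmesg line above that offset is
0x7f979483a417 - 0x7f9794807000 = 0x33417, so something like
   addr2line -f -e /usr/lib/libxenlight.so.2.0.0 0x33417
(the library path is a guess for your install, and this only works if
the library was built with debug info).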

I will investigate the crash I see.  In the meantime can you try to
help me see why it doesn't work for me even with the baseline ?

Thanks,
Ian.

#
# Configuration file for the Xen instance debian.guest.osstest, created
# by xen-tools 4.2 on Thu Apr  5 16:43:43 2012.
#

#
#  Kernel + memory size
#
#kernel      = '/boot/vmlinuz-2.6.32.57'
#ramdisk     = '/boot/initrd.img-2.6.32.57'

#bootloader = 'pygrub'
bootloader = '/root/strace-pygrub'


memory      = '512'

#
#  Disk device(s).
#
root        = '/dev/xvda2 ro'
disk        = [
                  'phy:/dev/bedbug/debian.guest.osstest-disk,xvda2,w',
                  'phy:/dev/bedbug/debian.guest.osstest-swap,xvda1,w',
              ]


#
#  Physical volumes
#


#
#  Hostname
#
name        = 'debian.guest.osstest'

#
#  Networking
#
#dhcp        = 'dhcp'
vif         = [ 'mac=5a:36:0e:26:00:01' ]

#
#  Behaviour
#
on_poweroff = 'destroy'
on_reboot   = 'restart'
on_crash    = 'preserve'

vcpus       = 1

extra='console=hvc0 earlyprintk=xen'
