
Re: [Xen-devel] pv guests die after failed migration



Here is the full procedure:


Preparations:

root@xenturio1:/var/log/xen# dmsetup ls |grep thiswillfail
xen--data-thiswillfail--swap    (252, 236)
xen--data-thiswillfail--root    (252, 235)

root@xenturio2:/var/log/xen# dmsetup ls |grep thiswillfail

>Server 2 does not have the logical volumes activated.
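
>For reference, the volumes could be activated on the target beforehand. A rough sketch, assuming the VG/LV names implied by the dmsetup output above (`xen-data` / `thiswillfail-*`); adjust to your setup:

```shell
# Run on the migration target (xenturio2) before migrating.
# VG/LV names are derived from the dmsetup output above; adjust as needed.
lvchange -ay xen-data/thiswillfail-root
lvchange -ay xen-data/thiswillfail-swap
# Verify the block devices now exist:
ls -l /dev/xen-data/thiswillfail-root /dev/xen-data/thiswillfail-swap
```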



root@xenturio1:/usr/src/linux-2.6-xen# xl create /mnt/vmctrl/xenconfig/thiswillfail.sxp
Parsing config file /mnt/vmctrl/xenconfig/thiswillfail.sxp
Daemon running with PID 6722

>It is in fact running with PID 6723:

root@xenturio1:/usr/src/linux-2.6-xen# ps auxww |grep "xl create"
root 6723 0.0 0.0 35616 972 ? Ssl 09:14 0:00 xl create /mnt/vmctrl/xenconfig/thiswillfail.sxp


>Let's check the log files:
root@xenturio1:/var/log/xen# cat xen-hotplug.log
RTNETLINK answers: Operation not supported
RTNETLINK answers: Operation not supported

>Stupid netlink again; no matter what I load into the kernel, that
>still pops up. Annoying, but it's a non-issue in this case.

root@xenturio1:/var/log/xen# cat xl-thiswillfail.log
Waiting for domain thiswillfail (domid 5) to die [pid 6723]

>Let's not make it wait any longer ;)

root@xenturio1:/usr/src/linux-2.6-xen# xl -vvv migrate thiswillfail xenturio2
migration target: Ready to receive domain.
Saving to migration stream new xl format (info 0x0/0x0/380)
Loading new save file incoming migration stream (new xl fmt info 0x0/0x0/380)
 Savefile contains xl domain config
xc: detail: Had 0 unexplained entries in p2m table
xc: Saving memory: iter 0 (last sent 0 skipped 0): 133120/133120  100%
xc: detail: delta 9499ms, dom0 88%, target 2%, sent 451Mb/s, dirtied 1Mb/s 324 pages
xc: Saving memory: iter 1 (last sent 130760 skipped 312): 133120/133120  100%
xc: detail: delta 23ms, dom0 91%, target 0%, sent 455Mb/s, dirtied 48Mb/s 34 pages
xc: Saving memory: iter 2 (last sent 320 skipped 4): 133120/133120  100%
xc: detail: Start last iteration
libxl: debug: libxl_dom.c:384:libxl__domain_suspend_common_callback issuing PV suspend request via XenBus control node
libxl: debug: libxl_dom.c:389:libxl__domain_suspend_common_callback wait for the guest to acknowledge suspend request
libxl: debug: libxl_dom.c:434:libxl__domain_suspend_common_callback guest acknowledged suspend request
libxl: debug: libxl_dom.c:438:libxl__domain_suspend_common_callback wait for the guest to suspend
libxl: debug: libxl_dom.c:450:libxl__domain_suspend_common_callback guest has suspended
xc: detail: SUSPEND shinfo 0007fafc
xc: detail: delta 206ms, dom0 2%, target 0%, sent 4Mb/s, dirtied 24Mb/s 154 pages
xc: Saving memory: iter 3 (last sent 30 skipped 4): 133120/133120  100%
xc: detail: delta 3ms, dom0 0%, target 0%, sent 1682Mb/s, dirtied 1682Mb/s 154 pages
xc: detail: Total pages sent= 131264 (0.99x)
xc: detail: (of which 0 were fixups)
xc: detail: All memory is saved
xc: detail: Save exit rc=0
libxl: error: libxl.c:900:validate_virtual_disk failed to stat /dev/xen-data/thiswillfail-root: No such file or directory
cannot add disk 0 to domain: -6
migration target: Domain creation failed (code -3).
libxl: error: libxl_utils.c:408:libxl_read_exactly file/stream truncated reading ready message from migration receiver stream
libxl: info: libxl_exec.c:72:libxl_report_child_exitstatus migration target process [6837] exited with error status 3
Migration failed, resuming at sender.
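
>A pre-flight check on the sender would catch the missing disk before the guest is ever suspended. A hedged sketch (hostnames and device paths as in this report; this wrapper is hypothetical, not part of xl):

```shell
#!/bin/sh
# Refuse to migrate if the target cannot see the guest's disks.
# Run on the sender instead of calling "xl migrate" directly.
TARGET=xenturio2
for dev in /dev/xen-data/thiswillfail-root /dev/xen-data/thiswillfail-swap; do
    if ! ssh "$TARGET" test -e "$dev"; then
        echo "missing $dev on $TARGET - refusing to migrate" >&2
        exit 1
    fi
done
exec xl migrate thiswillfail "$TARGET"
```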




>Now see if it really is resumed at the sender:

root@xenturio1:/usr/src/linux-2.6-xen# xl console thiswillfail
PM: freeze of devices complete after 0.207 msecs
PM: late freeze of devices complete after 0.058 msecs
------------[ cut here ]------------
kernel BUG at drivers/xen/events.c:1466!
invalid opcode: 0000 [#1] SMP
CPU 0
Modules linked in:

Pid: 6, comm: migration/0 Not tainted 3.0.4-xenU #6
RIP: e030:[<ffffffff8140d574>] [<ffffffff8140d574>] xen_irq_resume+0x224/0x370
RSP: e02b:ffff88001f9fbce0  EFLAGS: 00010082
RAX: ffffffffffffffef RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff88001f809ea8 RSI: ffff88001f9fbd00 RDI: 0000000000000001
RBP: 0000000000000010 R08: ffffffff81859a00 R09: 0000000000000000
R10: 0000000000000000 R11: 09f911029d74e35b R12: 0000000000000000
R13: 000000000000f0a0 R14: 0000000000000000 R15: ffff88001f9fbd00
FS:  00007ff28f8c8700(0000) GS:ffff88001fec6000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fff02056048 CR3: 000000001e4d8000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process migration/0 (pid: 6, threadinfo ffff88001f9fa000, task ffff88001f9f7170)
Stack:
 ffff88001f9fbd34 ffff88001f9fbd54 0000000000000003 000000000000f100
 0000000000000000 0000000000000003 0000000000000000 0000000000000003
 ffff88001fa6ddb0 ffffffff8140aa20 ffffffff81859a08 0000000000000000
Call Trace:
 [<ffffffff8140aa20>] ? gnttab_map+0x100/0x130
 [<ffffffff815c2765>] ? _raw_spin_lock+0x5/0x10
 [<ffffffff81083e01>] ? cpu_stopper_thread+0x101/0x190
 [<ffffffff8140e1f5>] ? xen_suspend+0x75/0xa0
 [<ffffffff81083f1b>] ? stop_machine_cpu_stop+0x8b/0xd0
 [<ffffffff81083e90>] ? cpu_stopper_thread+0x190/0x190
 [<ffffffff81083dd0>] ? cpu_stopper_thread+0xd0/0x190
 [<ffffffff815c0870>] ? schedule+0x270/0x6c0
 [<ffffffff81083d00>] ? copy_pid_ns+0x2a0/0x2a0
 [<ffffffff81065846>] ? kthread+0x96/0xa0
 [<ffffffff815c4024>] ? kernel_thread_helper+0x4/0x10
 [<ffffffff815c3436>] ? int_ret_from_sys_call+0x7/0x1b
 [<ffffffff815c2be1>] ? retint_restore_args+0x5/0x6
 [<ffffffff815c4020>] ? gs_change+0x13/0x13
Code: e8 f2 e9 ff ff 8b 44 24 10 44 89 e6 89 c7 e8 64 e8 ff ff ff c3 83 fb 04 0f 84 95 fe ff ff 4a 8b 14 f5 20 95 85 81 e9 68 ff ff ff <0f> 0b eb fe 0f 0b eb fe 48 8b 1d fd 00 42 00 4c 8d 6c 24 20 eb
RIP  [<ffffffff8140d574>] xen_irq_resume+0x224/0x370
 RSP <ffff88001f9fbce0>
---[ end trace 82e2e97d58b5f835 ]---


> And here are the new contents of /var/log/xen:

root@xenturio1:/var/log/xen# cat xl-thiswillfail.log
Waiting for domain thiswillfail (domid 5) to die [pid 6723]
Domain 5 is dead
Done. Exiting now

>The target server's /var/log/xen remains empty.



And that was with 3.0.4-xenU; the same goes for 2.6.39-xenU.

> Please can you provide full logs from /var/log/xen on both ends. Running
> "xl -vvv migrate" will also produce more stuff on stdout, some of which
> may be useful.
>
> Also please capture the complete guest log in case it is an issue there.

I am not quite sure what you mean by "guest log".


When you reply to this, I should be much quicker to respond; I had a hell of a week and didn't really get to check my list mail until yesterday evening.

I guess anyone with two machines running Xen should easily be able to reproduce this problem.
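
A minimal reproduction outline, per the steps above (assumes an LVM-backed PV guest "thiswillfail" defined identically on both hosts):

```shell
# 1. Leave the guest's LVs deactivated on the target:
ssh xenturio2 lvchange -an xen-data/thiswillfail-root
# 2. Start the guest on the sender and attempt a live migration:
xl create /mnt/vmctrl/xenconfig/thiswillfail.sxp
xl -vvv migrate thiswillfail xenturio2
# 3. Migration fails at the disk attach on the target; on resume at the
#    sender the guest hits "kernel BUG at drivers/xen/events.c" in
#    xen_irq_resume, as shown in the console output above:
xl console thiswillfail
```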



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

 

