
[Xen-users] Sender's domU crashes after xl remus on Xen v4.3.2 - kernel v3.12.21



Hi,

This is my first attempt to get Remus working for my PV domUs, and I have a problem: the domU always crashes on the sender's side, but keeps running fine on the receiver's dom0. Plain migration with the "xl migrate" command works fine between these hosts.

Here is an example of the error that occurs when I try to run Remus on domU "transcendens" from dom0 "xenium2" to dom0 "xenium1":

xenium2 ~ # uname -rmp
3.12.21-gentoo-r1-dom0 x86_64 Intel(R) Xeon(R) CPU E3-1270 V2 @ 3.50GHz
xenium2 ~ # xl -vvvvv remus transcendens xenium1
Password:
migration target: Ready to receive domain.
Saving to migration stream new xl format (info 0x0/0x0/217)
libxl: debug: libxl.c:708:libxl_domain_remus_start: ao 0x6e9d30: create: how=(nil) callback=(nil) poller=0x6e9d90
libxl: debug: libxl_dom.c:1208:libxl__toolstack_save: domain=39 toolstack data size=8
libxl: debug: libxl.c:735:libxl_domain_remus_start: ao 0x6e9d30: inprogress: poller=0x6e9d90, flags=i
Loading new save file <incoming migration stream> (new xl fmt info 0x0/0x0/217)
 Savefile contains xl domain config
libxl-save-helper: debug: starting save: Success
xc: detail: xc_domain_save: starting save of domid 39
xc: detail: Had 0 unexplained entries in p2m table
xc: Saving memory: iter 0 (last sent 0 skipped 0): 4096/65536 6%
xc: progress: Reloading memory pages: 4096/65536 6%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 8212/65536 12%
xc: progress: Reloading memory pages: 8192/65536 12%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 11284/65536 17%
xc: progress: Reloading memory pages: 11264/65536 17%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 15380/65536 23%
xc: progress: Reloading memory pages: 15360/65536 23%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 18452/65536 28%
xc: progress: Reloading memory pages: 18432/65536 28%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 22548/65536 34%
xc: progress: Reloading memory pages: 22528/65536 34%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 25620/65536 39%
xc: progress: Reloading memory pages: 25600/65536 39%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 29716/65536 45%
xc: progress: Reloading memory pages: 29696/65536 45%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 32788/65536 50%
xc: progress: Reloading memory pages: 32768/65536 50%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 36884/65536 56%
xc: progress: Reloading memory pages: 36864/65536 56%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 40980/65536 62%
xc: progress: Reloading memory pages: 40960/65536 62%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 44052/65536 67%
xc: progress: Reloading memory pages: 44032/65536 67%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 48148/65536 73%
xc: progress: Reloading memory pages: 48128/65536 73%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 51220/65536 78%
xc: progress: Reloading memory pages: 51200/65536 78%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 55316/65536 84%
xc: progress: Reloading memory pages: 55296/65536 84%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 58393/65536 89%
xc: progress: Reloading memory pages: 58368/65536 89%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 62517/65536 95%
xc: progress: Reloading memory pages: 62464/65536 95%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 65536/65536  100%
xc: detail: delta 130790ms, dom0 7%, target 0%, sent 16Mb/s, dirtied 0Mb/s 82 pages
xc: Saving memory: iter 1 (last sent 65463 skipped 73): 65536/65536 100%
xc: detail: delta 80ms, dom0 15%, target 0%, sent 33Mb/s, dirtied 0Mb/s 0 pages
xc: Saving memory: iter 2 (last sent 82 skipped 0): 65536/65536  100%
xc: detail: Start last iteration
libxl: debug: libxl_dom.c:1038:libxl__domain_suspend_common_callback: issuing PV suspend request via XenBus control node
libxl: debug: libxl_dom.c:1042:libxl__domain_suspend_common_callback: wait for the guest to acknowledge suspend request
libxl: debug: libxl_dom.c:1089:libxl__domain_suspend_common_callback: guest acknowledged suspend request
libxl: debug: libxl_dom.c:1093:libxl__domain_suspend_common_callback: wait for the guest to suspend
xc: progress: Reloading memory pages: 65545/65536  100%
libxl: debug: libxl_dom.c:1107:libxl__domain_suspend_common_callback: guest has suspended
xc: detail: SUSPEND shinfo 000ce493
xc: detail: delta 202ms, dom0 12%, target 0%, sent 0Mb/s, dirtied 19Mb/s 120 pages
xc: Saving memory: iter 3 (last sent 0 skipped 0): 65536/65536  100%
xc: detail: delta 1ms, dom0 0%, target 0%, sent 3932Mb/s, dirtied 3932Mb/s 120 pages
xc: detail: Total pages sent= 65665 (1.00x)
xc: detail: (of which 0 were fixups)
xc: detail: All memory is saved
libxl: debug: libxl_dom.c:1038:libxl__domain_suspend_common_callback: issuing PV suspend request via XenBus control node
libxl: debug: libxl_dom.c:1042:libxl__domain_suspend_common_callback: wait for the guest to acknowledge suspend request
libxl: debug: libxl_dom.c:1089:libxl__domain_suspend_common_callback: guest acknowledged suspend request
libxl: debug: libxl_dom.c:1093:libxl__domain_suspend_common_callback: wait for the guest to suspend
libxl: debug: libxl_dom.c:1107:libxl__domain_suspend_common_callback: guest has suspended
xc: detail: SUSPEND shinfo 000ce493
xc: detail: delta 201ms, dom0 6%, target 0%, sent 0Mb/s, dirtied 19Mb/s 120 pages
xc: Saving memory: iter 4 (last sent 120 skipped 0): 65536/65536  100%
xc: detail: delta 4ms, dom0 100%, target 0%, sent 3727Mb/s, dirtied 3727Mb/s 455 pages
xc: detail: Total pages sent= 66120 (1.01x)
xc: detail: (of which 0 were fixups)
xc: detail: All memory is saved
migration target: Remus Failover for domain 57
libxl: error: libxl_utils.c:393:libxl_read_exactly: file/stream truncated reading ipc msg header from domain 39 save/restore helper stdout pipe
libxl: error: libxl_exec.c:129:libxl_report_child_exitstatus: domain 39 save/restore helper [1902] died due to fatal signal Broken pipe
libxl: debug: libxl_event.c:1569:libxl__ao_complete: ao 0x6e9d30: complete, rc=-3
libxl: debug: libxl_event.c:1541:libxl__ao__destroy: ao 0x6e9d30: destroy
remus sender: libxl_domain_suspend failed (rc=-3)
Remus: Backup failed? resuming domain at primary.
libxl: debug: libxl.c:439:libxl_domain_resume: ao 0x6e9d30: create: how=(nil) callback=(nil) poller=0x6e9d90
libxl: debug: libxl_event.c:1569:libxl__ao_complete: ao 0x6e9d30: complete, rc=0
libxl: debug: libxl.c:442:libxl_domain_resume: ao 0x6e9d30: inprogress: poller=0x6e9d90, flags=ic
libxl: debug: libxl_event.c:1541:libxl__ao__destroy: ao 0x6e9d30: destroy
xc: debug: hypercall buffer: total allocations:43 total releases:43
xc: debug: hypercall buffer: current allocations:0 maximum allocations:2
xc: debug: hypercall buffer: cache current size:2
xc: debug: hypercall buffer: cache hits:34 misses:2 toobig:7
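
For anyone who wants to compare the checkpoint iterations without reading the raw output, here is a minimal sketch that pulls the per-iteration statistics out of the "xc: detail: delta ..." lines. The function name, field names, and sample text are my own choices for illustration and are not part of xl or libxc; the regular expression only assumes the line format seen in the log above.

```python
import re

# Matches libxc statistics lines such as:
#   xc: detail: delta 202ms, dom0 12%, target 0%, sent 0Mb/s, dirtied 19Mb/s 120 pages
# The pattern is derived from the log output above, not from any documented format.
DELTA_RE = re.compile(
    r"delta (\d+)ms, dom0 (\d+)%, target (\d+)%, "
    r"sent (\d+)Mb/s, dirtied (\d+)Mb/s (\d+) pages"
)

# Hypothetical field names chosen for this sketch.
FIELDS = ("delta_ms", "dom0_pct", "target_pct", "sent_mbps", "dirtied_mbps", "pages")


def parse_xc_stats(log_text):
    """Return one dict of integer statistics per 'delta ...' entry found."""
    return [
        dict(zip(FIELDS, (int(value) for value in match.groups())))
        for match in DELTA_RE.finditer(log_text)
    ]


# Two entries copied from the log above, used only as sample input.
SAMPLE = (
    "xc: detail: delta 202ms, dom0 12%, target 0%, sent 0Mb/s, dirtied 19Mb/s 120 pages\n"
    "xc: detail: delta 4ms, dom0 100%, target 0%, sent 3727Mb/s, dirtied 3727Mb/s 455 pages\n"
)

if __name__ == "__main__":
    for entry in parse_xc_stats(SAMPLE):
        print(entry)
```

Because the pattern is searched for anywhere in the text, it also finds entries that ended up on the same line as other output.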

And here is what I see on the Xen console of the PV domU afterwards:


xenium2 ~ # xl console transcendens:
transdendens ~ # [  650.420160] ------------[ cut here ]------------
[ 650.420210] kernel BUG at /usr/src/linux-3.12.21-gentoo-r1/arch/x86/xen/irq.c:105!
[  650.420222] invalid opcode: 0000 [#1] SMP
[ 650.420230] Modules linked in: ipv6 crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd
[ 650.420255] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.12.21-gentoo-r1-domU #1
[ 650.420263] task: ffffffff81c13480 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[ 650.420270] RIP: e030:[<ffffffff81009fc6>] [<ffffffff81009fc6>] xen_safe_halt+0x16/0x20
[  650.420317] RSP: e02b:ffffffff81c01e38  EFLAGS: 00010202
[ 650.420323] RAX: 0000000000000001 RBX: ffffffff81c01fd8 RCX: 0100000000000000
[ 650.420329] RDX: 0140000000000000 RSI: 0000000000000000 RDI: 0000000000000001
[ 650.420335] RBP: ffffffff81c01e38 R08: 0000000000000000 R09: 0000000000000001
[ 650.420341] R10: 0000000000000001 R11: 0000000000000002 R12: 0000000000000000
[ 650.420347] R13: ffffffff81ca6ae0 R14: ffff88000ff91380 R15: ffffffff81c01fd8
[ 650.420358] FS: 00007f35171a2700(0000) GS:ffff88000fc00000(0000) knlGS:0000000000000000
[  650.420365] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 650.420371] CR2: 00007fff45e01fd0 CR3: 000000000e0ce000 CR4: 0000000000002660
[  650.420377] Stack:
[ 650.420381] ffffffff81c01e58 ffffffff8101b9ef ffffffff81c01fd8 ffffffff81d5c020
[ 650.420389] ffffffff81c01e68 ffffffff8101c1e6 ffffffff81c01ec8 ffffffff8109c1f1
[ 650.420398] ffffffff81d5d040 ffffffff81c01fd8 ffffffff81c01fd8 ffffffff81c01fd8
[  650.420406] Call Trace:
[  650.420416]  [<ffffffff8101b9ef>] default_idle+0x1f/0xb0
[  650.420424]  [<ffffffff8101c1e6>] arch_cpu_idle+0x26/0x30
[  650.420433]  [<ffffffff8109c1f1>] cpu_startup_entry+0x91/0x240
[  650.420447]  [<ffffffff815fdaf2>] rest_init+0x72/0x80
[  650.420456]  [<ffffffff81cc4ea9>] start_kernel+0x39a/0x3a7
[  650.420465]  [<ffffffff81cc490e>] ? repair_env_string+0x5e/0x5e
[ 650.420474] [<ffffffff81099137>] ? __add_preferred_console.constprop.17+0x87/0xb0
[  650.420483]  [<ffffffff81cc45f0>] x86_64_start_reservations+0x2a/0x2c
[  650.420491]  [<ffffffff81cc6db8>] xen_start_kernel+0x552/0x55c
[ 650.420497] Code: f6 e8 7f 72 ff ff eb f3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 bf 01 00 00 00 31 f6 48 89 e5 e8 e0 73 ff ff 85 c0 75 02 5d c3 <0f> 0b 0f 1f 84 00 00 00 00 00 ff 14 25 e0 31 c2 81 f6 c4 02 75
[  650.420556] RIP  [<ffffffff81009fc6>] xen_safe_halt+0x16/0x20
[  650.420565]  RSP <ffffffff81c01e38>
[  650.420620] ---[ end trace 6fdc9098e0910831 ]---
[ 650.420637] Kernel panic - not syncing: Attempted to kill the idle task!

xenium2 destroys the crashed domU, and the DRBD device stays in the Primary/Primary state:

xenium2 vm-confs # cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
srcversion: C6AD5212ACE6F9812ECF887

 1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
ns:400948 nr:2992 dw:403940 dr:273836 al:149 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
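
To check for this dual-primary state from a script instead of by eye, something like the following rough sketch could be used. It only assumes the DRBD 8.4 /proc/drbd layout shown above; the function names and the sample string are hypothetical and not part of any DRBD tooling.

```python
import re

# Matches a /proc/drbd status line such as:
#    1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
# The layout is assumed from the DRBD 8.4.3 output above.
STATUS_RE = re.compile(
    r"^[ \t]*(\d+):[ \t]+cs:(\S+)[ \t]+ro:(\w+)/(\w+)", re.MULTILINE
)


def parse_drbd_roles(text):
    """Return {minor: (connection_state, local_role, peer_role)}."""
    return {
        int(m.group(1)): (m.group(2), m.group(3), m.group(4))
        for m in STATUS_RE.finditer(text)
    }


def dual_primary_minors(text):
    """Return the sorted minor numbers where both sides report Primary."""
    return sorted(
        minor
        for minor, (_state, local, peer) in parse_drbd_roles(text).items()
        if local == "Primary" and peer == "Primary"
    )


# Sample input copied from the /proc/drbd output above.
SAMPLE = """version: 8.4.3 (api:1/proto:86-101)
srcversion: C6AD5212ACE6F9812ECF887

 1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
ns:400948 nr:2992 dw:403940 dr:273836 al:149 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
"""

if __name__ == "__main__":
    # On a real host the text would come from open("/proc/drbd").read().
    print(dual_primary_minors(SAMPLE))
```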

Log file at /var/log/xen/xend.log on the receiver's dom0, xenium1:

[2014-08-01 15:06:00 26683] DEBUG (XendDomainInfo:151) XendDomainInfo.recreate({'max_vcpu_id': 0, 'cpu_time': 0L, 'ssidref': 0, 'hvm': 0, 'shutdown_reason': 255, 'dying': 0, 'online_vcpus': 1, 'domid': 57, 'paused': 1, 'crashed': 0, 'running': 0, 'maxmem_kb': 263168L, 'shutdown': 0, 'mem_kb': 262144L, 'handle': [243, 230, 24, 162, 202, 174, 65, 68, 143, 176, 107, 114, 12, 6, 120, 249], 'blocked': 0, 'cpupool': 0})
[2014-08-01 15:06:00 26683] INFO (XendDomainInfo:169) Recreating domain 57, UUID f3e618a2-caae-4144-8fb0-6b720c0678f9. at /local/domain/57
[2014-08-01 15:06:00 26683] DEBUG (XendDomain:476) Adding Domain: 57
[2014-08-01 15:06:00 26683] DEBUG (XendDomainInfo:1882) XendDomainInfo.handleShutdownWatch

The domU configuration file is:
kernel = "/etc/xen/kernels/kernel-3.12.21-gentoo-r1-domU"
memory = 256
name   = "transcendens"
disk   = [
       'drbd:r0,xvda1,w',
]
root   = "/dev/xvda1 ro"
vif = ['mac=00:16:3e:60:e1:88,bridge=xenbr0']
vcpus=1

I hope it's not something very complicated, because migration itself seems to work fine. Any clues as to how this can be fixed?

Thank you,
Konstantin

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users


 

