
Re: [Xen-devel] remus trouble



Hey Brendan & all,

> > I ran into some problems trying remus on xen4.0.1rc4 with the 2.6.31.13
> > dom0 (checkout from yesterday):
> 
> What's your domU kernel? pvops support was recently added to dom0, but
> still doesn't work for domU.

Ah, that explains a few things; however, similar behaviour occurs with
HVM. Remus starts and spits out the following output:

qemu logdirty mode: enable
 1: sent 267046, skipped 218, delta 8962ms, dom0 68%, target 0%, sent
976Mb/s, dirtied 1Mb/s 290 pages
 2: sent 290, skipped 0, delta 12ms, dom0 66%, target 0%, sent 791Mb/s,
dirtied 43Mb/s 16 pages
 3: sent 16, skipped 0, Start last iteration
PROF: suspending at 1278503125.101352
issuing HVM suspend hypercall
suspend hypercall returned 0
pausing QEMU
SUSPEND shinfo 000fffff
delta 11ms, dom0 18%, target 0%, sent 47Mb/s, dirtied 47Mb/s 16 pages
 4: sent 16, skipped 0, delta 5ms, dom0 20%, target 0%, sent 104Mb/s,
dirtied 104Mb/s 16 pages
Total pages sent= 267368 (0.25x)
(of which 0 were fixups)
All memory is saved
PROF: resumed at 1278503125.111614
resuming QEMU
Sending 6017 bytes of QEMU state
PROF: flushed memory at 1278503125.112014


and then remus seems to become inactive. The ps tree looks like this:

root      4756  0.4  0.1  82740 11040 pts/0    SLl+ 13:45   0:03
/usr/bin/python /usr/bin/remus --no-net remus1 backup


According to strace, it is stuck reading fd 6, which is a FIFO:
/var/run/tap/remus_nas1_9000.msg
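

For what it's worth, resolving the descriptor through /proc points at the
same FIFO that strace reports, and a read on a FIFO only returns once
whatever holds the writer end (presumably tapdisk here) sends data or
closes it, so the process just sits in read() until then. A minimal Python
sketch of both checks; the pid and fd are simply the values from the
ps/strace output above, and the /tmp path is a made-up example rather than
the real remus FIFO:

import os

PID, FD = 4756, 6   # pid and fd from the ps and strace output above

# Resolve the descriptor the same way "ls -l /proc/<pid>/fd/<fd>" would;
# here it should point at /var/run/tap/remus_nas1_9000.msg.
print(os.readlink("/proc/%d/fd/%d" % (PID, FD)))

# Blocking behaviour on a throw-away FIFO (hypothetical path): open() for
# reading already blocks until some writer opens the FIFO, and read()
# blocks until that writer sends data or closes its end.
path = "/tmp/example.fifo"
if not os.path.exists(path):
    os.mkfifo(path)
rfd = os.open(path, os.O_RDONLY)   # blocks until a writer opens the FIFO
data = os.read(rfd, 4096)          # blocks until data arrives or EOF
os.close(rfd)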


The domU comes up in the blocked state on the backup machine and seems to
run fine there. However, xm list on the primary shows no state whatsoever:

Domain-0                                     0 10208    12     r-----
468.6
remus1                                       1  1024     1     ------
41.8
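

For clarity, the State column in xm list is just six single-letter flags,
so the "------" for remus1 above means none of them are set at all. A
small Python sketch of the decoding, using the flag meanings as documented
for xm (nothing remus-specific):

# Each position in the xm list "State" column is a flag letter or '-'.
XM_STATE_FLAGS = {
    'r': 'running',
    'b': 'blocked (idle, waiting on an event)',
    'p': 'paused',
    's': 'shutdown',
    'c': 'crashed',
    'd': 'dying',
}

def decode_state(column):
    flags = [XM_STATE_FLAGS[c] for c in column if c != '-']
    return flags or ['no flags set']

print(decode_state('r-----'))   # Domain-0 above
print(decode_state('------'))   # remus1 above: no state reported at all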


And after a Ctrl-C, remus segfaults:
remus[4756]: segfault at 0 ip 00007f3f49cc7376 sp 00007fffec999fd8 error
4 in libc-2.11.1.so[7f3f49ba1000+178000]


> Are these in dom0 or the primary domU? Looks a bit like dom0, but I
> haven't seen these before.

Those were in dom0. This time dmesg shows the following output after
destroying the domU on the primary:

[ 1920.059226] INFO: task xenwatch:55 blocked for more than 120 seconds.
[ 1920.059262] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 1920.059315] xenwatch      D 0000000000000000     0    55      2
0x00000000
[ 1920.059363]  ffff8802e2e656c0 0000000000000246 0000000000011200
0000000000000000
[ 1920.059439]  ffff8802e2e65720 0000000000000000 ffff8802d55d20c0
00000001001586b3
[ 1920.059520]  ffff8802e2e683b0 000000000000f668 00000000000153c0
ffff8802e2e683b0
[ 1920.059592] Call Trace:
[ 1920.059626]  [<ffffffff8157553d>] io_schedule+0x2d/0x40
[ 1920.059661]  [<ffffffff812afbc9>] get_request_wait+0xe9/0x1c0
[ 1920.059695]  [<ffffffff810af240>] ? autoremove_wake_function+0x0/0x40
[ 1920.059732]  [<ffffffff812a3e87>] ? elv_merge+0x37/0x200
[ 1920.059765]  [<ffffffff812afd41>] __make_request+0xa1/0x470
[ 1920.059800]  [<ffffffff810389ff>] ? xen_restore_fl_direct_end+0x0/0x1
[ 1920.059837]  [<ffffffff8103ed5d>] ? retint_restore_args+0x5/0x6
[ 1920.059874]  [<ffffffff812ae5dc>] generic_make_request+0x17c/0x4a0
[ 1920.059909]  [<ffffffff8111bdf6>] ? mempool_alloc+0x56/0x140
[ 1920.059946]  [<ffffffff8103819d>] ?
xen_force_evtchn_callback+0xd/0x10
[ 1920.059979]  [<ffffffff812ae978>] submit_bio+0x78/0xf0
[ 1920.060013]  [<ffffffff81180489>] submit_bh+0xf9/0x140
[ 1920.060046]  [<ffffffff81182600>] __block_write_full_page+0x1e0/0x3a0
[ 1920.060080]  [<ffffffff811819c0>] ? end_buffer_async_write+0x0/0x1f0
[ 1920.060116]  [<ffffffff81186980>] ? blkdev_get_block+0x0/0x70
[ 1920.060151]  [<ffffffff81186980>] ? blkdev_get_block+0x0/0x70
[ 1920.060186]  [<ffffffff811819c0>] ? end_buffer_async_write+0x0/0x1f0
[ 1920.060222]  [<ffffffff81182ec1>]
block_write_full_page_endio+0xe1/0x120
[ 1920.060259]  [<ffffffff81038a12>] ? check_events+0x12/0x20
[ 1920.060294]  [<ffffffff81182f15>] block_write_full_page+0x15/0x20
[ 1920.060330]  [<ffffffff81187928>] blkdev_writepage+0x18/0x20
[ 1920.060365]  [<ffffffff81120937>] __writepage+0x17/0x40
[ 1920.060399]  [<ffffffff81121897>] write_cache_pages+0x227/0x4d0
[ 1920.060434]  [<ffffffff81120920>] ? __writepage+0x0/0x40
[ 1920.060469]  [<ffffffff810389ff>] ? xen_restore_fl_direct_end+0x0/0x1
[ 1920.060504]  [<ffffffff81121b64>] generic_writepages+0x24/0x30
[ 1920.060539]  [<ffffffff81121b9d>] do_writepages+0x2d/0x50
[ 1920.060576]  [<ffffffff81119beb>]
__filemap_fdatawrite_range+0x5b/0x60
[ 1920.060613]  [<ffffffff8111a1ff>] filemap_fdatawrite+0x1f/0x30
[ 1920.060646]  [<ffffffff8111a245>] filemap_write_and_wait+0x35/0x50
[ 1920.060681]  [<ffffffff81187ba4>] __sync_blockdev+0x24/0x50
[ 1920.060716]  [<ffffffff81187be3>] sync_blockdev+0x13/0x20
[ 1920.060748]  [<ffffffff81187cc8>] __blkdev_put+0xa8/0x1a0
[ 1920.060784]  [<ffffffff81187dd0>] blkdev_put+0x10/0x20
[ 1920.060819]  [<ffffffff81344fea>] vbd_free+0x2a/0x40
[ 1920.060851]  [<ffffffff81344499>] blkback_remove+0x59/0x90
[ 1920.060885]  [<ffffffff8133e890>] xenbus_dev_remove+0x50/0x70
[ 1920.060921]  [<ffffffff8138b9d8>] __device_release_driver+0x58/0xb0
[ 1920.060956]  [<ffffffff8138bb4d>] device_release_driver+0x2d/0x40
[ 1920.060991]  [<ffffffff8138ac0a>] bus_remove_device+0x9a/0xc0
[ 1920.061027]  [<ffffffff81388da7>] device_del+0x127/0x1d0
[ 1920.061061]  [<ffffffff81388e66>] device_unregister+0x16/0x30
[ 1920.061095]  [<ffffffff813441a0>] frontend_changed+0x90/0x2a0
[ 1920.061131]  [<ffffffff8133eb82>] xenbus_otherend_changed+0xb2/0xc0
[ 1920.061167]  [<ffffffff81577aa7>] ? _spin_unlock_irqrestore+0x37/0x60
[ 1920.061209]  [<ffffffff8133f150>] frontend_changed+0x10/0x20
[ 1920.061243]  [<ffffffff8133c794>] xenwatch_thread+0xb4/0x190
[ 1920.061281]  [<ffffffff810af240>] ? autoremove_wake_function+0x0/0x40
[ 1920.061314]  [<ffffffff8133c6e0>] ? xenwatch_thread+0x0/0x190
[ 1920.061349]  [<ffffffff810aecb6>] kthread+0xa6/0xb0
[ 1920.061383]  [<ffffffff8103f3ea>] child_rip+0xa/0x20
[ 1920.061415]  [<ffffffff8103e5d7>] ? int_ret_from_sys_call+0x7/0x1b
[ 1920.061451]  [<ffffffff8103ed5d>] ? retint_restore_args+0x5/0x6
[ 1920.061485]  [<ffffffff8103f3e0>] ? child_rip+0x0/0x20


Any idea what's going wrong? Thanks!

Cheers,

NN

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

