[Xen-users] Task Blocking / Domu Lockups
Hi All,

We've been seeing DomUs lock up for some time now, under what I think is moderate IO load, on two different Xen hosts. Everything is Debian: Dom0 is Squeeze and the DomUs are a mixture of Lenny and Squeeze, both of which crash in the same way. The DomUs and the Dom0 are running the latest Squeeze kernel (2.6.32-5-xen-amd64) and Xen is 4.0.1-2.

The block device (or the kernel's handling of it) is probably closer to the cause of the problem than a bug in any individual task: if you capture enough output you see multiple tasks lock up at the same time, and on separate incidents you see different tasks as well.

A couple of excerpts from the console are below:

[581606.222303] INFO: task syslogd:1142 blocked for more than 120 seconds.
[581606.222321] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[581606.222329] syslogd D ffff8800f9eafc78 0 1142 1
[581606.222338] ffff8800f9eafda8 0000000000000286 ffff880003d74fe8 ffff8800060f18f0
[581606.222349] ffff8800f9e30440 ffff8800d8d8c940 ffff8800f9e306c0 0000000000000000
[581606.222360] ffff880000000005 0000000000138512 ffff8800f7c41cc0 ffff88000000000f
[581606.222368] Call Trace:
[581606.222382] [<ffffffff8022383e>] __wake_up+0x38/0x4f
[581606.222395] [<ffffffffa0032067>] :jbd:log_wait_commit+0xb6/0x11f
[581606.222403] [<ffffffff8023f64d>] autoremove_wake_function+0x0/0x2e
[581606.222413] [<ffffffffa002d552>] :jbd:journal_stop+0x198/0x1f3
[581606.222421] [<ffffffff802a7eec>] __writeback_single_inode+0x1bc/0x2da
[581606.222429] [<ffffffff8028a992>] do_readv_writev+0x176/0x18b
[581606.222436] [<ffffffff802a898d>] sync_inode+0x24/0x53
[581606.222453] [<ffffffffa003e48a>] :ext3:ext3_sync_file+0x9e/0xb0
[581606.222460] [<ffffffff802aafc6>] do_fsync+0x52/0xa4
[581606.222467] [<ffffffff802ab03b>] __do_fsync+0x23/0x36
[581606.222473] [<ffffffff8020b528>] system_call+0x68/0x6d
[581606.222479] [<ffffffff8020b4c0>] system_call+0x0/0x6d
[581606.222484]

[581376.493333] INFO: task apache2:14097 blocked for more than 120 seconds.
[581376.493348] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[581376.493356] apache2 D ffffffff8044af00 0 14097 26200
[581376.493365] ffff8800d0091de0 0000000000000286 0000000000000000 ffff8800f759aec0
[581376.493375] ffff8800d8f17440 ffffffff804ff460 ffff8800d8f176c0 00000000d0091e68
[581376.493385] 00000000ffffffff 0000000000000000 ffff880073859000 ffff8800f74a76c4
[581376.493394] Call Trace:
[581376.493408] [<ffffffff8029443f>] path_walk+0x7e/0x8b
[581376.493415] [<ffffffff80294733>] do_path_lookup+0x158/0x1ce
[581376.493423] [<ffffffff804356ad>] __mutex_lock_slowpath+0x79/0xc7
[581376.493430] [<ffffffff80435482>] mutex_lock+0xa/0xb
[581376.493435] [<ffffffff8029542a>] do_filp_open+0x11a/0x7c4
[581376.493445] [<ffffffff80288b3b>] get_unused_fd_flags+0x74/0x13f
[581376.493452] [<ffffffff80288c4c>] do_sys_open+0x46/0xc3
[581376.493458] [<ffffffff8020b528>] system_call+0x68/0x6d
[581376.493464] [<ffffffff8020b4c0>] system_call+0x0/0x6d
[581376.493471]

[1426201.768058] INFO: task sshd:772 blocked for more than 120 seconds.
[1426201.768058] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1426201.768058] sshd D 0000000000000000 0 772 1 0x00000000
[1426201.768058] ffffffff814791f0 0000000000000282 0000000000000000 ffff88000edc35b0
[1426201.768058] ffff88000edc3690 ffffffff8117fd56 000000000000f9e0 ffff88000edc3fd8
[1426201.768058] 0000000000015780 0000000000015780 ffff88000284f100 ffff88000284f3f8
[1426201.768058] Call Trace:
[1426201.768058] [<ffffffff8117fd56>] ? blk_peek_request+0x18b/0x19f
[1426201.768058] [<ffffffff8102ddcc>] ? pvclock_clocksource_read+0x3a/0x8b
[1426201.768058] [<ffffffff8130c16a>] ? io_schedule+0x73/0xb7
[1426201.768058] [<ffffffff81180b77>] ? get_request_wait+0xf0/0x188
[1426201.768058] [<ffffffff81065f06>] ? autoremove_wake_function+0x0/0x2e
[1426201.768058] [<ffffffff81180f06>] ? __make_request+0x2f7/0x428
[1426201.768058] [<ffffffff81192e43>] ? radix_tree_tag_clear+0x93/0xf1
[1426201.768058] [<ffffffff8117f6e3>] ? generic_make_request+0x299/0x2f9
[1426201.768058] [<ffffffff8100e629>] ? xen_force_evtchn_callback+0x9/0xa
[1426201.768058] [<ffffffff8100ece2>] ? check_events+0x12/0x20
[1426201.768058] [<ffffffff810bc7ce>] ? __set_page_dirty_nobuffers+0x0/0xfa
[1426201.768058] [<ffffffff8117f819>] ? submit_bio+0xd6/0xf2
[1426201.768058] [<ffffffff810bb841>] ? test_set_page_writeback+0xe0/0xef
[1426201.768058] [<ffffffff810d9a70>] ? swap_writepage+0x9b/0xa5
[1426201.768058] [<ffffffff810bf3c1>] ? shrink_page_list+0x375/0x623
[1426201.768058] [<ffffffff8100e629>] ? xen_force_evtchn_callback+0x9/0xa
[1426201.768058] [<ffffffff8100ece2>] ? check_events+0x12/0x20
[1426201.768058] [<ffffffff810bfda4>] ? shrink_list+0x45c/0x767
[1426201.768058] [<ffffffff81042abe>] ? pick_next_task_fair+0xca/0xd6
[1426201.768058] [<ffffffff8100eccf>] ? xen_restore_fl_direct_end+0x0/0x1
[1426201.768058] [<ffffffff8130d42a>] ? _spin_unlock_irqrestore+0xd/0xe
[1426201.768058] [<ffffffff8105b8c8>] ? try_to_del_timer_sync+0x63/0x6c
[1426201.768058] [<ffffffff810c032f>] ? shrink_zone+0x280/0x342
[1426201.768058] [<ffffffff8130d42a>] ? _spin_unlock_irqrestore+0xd/0xe
[1426201.768058] [<ffffffff810c94f8>] ? congestion_wait+0x74/0x80
[1426201.768058] [<ffffffff81065f06>] ? autoremove_wake_function+0x0/0x2e
[1426201.768058] [<ffffffff810c13f6>] ? try_to_free_pages+0x232/0x38e
[1426201.768058] [<ffffffff810be3eb>] ? isolate_pages_global+0x0/0x20f
[1426201.768058] [<ffffffff810fdb83>] ? pollwake+0x0/0x59
[1426201.768058] [<ffffffff810bb484>] ? __alloc_pages_nodemask+0x3cd/0x5f5
[1426201.768058] [<ffffffff810ba60f>] ? __get_free_pages+0x9/0x46
[1426201.768058] [<ffffffff8104d4f6>] ? copy_process+0xd7/0x115f
[1426201.768058] [<ffffffff811542f6>] ? cap_d_instantiate+0x0/0x1
[1426201.768058] [<ffffffff8100eccf>] ? xen_restore_fl_direct_end+0x0/0x1
[1426201.768058] [<ffffffff8100e629>] ? xen_force_evtchn_callback+0x9/0xa
[1426201.768058] [<ffffffff8100ece2>] ? check_events+0x12/0x20
[1426201.768058] [<ffffffff811542f6>] ? cap_d_instantiate+0x0/0x1
[1426201.768058] [<ffffffff8100eccf>] ? xen_restore_fl_direct_end+0x0/0x1
[1426201.768058] [<ffffffff8104e6d5>] ? do_fork+0x157/0x31e
[1426201.768058] [<ffffffff81118548>] ? inotify_d_instantiate+0x12/0x39
[1426201.768058] [<ffffffff812510d3>] ? sock_attach_fd+0x91/0xbf
[1426201.768058] [<ffffffff810ee05f>] ? fd_install+0x2e/0x5a
[1426201.768058] [<ffffffff81011e63>] ? stub_clone+0x13/0x20
[1426201.768058] [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
[1426201.768058] INFO: task master:845 blocked for more than 120 seconds.
[1426201.768058] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1426201.768058] master D 0000000000000000 0 845 1 0x00000000
[1426201.768058] ffffffff814791f0 0000000000000286 0000000000000000 ffff88000ebcd588
[1426201.768058] ffff88000ebcd668 ffffffff8117fd56 000000000000f9e0 ffff88000ebcdfd8
[1426201.768058] 0000000000015780 0000000000015780 ffff88000fd1f810 ffff88000fd1fb08
[1426201.768058] Call Trace:
[1426201.768058] [<ffffffff8117fd56>] ? blk_peek_request+0x18b/0x19f
[1426201.768058] [<ffffffff8102ddcc>] ? pvclock_clocksource_read+0x3a/0x8b
[1426201.768058] [<ffffffff8130c16a>] ? io_schedule+0x73/0xb7
[1426201.768058] [<ffffffff81180b77>] ? get_request_wait+0xf0/0x188
[1426201.768058] [<ffffffff810bee23>] ? move_active_pages_to_lru+0xf3/0x126
[1426201.768058] [<ffffffff81065f06>] ? autoremove_wake_function+0x0/0x2e
[1426201.768058] [<ffffffff81180f06>] ? __make_request+0x2f7/0x428
[1426201.768058] [<ffffffff8100ece2>] ? check_events+0x12/0x20
[1426201.768058] [<ffffffff81192e43>] ? radix_tree_tag_clear+0x93/0xf1
[1426201.768058] [<ffffffff8117f6e3>] ? generic_make_request+0x299/0x2f9
[1426201.768058] [<ffffffff8100e629>] ? xen_force_evtchn_callback+0x9/0xa
[1426201.768058] [<ffffffff8100ece2>] ? check_events+0x12/0x20
[1426201.768058] [<ffffffff8118f534>] ? cpumask_any_but+0x28/0x34
[1426201.768058] [<ffffffff8117f819>] ? submit_bio+0xd6/0xf2
[1426201.768058] [<ffffffff810bb841>] ? test_set_page_writeback+0xe0/0xef
[1426201.768058] [<ffffffff810d9a70>] ? swap_writepage+0x9b/0xa5
[1426201.768058] [<ffffffff810bf3c1>] ? shrink_page_list+0x375/0x623
[1426201.768058] [<ffffffff810bfda4>] ? shrink_list+0x45c/0x767
[1426201.768058] [<ffffffff810bbfd0>] ? determine_dirtyable_memory+0xd/0x1d
[1426201.768058] [<ffffffff810bc048>] ? get_dirty_limits+0x1d/0x259
[1426201.768058] [<ffffffffa00380ba>] ? journal_cancel_revoke+0xc3/0xec [jbd]
[1426201.768058] [<ffffffff810c032f>] ? shrink_zone+0x280/0x342
[1426201.768058] [<ffffffffa002c226>] ? mb_cache_shrink_fn+0x26/0x129 [mbcache]
[1426201.768058] [<ffffffff810c0532>] ? shrink_slab+0x141/0x153
[1426201.768058] [<ffffffff810c13f6>] ? try_to_free_pages+0x232/0x38e
[1426201.768058] [<ffffffff810be3eb>] ? isolate_pages_global+0x0/0x20f
[1426201.768058] [<ffffffff810bb484>] ? __alloc_pages_nodemask+0x3cd/0x5f5
[1426201.768058] [<ffffffff810cc224>] ? do_wp_page+0x386/0x707
[1426201.768058] [<ffffffff810efa56>] ? do_sync_write+0xce/0x113
[1426201.768058] [<ffffffff8100c3a5>] ? __raw_callee_save_xen_pud_val+0x11/0x1e
[1426201.768058] [<ffffffff8100c369>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
[1426201.768058] [<ffffffff810cdfc7>] ? handle_mm_fault+0x7aa/0x80f
[1426201.768058] [<ffffffff8115421a>] ? cap_cred_commit+0x0/0x1
[1426201.768058] [<ffffffff8130f906>] ? do_page_fault+0x2e0/0x2fc
[1426201.768058] [<ffffffff8130d7a5>] ? page_fault+0x25/0x30

Has anybody seen this before? Is there a fix or workaround, or should we be trying (or building) different kernels for the DomUs?

Thanks in advance!
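
In case it helps anyone compare incidents: the observation that several tasks block at once can be checked mechanically. Below is a rough Python sketch, not anything we run in production, that pulls the "blocked for more than N seconds" reports out of saved console output and groups those whose kernel timestamps are close together. The function names and the 300-second window are my own assumptions for illustration, and the grouping only makes sense within one DomU's log, since the timestamps are per-guest uptime.

```python
import re

# Matches hung-task reports such as:
#   [581606.222303] INFO: task syslogd:1142 blocked for more than 120 seconds.
# \s+ between words tolerates reports that were wrapped by a mail client.
HUNG_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+INFO:\s+task\s+(\S+):(\d+)"
    r"\s+blocked\s+for\s+more\s+than\s+(\d+)\s+seconds"
)


def find_hung_tasks(text):
    """Return a list of (timestamp, task name, pid) for each report in text."""
    return [
        (float(m.group(1)), m.group(2), int(m.group(3)))
        for m in HUNG_RE.finditer(text)
    ]


def group_by_incident(reports, window=300.0):
    """Group reports that start within `window` seconds of a group's first report.

    `window` is an arbitrary illustrative value, not a kernel setting.
    """
    incidents = []
    for report in sorted(reports):
        if incidents and report[0] - incidents[-1][0][0] <= window:
            incidents[-1].append(report)
        else:
            incidents.append([report])
    return incidents


if __name__ == "__main__":
    import sys

    # Usage: python hung_tasks.py console.log  (file name is hypothetical)
    with open(sys.argv[1]) as fh:
        for incident in group_by_incident(find_hung_tasks(fh.read())):
            names = ", ".join(f"{name}:{pid}" for _, name, pid in incident)
            print(f"{incident[0][0]:.6f}: {names}")
```

An incident line listing several unrelated tasks would support the idea that the block layer, rather than any one program, is what is stalling.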
Regards,

Richard Maynard
Wessex Networks
Linchmere Place, Ifield, Crawley, West Sussex RH11 0EX
www.wessexnetworks.com
rjm@xxxxxxxxxxxxxxxxxx
T: 01293 542080
F: 01293 553849
Twitter: @wessexnetworks

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users