
[Xen-users] Process blocked errors



Hi,

I am currently running Debian Squeeze with the stock kernel and the Xen version from apt.  Before I go into detail: I have also tried Jeremy's kernel with Xen 4.1.1 from xen.org, and these errors are still present or even worse, to the point of taking down the whole dom0 (it reboots randomly when using the source version of Xen and Jeremy's kernel).  I should also add that I am now using clocksource=pit, which has fixed the other issues I was having; however, the following still applies.

From what I can deduce, these errors only appear on some, but not all, of the domUs (guests).  There are no errors on the dom0 (host) itself.  I think I may be on the right track in suspecting either network or heavy disk I/O, as the main machines with these errors are the VPN server and the BackupPC machine, which can generate quite a bit of heavy disk I/O.  Most, if not all, of the other domUs have no errors at all.  Most guests are running Debian Squeeze, but these errors also appear on CentOS domUs.

I have two separate servers in two separate data centers; both are Supermicro machines using Linux RAID.  The first machine is a dual Intel(R) Xeon(R) CPU 5140 @ 2.33GHz with 12GB RAM and 4x WD RE3 512GB HDDs in two separate RAID 1 arrays.  The other is a quad Intel(R) Xeon(R) CPU E5410 @ 2.33GHz with 16GB RAM and 2x WD RE3 1TB drives in RAID 1.  The hardware is similar, but the second machine is much newer technology.  I only mention the specs in case these issues are related to Supermicro machines.

Below are the latest logs for these errors.  Note that the process it complains about seems random.

[1597440.088347] INFO: task BackupPC:18491 blocked for more than 120 seconds.
[1597440.088354] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1597440.088363] BackupPC      D ffff880002dc46a0     0 18491      1 0x00000000
[1597440.088373]  ffff880002dc46a0 0000000000000286 ffff880011ccfd48 ffff880000009680
[1597440.088387]  ffff880011ccfad8 0000000000000000 000000000000f9e0 ffff880011ccffd8
[1597440.088401]  0000000000015780 0000000000015780 ffff8800029d7100 ffff8800029d73f8
[1597440.088415] Call Trace:
[1597440.088422]  [<ffffffff8102ddcc>] ? pvclock_clocksource_read+0x3a/0x8b
[1597440.088430]  [<ffffffff8110f19a>] ? sync_buffer+0x0/0x40
[1597440.088438]  [<ffffffff8110f19a>] ? sync_buffer+0x0/0x40
[1597440.088447]  [<ffffffff8130c16a>] ? io_schedule+0x73/0xb7
[1597440.088455]  [<ffffffff8110f1d5>] ? sync_buffer+0x3b/0x40
[1597440.088463]  [<ffffffff8130d42a>] ? _spin_unlock_irqrestore+0xd/0xe
[1597440.088472]  [<ffffffff8130c57a>] ? __wait_on_bit_lock+0x3f/0x84
[1597440.088480]  [<ffffffff8110f19a>] ? sync_buffer+0x0/0x40
[1597440.088487]  [<ffffffff8130c62a>] ? out_of_line_wait_on_bit_lock+0x6b/0x77
[1597440.088497]  [<ffffffff81065f34>] ? wake_bit_function+0x0/0x23
[1597440.088507]  [<ffffffff8110f5c7>] ? sync_dirty_buffer+0x29/0x93
[1597440.088516]  [<ffffffffa0034e04>] ? journal_dirty_data+0xd1/0x1b0 [jbd]
[1597440.088528]  [<ffffffffa004bf1f>] ? ext3_journal_dirty_data+0xf/0x34 [ext3]
[1597440.088538]  [<ffffffffa004a3f9>] ? walk_page_buffers+0x65/0x8b [ext3]
[1597440.088549]  [<ffffffffa004bf44>] ? journal_dirty_data_fn+0x0/0x13 [ext3]
[1597440.088559]  [<ffffffffa004da66>] ? ext3_ordered_write_end+0x73/0x10f [ext3]
[1597440.088570]  [<ffffffff810b5ea1>] ? generic_file_buffered_write+0x18d/0x278
[1597440.088580]  [<ffffffff810b633d>] ? __generic_file_aio_write+0x25f/0x293
[1597440.088589]  [<ffffffff8118f534>] ? cpumask_any_but+0x28/0x34
[1597440.088598]  [<ffffffff8100eccf>] ? xen_restore_fl_direct_end+0x0/0x1
[1597440.088607]  [<ffffffff8100c2f1>] ? __raw_callee_save_xen_pte_val+0x11/0x1e
[1597440.088616]  [<ffffffff810b63ca>] ? generic_file_aio_write+0x59/0x9f
[1597440.088626]  [<ffffffff810efa56>] ? do_sync_write+0xce/0x113
[1597440.088635]  [<ffffffff81065f06>] ? autoremove_wake_function+0x0/0x2e
[1597440.088644]  [<ffffffff810cdfc7>] ? handle_mm_fault+0x7aa/0x80f
[1597440.088654]  [<ffffffff810f03a8>] ? vfs_write+0xa9/0x102
[1597440.088662]  [<ffffffff810f04bd>] ? sys_write+0x45/0x6e
[1597440.088670]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
[1597440.088686] INFO: task flush-202:3:3497 blocked for more than 120 seconds.
[1597440.088694] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1597440.088702] flush-202:3   D 0000000000000002     0  3497      2 0x00000000
[1597440.088713]  ffff88001fd89c40 0000000000000246 0000000000000000 ffff880002ea57f8
[1597440.088727]  0000000000000001 0000000000000001 000000000000f9e0 ffff880011d57fd8
[1597440.088740]  0000000000015780 0000000000015780 ffff880002dc0e20 ffff880002dc1118
[1597440.088755] Call Trace:
[1597440.088762]  [<ffffffff8102ddcc>] ? pvclock_clocksource_read+0x3a/0x8b
[1597440.088770]  [<ffffffff8110f19a>] ? sync_buffer+0x0/0x40
[1597440.088779]  [<ffffffff8110f19a>] ? sync_buffer+0x0/0x40
[1597440.088786]  [<ffffffff8130c16a>] ? io_schedule+0x73/0xb7
[1597440.088794]  [<ffffffff8110f1d5>] ? sync_buffer+0x3b/0x40
[1597440.088803]  [<ffffffff8130d42a>] ? _spin_unlock_irqrestore+0xd/0xe
[1597440.088811]  [<ffffffff8130c57a>] ? __wait_on_bit_lock+0x3f/0x84
[1597440.088820]  [<ffffffff8110f19a>] ? sync_buffer+0x0/0x40
[1597440.088828]  [<ffffffff8130c62a>] ? out_of_line_wait_on_bit_lock+0x6b/0x77
[1597440.088837]  [<ffffffff81065f34>] ? wake_bit_function+0x0/0x23
[1597440.088847]  [<ffffffff81110567>] ? __block_write_full_page+0x159/0x2ac
[1597440.088856]  [<ffffffff8110f364>] ? end_buffer_async_write+0x0/0x13b
[1597440.088865]  [<ffffffff810bb6b6>] ? __writepage+0xa/0x25
[1597440.088873]  [<ffffffff810bbd3d>] ? write_cache_pages+0x20b/0x327
[1597440.088881]  [<ffffffff810bb6ac>] ? __writepage+0x0/0x25
[1597440.088889]  [<ffffffff8100b3c5>] ? xen_end_context_switch+0x9/0x12
[1597440.088899]  [<ffffffff81108f1e>] ? writeback_single_inode+0xe7/0x2da
[1597440.088907]  [<ffffffff81109c24>] ? writeback_inodes_wb+0x424/0x4ff
[1597440.088916]  [<ffffffff81109e2b>] ? wb_writeback+0x12c/0x1ab
[1597440.088926]  [<ffffffff8105b8c8>] ? try_to_del_timer_sync+0x63/0x6c
[1597440.088935]  [<ffffffff8110a0a1>] ? wb_do_writeback+0x14f/0x165
[1597440.088944]  [<ffffffff8110a0e8>] ? bdi_writeback_task+0x31/0xaa
[1597440.088953]  [<ffffffff810ca00e>] ? bdi_start_fn+0x0/0xd2
[1597440.088960]  [<ffffffff810ca07e>] ? bdi_start_fn+0x70/0xd2
[1597440.088968]  [<ffffffff810ca00e>] ? bdi_start_fn+0x0/0xd2
[1597440.088975]  [<ffffffff81065c39>] ? kthread+0x79/0x81
[1597440.088983]  [<ffffffff81012baa>] ? child_rip+0xa/0x20
[1597440.088990]  [<ffffffff81011d61>] ? int_ret_from_sys_call+0x7/0x1b
[1597440.088998]  [<ffffffff8101251d>] ? retint_restore_args+0x5/0x6
[1597440.089007]  [<ffffffff81012ba0>] ? child_rip+0x0/0x20
[1597440.089013] INFO: task BackupPC_dump:3498 blocked for more than 120 seconds.
[1597440.089021] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1597440.089028] BackupPC_dump D 0000000000000000     0  3498  18491 0x00000000
[1597440.089038]  ffff88001fd10e20 0000000000000286 0000000000000000 ffff880000009680
[1597440.089050]  0000000000000008 0000000000000000 000000000000f9e0 ffff880017917fd8
[1597440.089062]  0000000000015780 0000000000015780 ffff880002dc2a60 ffff880002dc2d58
[1597440.089075] Call Trace:
[1597440.089081]  [<ffffffff8102ddcc>] ? pvclock_clocksource_read+0x3a/0x8b
[1597440.089089]  [<ffffffff8110f19a>] ? sync_buffer+0x0/0x40
[1597440.089096]  [<ffffffff8130c16a>] ? io_schedule+0x73/0xb7
[1597440.089103]  [<ffffffff8110f1d5>] ? sync_buffer+0x3b/0x40
[1597440.089111]  [<ffffffff8130d42a>] ? _spin_unlock_irqrestore+0xd/0xe
[1597440.089118]  [<ffffffff8130c57a>] ? __wait_on_bit_lock+0x3f/0x84
[1597440.089126]  [<ffffffff8110f19a>] ? sync_buffer+0x0/0x40
[1597440.089133]  [<ffffffff8130c62a>] ? out_of_line_wait_on_bit_lock+0x6b/0x77
[1597440.089141]  [<ffffffff81065f34>] ? wake_bit_function+0x0/0x23
[1597440.089149]  [<ffffffff8110f5c7>] ? sync_dirty_buffer+0x29/0x93
[1597440.089158]  [<ffffffffa0034e04>] ? journal_dirty_data+0xd1/0x1b0 [jbd]
[1597440.092016]  [<ffffffffa004bf1f>] ? ext3_journal_dirty_data+0xf/0x34 [ext3]
[1597440.092016]  [<ffffffffa004a3f9>] ? walk_page_buffers+0x65/0x8b [ext3]
[1597440.092016]  [<ffffffffa004bf44>] ? journal_dirty_data_fn+0x0/0x13 [ext3]
[1597440.092016]  [<ffffffffa004da66>] ? ext3_ordered_write_end+0x73/0x10f [ext3]
[1597440.092016]  [<ffffffff810b5ea1>] ? generic_file_buffered_write+0x18d/0x278
[1597440.092016]  [<ffffffff810b633d>] ? __generic_file_aio_write+0x25f/0x293
[1597440.092016]  [<ffffffff8118f534>] ? cpumask_any_but+0x28/0x34
[1597440.092016]  [<ffffffff8100eccf>] ? xen_restore_fl_direct_end+0x0/0x1
[1597440.092016]  [<ffffffff8100c2f1>] ? __raw_callee_save_xen_pte_val+0x11/0x1e
[1597440.092016]  [<ffffffff810b63ca>] ? generic_file_aio_write+0x59/0x9f
[1597440.092016]  [<ffffffff810efa56>] ? do_sync_write+0xce/0x113
[1597440.092016]  [<ffffffff81065f06>] ? autoremove_wake_function+0x0/0x2e
[1597440.092016]  [<ffffffff810cdfc7>] ? handle_mm_fault+0x7aa/0x80f
[1597440.092016]  [<ffffffff810f03a8>] ? vfs_write+0xa9/0x102
[1597440.092016]  [<ffffffff810f04bd>] ? sys_write+0x45/0x6e
[1597440.092016]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
[1597800.096045] INFO: task kswapd0:30 blocked for more than 120 seconds.
[1597800.096060] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1597800.096069] kswapd0       D 0000000000000000     0    30      2 0x00000000
[1597800.096079]  ffffffff814791f0 0000000000000246 0000000000000000 ffff88001d8736b0
[1597800.096093]  ffff88001d873790 ffffffff8117fd56 000000000000f9e0 ffff88001d873fd8
[1597800.096106]  0000000000015780 0000000000015780 ffff88001fd8cdb0 ffff88001fd8d0a8
[1597800.096120] Call Trace:
[1597800.096134]  [<ffffffff8117fd56>] ? blk_peek_request+0x18b/0x19f
[1597800.096145]  [<ffffffff8102ddcc>] ? pvclock_clocksource_read+0x3a/0x8b
[1597800.096156]  [<ffffffff8130c16a>] ? io_schedule+0x73/0xb7
[1597800.096165]  [<ffffffff81180b77>] ? get_request_wait+0xf0/0x188
[1597800.096175]  [<ffffffff81065f06>] ? autoremove_wake_function+0x0/0x2e
[1597800.096184]  [<ffffffff81180f06>] ? __make_request+0x2f7/0x428
[1597800.096193]  [<ffffffff8117f6e3>] ? generic_make_request+0x299/0x2f9
[1597800.096204]  [<ffffffff81193109>] ? radix_tree_delete+0xbf/0x1ba
[1597800.096214]  [<ffffffff8100ece2>] ? check_events+0x12/0x20
[1597800.096223]  [<ffffffff8117f819>] ? submit_bio+0xd6/0xf2
[1597800.096232]  [<ffffffff8110e069>] ? submit_bh+0x103/0x123
[1597800.096242]  [<ffffffff811105e4>] ? __block_write_full_page+0x1d6/0x2ac
[1597800.096250]  [<ffffffff8110f364>] ? end_buffer_async_write+0x0/0x13b
[1597800.096260]  [<ffffffff81112670>] ? blkdev_get_block+0x0/0x57
[1597800.096272]  [<ffffffff810bf3c1>] ? shrink_page_list+0x375/0x623
[1597800.096281]  [<ffffffff810bfda4>] ? shrink_list+0x45c/0x767
[1597800.096290]  [<ffffffff810bbfd0>] ? determine_dirtyable_memory+0xd/0x1d
[1597800.096299]  [<ffffffff810bc048>] ? get_dirty_limits+0x1d/0x259
[1597800.096308]  [<ffffffff8100eccf>] ? xen_restore_fl_direct_end+0x0/0x1
[1597800.096319]  [<ffffffff81099108>] ? __call_rcu+0x110/0x118
[1597800.096329]  [<ffffffff810fe2ab>] ? d_kill+0x58/0x61
[1597800.096338]  [<ffffffff810c032f>] ? shrink_zone+0x280/0x342
[1597800.096351]  [<ffffffffa002c226>] ? mb_cache_shrink_fn+0x26/0x129 [mbcache]
[1597800.096361]  [<ffffffff810c0d54>] ? kswapd+0x4b9/0x686
[1597800.096369]  [<ffffffff810be3eb>] ? isolate_pages_global+0x0/0x20f
[1597800.096379]  [<ffffffff81065f06>] ? autoremove_wake_function+0x0/0x2e
[1597800.096388]  [<ffffffff8100ece2>] ? check_events+0x12/0x20
[1597800.096396]  [<ffffffff810c089b>] ? kswapd+0x0/0x686
[1597800.096405]  [<ffffffff81065c39>] ? kthread+0x79/0x81
[1597800.096414]  [<ffffffff81012baa>] ? child_rip+0xa/0x20
[1597800.096422]  [<ffffffff81011d61>] ? int_ret_from_sys_call+0x7/0x1b
[1597800.096431]  [<ffffffff8101251d>] ? retint_restore_args+0x5/0x6
[1597800.096439]  [<ffffffff81012ba0>] ? child_rip+0x0/0x20
[1597800.096449] INFO: task kjournald:386 blocked for more than 120 seconds.
[1597800.096456] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1597800.096464] kjournald     D 0000000000000000     0   386      2 0x00000000
[1597800.096475]  ffffffff814791f0 0000000000000246 0000000000000000 0000000000000200
[1597800.096489]  0000000000000000 0000000000000001 000000000000f9e0 ffff88001e599fd8
[1597800.096504]  0000000000015780 0000000000015780 ffff880002fa5bd0 ffff880002fa5ec8
[1597800.096518] Call Trace:
[1597800.096525]  [<ffffffff8102ddcc>] ? pvclock_clocksource_read+0x3a/0x8b
[1597800.096534]  [<ffffffff8110f19a>] ? sync_buffer+0x0/0x40
[1597800.096542]  [<ffffffff8130c16a>] ? io_schedule+0x73/0xb7
[1597800.096551]  [<ffffffff8110f1d5>] ? sync_buffer+0x3b/0x40
[1597800.096559]  [<ffffffff8130d42a>] ? _spin_unlock_irqrestore+0xd/0xe
[1597800.096568]  [<ffffffff8130c677>] ? __wait_on_bit+0x41/0x70
[1597800.096577]  [<ffffffff8110f19a>] ? sync_buffer+0x0/0x40
[1597800.096585]  [<ffffffff8130c711>] ? out_of_line_wait_on_bit+0x6b/0x77
[1597800.096594]  [<ffffffff81065f34>] ? wake_bit_function+0x0/0x23
[1597800.096605]  [<ffffffffa00361d1>] ? journal_commit_transaction+0x508/0xe2b [jbd]
[1597800.096616]  [<ffffffff8100e629>] ? xen_force_evtchn_callback+0x9/0xa
[1597800.096625]  [<ffffffff8100ece2>] ? check_events+0x12/0x20
[1597800.096633]  [<ffffffff8130d42a>] ? _spin_unlock_irqrestore+0xd/0xe
[1597800.096643]  [<ffffffff8130d42a>] ? _spin_unlock_irqrestore+0xd/0xe
[1597800.096652]  [<ffffffff8100eccf>] ? xen_restore_fl_direct_end+0x0/0x1
[1597800.096661]  [<ffffffff8130d42a>] ? _spin_unlock_irqrestore+0xd/0xe
[1597800.096671]  [<ffffffffa0039423>] ? kjournald+0xdf/0x226 [jbd]
[1597800.096680]  [<ffffffff81065f06>] ? autoremove_wake_function+0x0/0x2e
[1597800.096690]  [<ffffffffa0039344>] ? kjournald+0x0/0x226 [jbd]
[1597800.096699]  [<ffffffff81065c39>] ? kthread+0x79/0x81
[1597800.096707]  [<ffffffff81012baa>] ? child_rip+0xa/0x20
[1597800.096715]  [<ffffffff81011d61>] ? int_ret_from_sys_call+0x7/0x1b
[1597800.096723]  [<ffffffff8101251d>] ? retint_restore_args+0x5/0x6
[1597800.096732]  [<ffffffff81012ba0>] ? child_rip+0x0/0x20
[1597800.096751] INFO: task flush-202:3:3497 blocked for more than 120 seconds.
[1597800.096759] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1597800.096767] flush-202:3   D ffff88001ff969f0     0  3497      2 0x00000000
[1597800.096778]  ffff88001ff969f0 0000000000000246 ffff880011d578a0 ffff880011d5789c
[1597800.096792]  ffff880011d57920 ffffffff8117fd56 000000000000f9e0 ffff880011d57fd8
[1597800.096807]  0000000000015780 0000000000015780 ffff880002dc0e20 ffff880002dc1118
[1597800.096821] Call Trace:
[1597800.096829]  [<ffffffff8117fd56>] ? blk_peek_request+0x18b/0x19f
[1597800.096838]  [<ffffffff8102ddcc>] ? pvclock_clocksource_read+0x3a/0x8b
[1597800.096846]  [<ffffffff8130c16a>] ? io_schedule+0x73/0xb7
[1597800.096856]  [<ffffffff81180b77>] ? get_request_wait+0xf0/0x188
[1597800.096864]  [<ffffffff81065f06>] ? autoremove_wake_function+0x0/0x2e
[1597800.096872]  [<ffffffff81180f06>] ? __make_request+0x2f7/0x428
[1597800.096880]  [<ffffffff8117f6e3>] ? generic_make_request+0x299/0x2f9
[1597800.096890]  [<ffffffffa000a43b>] ? do_blkif_request+0x0/0x374 [xen_blkfront]
[1597800.096899]  [<ffffffff8100ece2>] ? check_events+0x12/0x20
[1597800.096907]  [<ffffffff8117f819>] ? submit_bio+0xd6/0xf2
[1597800.096914]  [<ffffffff8110e069>] ? submit_bh+0x103/0x123
[1597800.096922]  [<ffffffff811105e4>] ? __block_write_full_page+0x1d6/0x2ac
[1597800.096930]  [<ffffffff8100ece2>] ? check_events+0x12/0x20
[1597800.096938]  [<ffffffff8110f364>] ? end_buffer_async_write+0x0/0x13b
[1597800.096947]  [<ffffffff81112670>] ? blkdev_get_block+0x0/0x57
[1597800.096955]  [<ffffffff810bb6b6>] ? __writepage+0xa/0x25
[1597800.096962]  [<ffffffff810bbd3d>] ? write_cache_pages+0x20b/0x327
[1597800.096970]  [<ffffffff810bb6ac>] ? __writepage+0x0/0x25
[1597800.096979]  [<ffffffff81108f1e>] ? writeback_single_inode+0xe7/0x2da
[1597800.096987]  [<ffffffff81109c24>] ? writeback_inodes_wb+0x424/0x4ff
[1597800.096995]  [<ffffffff81109e2b>] ? wb_writeback+0x12c/0x1ab
[1597800.097006]  [<ffffffff8105b8c8>] ? try_to_del_timer_sync+0x63/0x6c
[1597800.097014]  [<ffffffff8110a0a1>] ? wb_do_writeback+0x14f/0x165
[1597800.097022]  [<ffffffff8110a0e8>] ? bdi_writeback_task+0x31/0xaa
[1597800.097031]  [<ffffffff810ca00e>] ? bdi_start_fn+0x0/0xd2
[1597800.097038]  [<ffffffff810ca07e>] ? bdi_start_fn+0x70/0xd2
[1597800.097045]  [<ffffffff810ca00e>] ? bdi_start_fn+0x0/0xd2
[1597800.097052]  [<ffffffff81065c39>] ? kthread+0x79/0x81
[1597800.097060]  [<ffffffff81012baa>] ? child_rip+0xa/0x20
[1597800.097067]  [<ffffffff81011d61>] ? int_ret_from_sys_call+0x7/0x1b
[1597800.097074]  [<ffffffff8101251d>] ? retint_restore_args+0x5/0x6
[1597800.100015]  [<ffffffff81012ba0>] ? child_rip+0x0/0x20
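
To check how random the blocked tasks really are, the "blocked for more than 120 seconds" header lines in a log like the one above can be tallied per task.  The following is only a minimal Python sketch, assuming the kernel log has been saved as plain text; the function name summarize_hung_tasks and the regular expression are illustrative and do not come from any Xen or kernel tool.

```python
import re
from collections import Counter

# Matches the hung-task header line printed by the kernel, for example:
# "[1597440.088347] INFO: task BackupPC:18491 blocked for more than 120 seconds."
# The task name itself may contain colons (e.g. "flush-202:3"), so the name
# group is greedy and the PID is taken from the last ":<digits>" before "blocked".
HUNG_TASK = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<name>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def summarize_hung_tasks(lines):
    """Return (Counter of reports per task name, list of (timestamp, name, pid))."""
    events = []
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            events.append(
                (float(match.group("ts")), match.group("name"), int(match.group("pid")))
            )
    counts = Counter(name for _, name, _ in events)
    return counts, events


if __name__ == "__main__":
    # A few header lines copied from the log above; stack-trace lines are ignored.
    sample = [
        "[1597440.088347] INFO: task BackupPC:18491 blocked for more than 120 seconds.",
        "[1597440.088415] Call Trace:",
        "[1597440.088686] INFO: task flush-202:3:3497 blocked for more than 120 seconds.",
        "[1597800.096449] INFO: task kjournald:386 blocked for more than 120 seconds.",
        "[1597800.096751] INFO: task flush-202:3:3497 blocked for more than 120 seconds.",
    ]
    counts, events = summarize_hung_tasks(sample)
    for name, n in counts.most_common():
        print(name, n)
```

Feeding it the full dmesg output instead of the sample list would show which tasks are reported most often and at which uptime timestamps the reports cluster.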

Can anyone give me any clues as to what the problem is and/or how to fix it?

Thanks in advance
-- 
May the ping be with you ..
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users