
[Xen-users] io hang with lvm on md raid1



Hi there

We are seeing an issue across multiple hosts when copying content from an LVM snapshot. The cp command hangs indefinitely and cannot be killed (state D). Other commands that require IO (e.g. lvs, dd, touch $file) may then get queued as well. How quickly this spreads depends on how busy the host is, but normally within 48 hours all IO on the host blocks and the host becomes completely unresponsive.
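
To track how the number of stuck tasks grows over time, something like the following could be used. This is only a minimal sketch, not something we run in production: the function names parse_stat_line and uninterruptible are made up for illustration, and the only assumption is the documented /proc/<pid>/stat layout (pid, then the command name in parentheses, then the one-letter state).

```python
def parse_stat_line(line):
    """Split one /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ")",
    # so split on the first "(" and the last ")".
    open_idx = line.index("(")
    close_idx = line.rindex(")")
    pid = int(line[:open_idx])
    comm = line[open_idx + 1:close_idx]
    state = line[close_idx + 1:].split()[0]
    return pid, comm, state


def uninterruptible(stat_lines):
    """Return (pid, comm) for every task whose state is 'D'."""
    tasks = (parse_stat_line(line) for line in stat_lines)
    return [(pid, comm) for pid, comm, state in tasks if state == "D"]
```

The input would be the contents of each /proc/<pid>/stat file, one line per process, collected at intervals. A process can exit between listing and reading, so a real collector would need to tolerate missing files.
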
Machines are all stock CentOS 6 using Xen packages from xen.crc.id.au. I
have also posted a report at https://xen.crc.id.au/bugs/view.php?id=75
The IO stack is always spindle drives -> md raid1 -> LVM -> LV. In this
case the cp target was a sparse image on a separate raid1 array (on
different drives), but that varies between incidents.
We also have raid6 on some hosts, but to date have not seen this issue
occur there, which suggests raid1 is involved.
Although the problem commands are not directly Xen related, I am posting
here first, before asking other subsystem lists, because Xen hypercalls
show up in the traces below.
Has anyone seen anything like this recently? Or have any insight as to 
what might be causing this? Or perhaps suggest some ways I might debug 
this to provide further useful details?
Output from the host of "echo w > /proc/sysrq-trigger" follows. We also
have 't' output if needed.
<6>1 2016-10-04T00:35:42.140955+00:00 host kernel - - sysrq: SysRq : Show Blocked State
<6>1 2016-10-04T00:35:42.140969+00:00 host kernel - - task PC stack pid father
<6>1 2016-10-04T00:35:42.140971+00:00 host kernel - - dmeventd D ffff880051cd3a38 0 24754 1 0x00000000
<4>1 2016-10-04T00:35:42.140973+00:00 host kernel - - ffff880051cd3a38 ffff88005caf8140 ffff8800021fa5c0 0000000000000000
<4>1 2016-10-04T00:35:42.140975+00:00 host kernel - - ffff88006424a210 0000000000000000 ffff880051cd39f0 ffffffff810061a9
<4>1 2016-10-04T00:35:42.140976+00:00 host kernel - - ffff88006424a210 ffff88006424aa10 ffff88006424a210 ffff88006424aa10
<4>1 2016-10-04T00:35:42.140977+00:00 host kernel - - Call Trace:
<4>1 2016-10-04T00:35:42.140979+00:00 host kernel - - [<ffffffff810061a9>] ? xen_load_sp0+0xc9/0x1d0
<4>1 2016-10-04T00:35:42.140982+00:00 host kernel - - [<ffffffff81006cbd>] ? xen_mc_flush+0xad/0x1b0
<4>1 2016-10-04T00:35:42.140985+00:00 host kernel - - [<ffffffff810a0ac4>] ? finish_task_switch+0xa4/0x240
<4>1 2016-10-04T00:35:42.140986+00:00 host kernel - - [<ffffffff81681e20>] schedule+0x40/0x90
<4>1 2016-10-04T00:35:42.140987+00:00 host kernel - - [<ffffffff816845ad>] rwsem_down_write_failed+0x1fd/0x360
<4>1 2016-10-04T00:35:42.140988+00:00 host kernel - - [<ffffffff81681586>] ? __schedule+0x306/0xa30
<4>1 2016-10-04T00:35:42.140990+00:00 host kernel - - [<ffffffff8133ff53>] call_rwsem_down_write_failed+0x13/0x20
<4>1 2016-10-04T00:35:42.140991+00:00 host kernel - - [<ffffffff81683d84>] ? down_write+0x24/0x40
<4>1 2016-10-04T00:35:42.140992+00:00 host kernel - - [<ffffffffa02ef25f>] snapshot_status+0x2f/0x190 [dm_snapshot]
<4>1 2016-10-04T00:35:42.140994+00:00 host kernel - - [<ffffffffa0009193>] retrieve_status+0xb3/0x1d0 [dm_mod]
<4>1 2016-10-04T00:35:42.140995+00:00 host kernel - - [<ffffffffa00092b0>] ? retrieve_status+0x1d0/0x1d0 [dm_mod]
<4>1 2016-10-04T00:35:42.140996+00:00 host kernel - - [<ffffffffa000930e>] table_status+0x5e/0x90 [dm_mod]
<4>1 2016-10-04T00:35:42.140997+00:00 host kernel - - [<ffffffffa000aa03>] ctl_ioctl+0x1d3/0x410 [dm_mod]
<4>1 2016-10-04T00:35:42.140999+00:00 host kernel - - [<ffffffffa000ac53>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
<4>1 2016-10-04T00:35:42.141000+00:00 host kernel - - [<ffffffff811e3048>] do_vfs_ioctl+0x88/0x4b0
<4>1 2016-10-04T00:35:42.141001+00:00 host kernel - - [<ffffffff811ec92c>] ? __fget_light+0x2c/0x70
<4>1 2016-10-04T00:35:42.141002+00:00 host kernel - - [<ffffffff810882d5>] ? __set_current_blocked+0x55/0x60
<4>1 2016-10-04T00:35:42.141004+00:00 host kernel - - [<ffffffff811e3502>] SyS_ioctl+0x92/0xa0
<4>1 2016-10-04T00:35:42.141005+00:00 host kernel - - [<ffffffff81003615>] ? syscall_return_slowpath+0x65/0x70
<4>1 2016-10-04T00:35:42.141006+00:00 host kernel - - [<ffffffff816856ee>] entry_SYSCALL_64_fastpath+0x12/0x71
<6>1 2016-10-04T00:35:42.141008+00:00 host kernel - - kworker/1:2 D ffff8800324139f8 0 23596 2 0x00000000
<6>1 2016-10-04T00:35:42.141009+00:00 host kernel - - Workqueue: kcopyd do_work [dm_mod]
<4>1 2016-10-04T00:35:42.141010+00:00 host kernel - - ffff8800324139f8 ffff88005caf8140 ffff88005b576180 ffffffff816879c9
<4>1 2016-10-04T00:35:42.141012+00:00 host kernel - - 0000000000000001 0000000000000001 ffff88006424da80 00000001aa7d9a84
<4>1 2016-10-04T00:35:42.141013+00:00 host kernel - - ffff88006424da80 ffffffff816879c9 0000000000000001 0000000000000001
<4>1 2016-10-04T00:35:42.141014+00:00 host kernel - - Call Trace:
<4>1 2016-10-04T00:35:42.141015+00:00 host kernel - - [<ffffffff816879c9>] ? error_exit+0x9/0x20
<4>1 2016-10-04T00:35:42.141016+00:00 host kernel - - [<ffffffff816879c9>] ? error_exit+0x9/0x20
<4>1 2016-10-04T00:35:42.141017+00:00 host kernel - - [<ffffffff816853cf>] ? _raw_spin_lock_irqsave+0x1f/0x50
<4>1 2016-10-04T00:35:42.141018+00:00 host kernel - - [<ffffffff810d83ea>] ? lock_timer_base+0x5a/0x80
<4>1 2016-10-04T00:35:42.141019+00:00 host kernel - - [<ffffffff816850f6>] ? _raw_spin_unlock_irqrestore+0x16/0x20
<4>1 2016-10-04T00:35:42.141020+00:00 host kernel - - [<ffffffff816853cf>] ? _raw_spin_lock_irqsave+0x1f/0x50
<4>1 2016-10-04T00:35:42.141022+00:00 host kernel - - [<ffffffff81681e20>] schedule+0x40/0x90
<4>1 2016-10-04T00:35:42.141023+00:00 host kernel - - [<ffffffff816848a8>] schedule_timeout+0x118/0x1f0
<4>1 2016-10-04T00:35:42.141024+00:00 host kernel - - [<ffffffff810d8410>] ? lock_timer_base+0x80/0x80
<4>1 2016-10-04T00:35:42.141025+00:00 host kernel - - [<ffffffff8100d05d>] ? xen_force_evtchn_callback+0xd/0x10
<4>1 2016-10-04T00:35:42.141026+00:00 host kernel - - [<ffffffff8100d822>] ? check_events+0x12/0x20
<4>1 2016-10-04T00:35:42.141027+00:00 host kernel - - [<ffffffff8168499e>] schedule_timeout_uninterruptible+0x1e/0x20
<4>1 2016-10-04T00:35:42.141029+00:00 host kernel - - [<ffffffff810da4cc>] msleep+0x1c/0x30
<4>1 2016-10-04T00:35:42.141030+00:00 host kernel - - [<ffffffffa02eff22>] __check_for_conflicting_io+0x62/0x80 [dm_snapshot]
<4>1 2016-10-04T00:35:42.141031+00:00 host kernel - - [<ffffffffa02f0171>] pending_complete+0x231/0x270 [dm_snapshot]
<4>1 2016-10-04T00:35:42.141032+00:00 host kernel - - [<ffffffffa02f1f60>] persistent_commit_exception+0xb0/0x140 [dm_snapshot]
<4>1 2016-10-04T00:35:42.141034+00:00 host kernel - - [<ffffffffa02ee12c>] copy_callback+0x10c/0x130 [dm_snapshot]
<4>1 2016-10-04T00:35:42.141035+00:00 host kernel - - [<ffffffff8115ce43>] ? mempool_free+0x33/0x90
<4>1 2016-10-04T00:35:42.141036+00:00 host kernel - - [<ffffffffa02ee020>] ? dm_snap_cow+0x10/0x10 [dm_snapshot]
<4>1 2016-10-04T00:35:42.141037+00:00 host kernel - - [<ffffffffa000c0f5>] run_complete_job+0x95/0xe0 [dm_mod]
<4>1 2016-10-04T00:35:42.141038+00:00 host kernel - - [<ffffffffa000bd56>] process_jobs+0x76/0x110 [dm_mod]
<4>1 2016-10-04T00:35:42.141039+00:00 host kernel - - [<ffffffffa000c060>] ? dispatch_job+0x70/0x70 [dm_mod]
<4>1 2016-10-04T00:35:42.141040+00:00 host kernel - - [<ffffffffa000be2f>] do_work+0x3f/0x80 [dm_mod]
<4>1 2016-10-04T00:35:42.141041+00:00 host kernel - - [<ffffffff81093bc5>] process_one_work+0x165/0x500
<4>1 2016-10-04T00:35:42.141042+00:00 host kernel - - [<ffffffff8100122a>] ? xen_hypercall_xen_version+0xa/0x20
<4>1 2016-10-04T00:35:42.141044+00:00 host kernel - - [<ffffffff81094993>] worker_thread+0x133/0x610
<4>1 2016-10-04T00:35:42.141045+00:00 host kernel - - [<ffffffff810a2dc2>] ? default_wake_function+0x12/0x20
<4>1 2016-10-04T00:35:42.141046+00:00 host kernel - - [<ffffffff810b6546>] ? __wake_up_common+0x56/0x90
<4>1 2016-10-04T00:35:42.141047+00:00 host kernel - - [<ffffffff81094860>] ? create_worker+0x1e0/0x1e0
<4>1 2016-10-04T00:35:42.141048+00:00 host kernel - - [<ffffffff81094860>] ? create_worker+0x1e0/0x1e0
<4>1 2016-10-04T00:35:42.141049+00:00 host kernel - - [<ffffffff810993ac>] kthread+0xcc/0xf0
<4>1 2016-10-04T00:35:42.141050+00:00 host kernel - - [<ffffffff810a1b1e>] ? schedule_tail+0x1e/0xc0
<4>1 2016-10-04T00:35:42.141052+00:00 host kernel - - [<ffffffff810992e0>] ? kthread_freezable_should_stop+0x70/0x70
<4>1 2016-10-04T00:35:42.141053+00:00 host kernel - - [<ffffffff81685a4f>] ret_from_fork+0x3f/0x70
<4>1 2016-10-04T00:35:42.141054+00:00 host kernel - - [<ffffffff810992e0>] ? kthread_freezable_should_stop+0x70/0x70
<6>1 2016-10-04T00:35:42.141055+00:00 host kernel - - cp D ffff88001b88b5e8 0 24931 20597 0x00000004
<4>1 2016-10-04T00:35:42.141056+00:00 host kernel - - ffff88001b88b5e8 ffffffff818b9500 ffff8800021f2480 ffff88005fc01340
<4>1 2016-10-04T00:35:42.141058+00:00 host kernel - - ffff88007d3ddd80 ffff88007d3d0250 0000000002420440 ffff88001b88b6e8
<4>1 2016-10-04T00:35:42.141087+00:00 host kernel - - ffff88007d3dd6c0 ffff88005cafa600 ffff880059cfa560 ffff8800571cd009
<4>1 2016-10-04T00:35:42.141089+00:00 host kernel - - Call Trace:
<4>1 2016-10-04T00:35:42.141090+00:00 host kernel - - [<ffffffff81336cad>] ? radix_tree_lookup+0xd/0x10
<4>1 2016-10-04T00:35:42.141091+00:00 host kernel - - [<ffffffff810c8577>] ? irq_to_desc+0x17/0x20
<4>1 2016-10-04T00:35:42.141093+00:00 host kernel - - [<ffffffff810cbd0e>] ? irq_get_irq_data+0xe/0x20
<4>1 2016-10-04T00:35:42.141094+00:00 host kernel - - [<ffffffff81162123>] ? __rmqueue+0x2f3/0x480
<4>1 2016-10-04T00:35:42.141095+00:00 host kernel - - [<ffffffff813f7433>] ? xen_send_IPI_one+0x33/0x60
<4>1 2016-10-04T00:35:42.141097+00:00 host kernel - - [<ffffffff81681e20>] schedule+0x40/0x90
<4>1 2016-10-04T00:35:42.141098+00:00 host kernel - - [<ffffffff816845ad>] rwsem_down_write_failed+0x1fd/0x360
<4>1 2016-10-04T00:35:42.141099+00:00 host kernel - - [<ffffffff811b58d2>] ? kmem_cache_alloc+0x72/0x160
<4>1 2016-10-04T00:35:42.141101+00:00 host kernel - - [<ffffffff8133ff53>] call_rwsem_down_write_failed+0x13/0x20
<4>1 2016-10-04T00:35:42.141102+00:00 host kernel - - [<ffffffff81683d84>] ? down_write+0x24/0x40
<4>1 2016-10-04T00:35:42.141104+00:00 host kernel - - [<ffffffffa02f0c20>] snapshot_map+0x90/0x490 [dm_snapshot]
<4>1 2016-10-04T00:35:42.141105+00:00 host kernel - - [<ffffffffa000427a>] __map_bio+0x4a/0x130 [dm_mod]
<4>1 2016-10-04T00:35:42.141106+00:00 host kernel - - [<ffffffffa0004867>] __split_and_process_bio+0x327/0x3f0 [dm_mod]
<4>1 2016-10-04T00:35:42.141108+00:00 host kernel - - [<ffffffffa00049a4>] dm_make_request+0x74/0xe0 [dm_mod]
<4>1 2016-10-04T00:35:42.141109+00:00 host kernel - - [<ffffffff8130922f>] generic_make_request+0xff/0x1d0
<4>1 2016-10-04T00:35:42.141111+00:00 host kernel - - [<ffffffff81309370>] submit_bio+0x70/0x140
<4>1 2016-10-04T00:35:42.141112+00:00 host kernel - - [<ffffffff8120eed4>] mpage_bio_submit+0x34/0x50
<4>1 2016-10-04T00:35:42.141113+00:00 host kernel - - [<ffffffff8120f2c3>] do_mpage_readpage+0x2b3/0x6d0
<4>1 2016-10-04T00:35:42.141115+00:00 host kernel - - [<ffffffff8116b34f>] ? __lru_cache_add+0x5f/0x80
<4>1 2016-10-04T00:35:42.141116+00:00 host kernel - - [<ffffffff8116b37e>] ? lru_cache_add+0xe/0x10
<4>1 2016-10-04T00:35:42.141118+00:00 host kernel - - [<ffffffff8120f874>] mpage_readpages+0x114/0x160
<4>1 2016-10-04T00:35:42.141119+00:00 host kernel - - [<ffffffff812096f0>] ? I_BDEV+0x20/0x20
<4>1 2016-10-04T00:35:42.141120+00:00 host kernel - - [<ffffffff812096f0>] ? I_BDEV+0x20/0x20
<4>1 2016-10-04T00:35:42.141121+00:00 host kernel - - [<ffffffff81208202>] ? block_write_end+0x42/0x90
<4>1 2016-10-04T00:35:42.141123+00:00 host kernel - - [<ffffffff8115ad6b>] ? __page_cache_alloc+0xcb/0x110
<4>1 2016-10-04T00:35:42.141124+00:00 host kernel - - [<ffffffff8154d56c>] ? mddev_congested+0x2c/0x40
<4>1 2016-10-04T00:35:42.141125+00:00 host kernel - - [<ffffffff81209fdd>] blkdev_readpages+0x1d/0x20
<4>1 2016-10-04T00:35:42.141127+00:00 host kernel - - [<ffffffff81168d60>] __do_page_cache_readahead+0x1a0/0x240
<4>1 2016-10-04T00:35:42.141128+00:00 host kernel - - [<ffffffff81168f4d>] ondemand_readahead+0x14d/0x250
<4>1 2016-10-04T00:35:42.141130+00:00 host kernel - - [<ffffffff811f8fa4>] ? inode_congested+0xa4/0x100
<4>1 2016-10-04T00:35:42.141131+00:00 host kernel - - [<ffffffff811690c2>] page_cache_async_readahead+0x72/0x80
<4>1 2016-10-04T00:35:42.141132+00:00 host kernel - - [<ffffffff8115c62e>] generic_file_read_iter+0x40e/0x5e0
<4>1 2016-10-04T00:35:42.141134+00:00 host kernel - - [<ffffffff81209ac7>] blkdev_read_iter+0x37/0x40
<4>1 2016-10-04T00:35:42.141135+00:00 host kernel - - [<ffffffff811d0f7c>] __vfs_read+0xcc/0xf0
<4>1 2016-10-04T00:35:42.141136+00:00 host kernel - - [<ffffffff811d122e>] vfs_read+0x8e/0xe0
<4>1 2016-10-04T00:35:42.141138+00:00 host kernel - - [<ffffffff811ecdb2>] ? __fdget_pos+0x12/0x50
<4>1 2016-10-04T00:35:42.141139+00:00 host kernel - - [<ffffffff811d1ae6>] SyS_read+0x56/0xc0
<4>1 2016-10-04T00:35:42.141140+00:00 host kernel - - [<ffffffff8100122a>] ? xen_hypercall_xen_version+0xa/0x20
<4>1 2016-10-04T00:35:42.141142+00:00 host kernel - - [<ffffffff816856ee>] entry_SYSCALL_64_fastpath+0x12/0x71
<4>1 2016-10-04T00:35:42.141143+00:00 host kernel - - [<ffffffff8100d05d>] ? xen_force_evtchn_callback+0xd/0x10
<6>1 2016-10-04T00:35:42.141144+00:00 host kernel - - lvs D ffff88004bd2fa38 0 22800 22771 0x00000004
<4>1 2016-10-04T00:35:42.141145+00:00 host kernel - - ffff88004bd2fa38 ffff88005caf8140 ffff88002f7e4640 0000000000000001
<4>1 2016-10-04T00:35:42.141146+00:00 host kernel - - ffff88007d3ddb50 000001004bd2c008 ffff88007d3ddb48 ffffffffffffffff
<4>1 2016-10-04T00:35:42.141147+00:00 host kernel - - ffff88007d3ddc48 00ff88007d3ddd80 ffff88007d3deb10 0000000000000000
<4>1 2016-10-04T00:35:42.141148+00:00 host kernel - - Call Trace:
<4>1 2016-10-04T00:35:42.141149+00:00 host kernel - - [<ffffffff811634df>] ? __alloc_pages_nodemask+0x17f/0xbc0
<4>1 2016-10-04T00:35:42.141151+00:00 host kernel - - [<ffffffff810b6a7f>] ? __wake_up_sync_key+0x5f/0x80
<4>1 2016-10-04T00:35:42.141152+00:00 host kernel - - [<ffffffff811ea0f9>] ? address_space_init_once+0x39/0x70
<4>1 2016-10-04T00:35:42.141153+00:00 host kernel - - [<ffffffff811ea196>] ? inode_init_once+0x66/0x80
<4>1 2016-10-04T00:35:42.141154+00:00 host kernel - - [<ffffffff81681e20>] schedule+0x40/0x90
<4>1 2016-10-04T00:35:42.141155+00:00 host kernel - - [<ffffffff816845ad>] rwsem_down_write_failed+0x1fd/0x360
<4>1 2016-10-04T00:35:42.141157+00:00 host kernel - - [<ffffffff811ead01>] ? inode_sb_list_add+0x21/0x70
<4>1 2016-10-04T00:35:42.141158+00:00 host kernel - - [<ffffffff810b6bef>] ? wake_up_bit+0x2f/0x40
<4>1 2016-10-04T00:35:42.141159+00:00 host kernel - - [<ffffffff811e720d>] ? d_rehash+0x4d/0x60
<4>1 2016-10-04T00:35:42.141161+00:00 host kernel - - [<ffffffff8133ff53>] call_rwsem_down_write_failed+0x13/0x20
<4>1 2016-10-04T00:35:42.141162+00:00 host kernel - - [<ffffffff81683d84>] ? down_write+0x24/0x40
<4>1 2016-10-04T00:35:42.141163+00:00 host kernel - - [<ffffffffa02ef25f>] snapshot_status+0x2f/0x190 [dm_snapshot]
<4>1 2016-10-04T00:35:42.141165+00:00 host kernel - - [<ffffffffa0009193>] retrieve_status+0xb3/0x1d0 [dm_mod]
<4>1 2016-10-04T00:35:42.141166+00:00 host kernel - - [<ffffffffa00092b0>] ? retrieve_status+0x1d0/0x1d0 [dm_mod]
<4>1 2016-10-04T00:35:42.141168+00:00 host kernel - - [<ffffffffa000930e>] table_status+0x5e/0x90 [dm_mod]
<4>1 2016-10-04T00:35:42.141169+00:00 host kernel - - [<ffffffffa000aa03>] ctl_ioctl+0x1d3/0x410 [dm_mod]
<4>1 2016-10-04T00:35:42.141391+00:00 host kernel - - [<ffffffff811d2f59>] ? __fput+0x149/0x1f0
<4>1 2016-10-04T00:35:42.141394+00:00 host kernel - - [<ffffffffa000ac53>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
<4>1 2016-10-04T00:35:42.141395+00:00 host kernel - - [<ffffffff811e3048>] do_vfs_ioctl+0x88/0x4b0
<4>1 2016-10-04T00:35:42.141397+00:00 host kernel - - [<ffffffff8100325b>] ? exit_to_usermode_loop+0x7b/0xd0
<4>1 2016-10-04T00:35:42.141398+00:00 host kernel - - [<ffffffff811e3502>] SyS_ioctl+0x92/0xa0
<4>1 2016-10-04T00:35:42.141400+00:00 host kernel - - [<ffffffff81003615>] ? syscall_return_slowpath+0x65/0x70
<4>1 2016-10-04T00:35:42.141401+00:00 host kernel - - [<ffffffff816856ee>] entry_SYSCALL_64_fastpath+0x12/0x71
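
In case it helps anyone skimming the traces above, here is a rough sketch of how they could be boiled down to one line per blocked task. It is illustrative only: blocked_tasks, TASK_RE and FRAME_RE are names made up for this example, the patterns assume the syslog prefix ("host kernel - - ") and header layout seen in the output above, and task names containing spaces are not handled. Frames marked "?" are skipped because the kernel's unwinder flags them as unreliable.

```python
import re

# Task header printed by sysrq 'w', e.g.
# "cp D ffff88001b88b5e8 0 24931 20597 0x00000004".
# Assumption: the task name contains no spaces.
TASK_RE = re.compile(
    r"kernel - - (\S+)\s+D\s+[0-9a-f]{16}\s+\d+\s+(\d+)\s+\d+\s+0x[0-9a-f]+"
)

# One stack frame, e.g. "[<ffffffff81681e20>] schedule+0x40/0x90".
# A leading "?" marks a frame the unwinder considers unreliable.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def blocked_tasks(log_text):
    """Return [(task_name, pid, reliable_frames)], innermost frame first."""
    headers = list(TASK_RE.finditer(log_text))
    result = []
    for i, header in enumerate(headers):
        # A task's frames run from its header up to the next task header.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(log_text)
        body = log_text[header.end():end]
        frames = [m.group(2) for m in FRAME_RE.finditer(body) if not m.group(1)]
        result.append((header.group(1), int(header.group(2)), frames))
    return result
```

Because the patterns match across whitespace and line breaks, this should cope with log entries that have been wrapped or run together by a mail client. Printing the first few entries of each frame list is usually enough to see which lock or sleep each task is sitting in.
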
Thanks
--Glenn

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
https://lists.xen.org/xen-users

 

