
[Xen-users] io hang with lvm on md raid1



Hi there

I am seeing an issue across multiple hosts when copying content from an LVM snapshot. The cp command hangs indefinitely and cannot be killed (state D). Other commands that require IO (e.g. lvs, dd, touch $file) may also get queued. How long it takes depends on how busy the host is, but normally within 48 hours all IO on the host blocks and the host becomes completely unresponsive.
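
In case it helps anyone check their own hosts for the same symptom, here is a minimal sketch (not something I have tuned for production) that lists processes currently in state D by reading /proc/<pid>/stat. The function names parse_stat and blocked_tasks are just illustrative, and it only looks at top-level /proc entries, not the per-thread entries under task/.

```python
import os

def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    head, _, tail = stat_text.rpartition(")")
    pid_str, _, comm = head.partition("(")
    return int(pid_str), comm, tail.split()[0]

def blocked_tasks(proc_root="/proc"):
    """Yield (pid, comm) for every process currently in state D."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except OSError:
            continue  # process exited (or became unreadable) during the scan
        if state == "D":
            yield pid, comm

if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(pid, comm)
```

Run periodically (for example from cron), this would show whether the set of D-state tasks keeps growing before the host locks up.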

Machines are all stock CentOS 6 using Xen packages from xen.crc.id.au. I have also posted a report at https://xen.crc.id.au/bugs/view.php?id=75

The IO stack is always spindle drives -> md raid1 -> LVM -> LV. In this case the cp target was a sparse image on a separate raid1 array (on different drives), but that varies between incidents.

We also have raid6 on some hosts, but to date have not seen this issue occur there, which suggests raid1 is the problem.

Although the problem commands are not directly Xen related, I am posting here first, before asking other subsystem lists, because Xen hypercalls show up in the traces below.

Has anyone seen anything like this recently? Does anyone have any insight into what might be causing it, or can anyone suggest ways I might debug this to provide further useful details?

Below is the output from one host of "echo w > /proc/sysrq-trigger". We also have the 't' output if needed.

<6>1 2016-10-04T00:35:42.140955+00:00 host kernel - - sysrq: SysRq : Show Blocked State
<6>1 2016-10-04T00:35:42.140969+00:00 host kernel - - task PC stack pid father
<6>1 2016-10-04T00:35:42.140971+00:00 host kernel - - dmeventd D ffff880051cd3a38 0 24754 1 0x00000000
<4>1 2016-10-04T00:35:42.140973+00:00 host kernel - - ffff880051cd3a38 ffff88005caf8140 ffff8800021fa5c0 0000000000000000
<4>1 2016-10-04T00:35:42.140975+00:00 host kernel - - ffff88006424a210 0000000000000000 ffff880051cd39f0 ffffffff810061a9
<4>1 2016-10-04T00:35:42.140976+00:00 host kernel - - ffff88006424a210 ffff88006424aa10 ffff88006424a210 ffff88006424aa10
<4>1 2016-10-04T00:35:42.140977+00:00 host kernel - - Call Trace:
<4>1 2016-10-04T00:35:42.140979+00:00 host kernel - - [<ffffffff810061a9>] ? xen_load_sp0+0xc9/0x1d0
<4>1 2016-10-04T00:35:42.140982+00:00 host kernel - - [<ffffffff81006cbd>] ? xen_mc_flush+0xad/0x1b0
<4>1 2016-10-04T00:35:42.140985+00:00 host kernel - - [<ffffffff810a0ac4>] ? finish_task_switch+0xa4/0x240
<4>1 2016-10-04T00:35:42.140986+00:00 host kernel - - [<ffffffff81681e20>] schedule+0x40/0x90
<4>1 2016-10-04T00:35:42.140987+00:00 host kernel - - [<ffffffff816845ad>] rwsem_down_write_failed+0x1fd/0x360
<4>1 2016-10-04T00:35:42.140988+00:00 host kernel - - [<ffffffff81681586>] ? __schedule+0x306/0xa30
<4>1 2016-10-04T00:35:42.140990+00:00 host kernel - - [<ffffffff8133ff53>] call_rwsem_down_write_failed+0x13/0x20
<4>1 2016-10-04T00:35:42.140991+00:00 host kernel - - [<ffffffff81683d84>] ? down_write+0x24/0x40
<4>1 2016-10-04T00:35:42.140992+00:00 host kernel - - [<ffffffffa02ef25f>] snapshot_status+0x2f/0x190 [dm_snapshot]
<4>1 2016-10-04T00:35:42.140994+00:00 host kernel - - [<ffffffffa0009193>] retrieve_status+0xb3/0x1d0 [dm_mod]
<4>1 2016-10-04T00:35:42.140995+00:00 host kernel - - [<ffffffffa00092b0>] ? retrieve_status+0x1d0/0x1d0 [dm_mod]
<4>1 2016-10-04T00:35:42.140996+00:00 host kernel - - [<ffffffffa000930e>] table_status+0x5e/0x90 [dm_mod]
<4>1 2016-10-04T00:35:42.140997+00:00 host kernel - - [<ffffffffa000aa03>] ctl_ioctl+0x1d3/0x410 [dm_mod]
<4>1 2016-10-04T00:35:42.140999+00:00 host kernel - - [<ffffffffa000ac53>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
<4>1 2016-10-04T00:35:42.141000+00:00 host kernel - - [<ffffffff811e3048>] do_vfs_ioctl+0x88/0x4b0
<4>1 2016-10-04T00:35:42.141001+00:00 host kernel - - [<ffffffff811ec92c>] ? __fget_light+0x2c/0x70
<4>1 2016-10-04T00:35:42.141002+00:00 host kernel - - [<ffffffff810882d5>] ? __set_current_blocked+0x55/0x60
<4>1 2016-10-04T00:35:42.141004+00:00 host kernel - - [<ffffffff811e3502>] SyS_ioctl+0x92/0xa0
<4>1 2016-10-04T00:35:42.141005+00:00 host kernel - - [<ffffffff81003615>] ? syscall_return_slowpath+0x65/0x70
<4>1 2016-10-04T00:35:42.141006+00:00 host kernel - - [<ffffffff816856ee>] entry_SYSCALL_64_fastpath+0x12/0x71

<6>1 2016-10-04T00:35:42.141008+00:00 host kernel - - kworker/1:2 D ffff8800324139f8 0 23596 2 0x00000000
<6>1 2016-10-04T00:35:42.141009+00:00 host kernel - - Workqueue: kcopyd do_work [dm_mod]
<4>1 2016-10-04T00:35:42.141010+00:00 host kernel - - ffff8800324139f8 ffff88005caf8140 ffff88005b576180 ffffffff816879c9
<4>1 2016-10-04T00:35:42.141012+00:00 host kernel - - 0000000000000001 0000000000000001 ffff88006424da80 00000001aa7d9a84
<4>1 2016-10-04T00:35:42.141013+00:00 host kernel - - ffff88006424da80 ffffffff816879c9 0000000000000001 0000000000000001
<4>1 2016-10-04T00:35:42.141014+00:00 host kernel - - Call Trace:
<4>1 2016-10-04T00:35:42.141015+00:00 host kernel - - [<ffffffff816879c9>] ? error_exit+0x9/0x20
<4>1 2016-10-04T00:35:42.141016+00:00 host kernel - - [<ffffffff816879c9>] ? error_exit+0x9/0x20
<4>1 2016-10-04T00:35:42.141017+00:00 host kernel - - [<ffffffff816853cf>] ? _raw_spin_lock_irqsave+0x1f/0x50
<4>1 2016-10-04T00:35:42.141018+00:00 host kernel - - [<ffffffff810d83ea>] ? lock_timer_base+0x5a/0x80
<4>1 2016-10-04T00:35:42.141019+00:00 host kernel - - [<ffffffff816850f6>] ? _raw_spin_unlock_irqrestore+0x16/0x20
<4>1 2016-10-04T00:35:42.141020+00:00 host kernel - - [<ffffffff816853cf>] ? _raw_spin_lock_irqsave+0x1f/0x50
<4>1 2016-10-04T00:35:42.141022+00:00 host kernel - - [<ffffffff81681e20>] schedule+0x40/0x90
<4>1 2016-10-04T00:35:42.141023+00:00 host kernel - - [<ffffffff816848a8>] schedule_timeout+0x118/0x1f0
<4>1 2016-10-04T00:35:42.141024+00:00 host kernel - - [<ffffffff810d8410>] ? lock_timer_base+0x80/0x80
<4>1 2016-10-04T00:35:42.141025+00:00 host kernel - - [<ffffffff8100d05d>] ? xen_force_evtchn_callback+0xd/0x10
<4>1 2016-10-04T00:35:42.141026+00:00 host kernel - - [<ffffffff8100d822>] ? check_events+0x12/0x20
<4>1 2016-10-04T00:35:42.141027+00:00 host kernel - - [<ffffffff8168499e>] schedule_timeout_uninterruptible+0x1e/0x20
<4>1 2016-10-04T00:35:42.141029+00:00 host kernel - - [<ffffffff810da4cc>] msleep+0x1c/0x30
<4>1 2016-10-04T00:35:42.141030+00:00 host kernel - - [<ffffffffa02eff22>] __check_for_conflicting_io+0x62/0x80 [dm_snapshot]
<4>1 2016-10-04T00:35:42.141031+00:00 host kernel - - [<ffffffffa02f0171>] pending_complete+0x231/0x270 [dm_snapshot]
<4>1 2016-10-04T00:35:42.141032+00:00 host kernel - - [<ffffffffa02f1f60>] persistent_commit_exception+0xb0/0x140 [dm_snapshot]
<4>1 2016-10-04T00:35:42.141034+00:00 host kernel - - [<ffffffffa02ee12c>] copy_callback+0x10c/0x130 [dm_snapshot]
<4>1 2016-10-04T00:35:42.141035+00:00 host kernel - - [<ffffffff8115ce43>] ? mempool_free+0x33/0x90
<4>1 2016-10-04T00:35:42.141036+00:00 host kernel - - [<ffffffffa02ee020>] ? dm_snap_cow+0x10/0x10 [dm_snapshot]
<4>1 2016-10-04T00:35:42.141037+00:00 host kernel - - [<ffffffffa000c0f5>] run_complete_job+0x95/0xe0 [dm_mod]
<4>1 2016-10-04T00:35:42.141038+00:00 host kernel - - [<ffffffffa000bd56>] process_jobs+0x76/0x110 [dm_mod]
<4>1 2016-10-04T00:35:42.141039+00:00 host kernel - - [<ffffffffa000c060>] ? dispatch_job+0x70/0x70 [dm_mod]
<4>1 2016-10-04T00:35:42.141040+00:00 host kernel - - [<ffffffffa000be2f>] do_work+0x3f/0x80 [dm_mod]
<4>1 2016-10-04T00:35:42.141041+00:00 host kernel - - [<ffffffff81093bc5>] process_one_work+0x165/0x500
<4>1 2016-10-04T00:35:42.141042+00:00 host kernel - - [<ffffffff8100122a>] ? xen_hypercall_xen_version+0xa/0x20
<4>1 2016-10-04T00:35:42.141044+00:00 host kernel - - [<ffffffff81094993>] worker_thread+0x133/0x610
<4>1 2016-10-04T00:35:42.141045+00:00 host kernel - - [<ffffffff810a2dc2>] ? default_wake_function+0x12/0x20
<4>1 2016-10-04T00:35:42.141046+00:00 host kernel - - [<ffffffff810b6546>] ? __wake_up_common+0x56/0x90
<4>1 2016-10-04T00:35:42.141047+00:00 host kernel - - [<ffffffff81094860>] ? create_worker+0x1e0/0x1e0
<4>1 2016-10-04T00:35:42.141048+00:00 host kernel - - [<ffffffff81094860>] ? create_worker+0x1e0/0x1e0
<4>1 2016-10-04T00:35:42.141049+00:00 host kernel - - [<ffffffff810993ac>] kthread+0xcc/0xf0
<4>1 2016-10-04T00:35:42.141050+00:00 host kernel - - [<ffffffff810a1b1e>] ? schedule_tail+0x1e/0xc0
<4>1 2016-10-04T00:35:42.141052+00:00 host kernel - - [<ffffffff810992e0>] ? kthread_freezable_should_stop+0x70/0x70
<4>1 2016-10-04T00:35:42.141053+00:00 host kernel - - [<ffffffff81685a4f>] ret_from_fork+0x3f/0x70
<4>1 2016-10-04T00:35:42.141054+00:00 host kernel - - [<ffffffff810992e0>] ? kthread_freezable_should_stop+0x70/0x70

<6>1 2016-10-04T00:35:42.141055+00:00 host kernel - - cp D ffff88001b88b5e8 0 24931 20597 0x00000004
<4>1 2016-10-04T00:35:42.141056+00:00 host kernel - - ffff88001b88b5e8 ffffffff818b9500 ffff8800021f2480 ffff88005fc01340
<4>1 2016-10-04T00:35:42.141058+00:00 host kernel - - ffff88007d3ddd80 ffff88007d3d0250 0000000002420440 ffff88001b88b6e8
<4>1 2016-10-04T00:35:42.141087+00:00 host kernel - - ffff88007d3dd6c0 ffff88005cafa600 ffff880059cfa560 ffff8800571cd009
<4>1 2016-10-04T00:35:42.141089+00:00 host kernel - - Call Trace:
<4>1 2016-10-04T00:35:42.141090+00:00 host kernel - - [<ffffffff81336cad>] ? radix_tree_lookup+0xd/0x10
<4>1 2016-10-04T00:35:42.141091+00:00 host kernel - - [<ffffffff810c8577>] ? irq_to_desc+0x17/0x20
<4>1 2016-10-04T00:35:42.141093+00:00 host kernel - - [<ffffffff810cbd0e>] ? irq_get_irq_data+0xe/0x20
<4>1 2016-10-04T00:35:42.141094+00:00 host kernel - - [<ffffffff81162123>] ? __rmqueue+0x2f3/0x480
<4>1 2016-10-04T00:35:42.141095+00:00 host kernel - - [<ffffffff813f7433>] ? xen_send_IPI_one+0x33/0x60
<4>1 2016-10-04T00:35:42.141097+00:00 host kernel - - [<ffffffff81681e20>] schedule+0x40/0x90
<4>1 2016-10-04T00:35:42.141098+00:00 host kernel - - [<ffffffff816845ad>] rwsem_down_write_failed+0x1fd/0x360
<4>1 2016-10-04T00:35:42.141099+00:00 host kernel - - [<ffffffff811b58d2>] ? kmem_cache_alloc+0x72/0x160
<4>1 2016-10-04T00:35:42.141101+00:00 host kernel - - [<ffffffff8133ff53>] call_rwsem_down_write_failed+0x13/0x20
<4>1 2016-10-04T00:35:42.141102+00:00 host kernel - - [<ffffffff81683d84>] ? down_write+0x24/0x40
<4>1 2016-10-04T00:35:42.141104+00:00 host kernel - - [<ffffffffa02f0c20>] snapshot_map+0x90/0x490 [dm_snapshot]
<4>1 2016-10-04T00:35:42.141105+00:00 host kernel - - [<ffffffffa000427a>] __map_bio+0x4a/0x130 [dm_mod]
<4>1 2016-10-04T00:35:42.141106+00:00 host kernel - - [<ffffffffa0004867>] __split_and_process_bio+0x327/0x3f0 [dm_mod]
<4>1 2016-10-04T00:35:42.141108+00:00 host kernel - - [<ffffffffa00049a4>] dm_make_request+0x74/0xe0 [dm_mod]
<4>1 2016-10-04T00:35:42.141109+00:00 host kernel - - [<ffffffff8130922f>] generic_make_request+0xff/0x1d0
<4>1 2016-10-04T00:35:42.141111+00:00 host kernel - - [<ffffffff81309370>] submit_bio+0x70/0x140
<4>1 2016-10-04T00:35:42.141112+00:00 host kernel - - [<ffffffff8120eed4>] mpage_bio_submit+0x34/0x50
<4>1 2016-10-04T00:35:42.141113+00:00 host kernel - - [<ffffffff8120f2c3>] do_mpage_readpage+0x2b3/0x6d0
<4>1 2016-10-04T00:35:42.141115+00:00 host kernel - - [<ffffffff8116b34f>] ? __lru_cache_add+0x5f/0x80
<4>1 2016-10-04T00:35:42.141116+00:00 host kernel - - [<ffffffff8116b37e>] ? lru_cache_add+0xe/0x10
<4>1 2016-10-04T00:35:42.141118+00:00 host kernel - - [<ffffffff8120f874>] mpage_readpages+0x114/0x160
<4>1 2016-10-04T00:35:42.141119+00:00 host kernel - - [<ffffffff812096f0>] ? I_BDEV+0x20/0x20
<4>1 2016-10-04T00:35:42.141120+00:00 host kernel - - [<ffffffff812096f0>] ? I_BDEV+0x20/0x20
<4>1 2016-10-04T00:35:42.141121+00:00 host kernel - - [<ffffffff81208202>] ? block_write_end+0x42/0x90
<4>1 2016-10-04T00:35:42.141123+00:00 host kernel - - [<ffffffff8115ad6b>] ? __page_cache_alloc+0xcb/0x110
<4>1 2016-10-04T00:35:42.141124+00:00 host kernel - - [<ffffffff8154d56c>] ? mddev_congested+0x2c/0x40
<4>1 2016-10-04T00:35:42.141125+00:00 host kernel - - [<ffffffff81209fdd>] blkdev_readpages+0x1d/0x20
<4>1 2016-10-04T00:35:42.141127+00:00 host kernel - - [<ffffffff81168d60>] __do_page_cache_readahead+0x1a0/0x240
<4>1 2016-10-04T00:35:42.141128+00:00 host kernel - - [<ffffffff81168f4d>] ondemand_readahead+0x14d/0x250
<4>1 2016-10-04T00:35:42.141130+00:00 host kernel - - [<ffffffff811f8fa4>] ? inode_congested+0xa4/0x100
<4>1 2016-10-04T00:35:42.141131+00:00 host kernel - - [<ffffffff811690c2>] page_cache_async_readahead+0x72/0x80
<4>1 2016-10-04T00:35:42.141132+00:00 host kernel - - [<ffffffff8115c62e>] generic_file_read_iter+0x40e/0x5e0
<4>1 2016-10-04T00:35:42.141134+00:00 host kernel - - [<ffffffff81209ac7>] blkdev_read_iter+0x37/0x40
<4>1 2016-10-04T00:35:42.141135+00:00 host kernel - - [<ffffffff811d0f7c>] __vfs_read+0xcc/0xf0
<4>1 2016-10-04T00:35:42.141136+00:00 host kernel - - [<ffffffff811d122e>] vfs_read+0x8e/0xe0
<4>1 2016-10-04T00:35:42.141138+00:00 host kernel - - [<ffffffff811ecdb2>] ? __fdget_pos+0x12/0x50
<4>1 2016-10-04T00:35:42.141139+00:00 host kernel - - [<ffffffff811d1ae6>] SyS_read+0x56/0xc0
<4>1 2016-10-04T00:35:42.141140+00:00 host kernel - - [<ffffffff8100122a>] ? xen_hypercall_xen_version+0xa/0x20
<4>1 2016-10-04T00:35:42.141142+00:00 host kernel - - [<ffffffff816856ee>] entry_SYSCALL_64_fastpath+0x12/0x71
<4>1 2016-10-04T00:35:42.141143+00:00 host kernel - - [<ffffffff8100d05d>] ? xen_force_evtchn_callback+0xd/0x10

<6>1 2016-10-04T00:35:42.141144+00:00 host kernel - - lvs D ffff88004bd2fa38 0 22800 22771 0x00000004
<4>1 2016-10-04T00:35:42.141145+00:00 host kernel - - ffff88004bd2fa38 ffff88005caf8140 ffff88002f7e4640 0000000000000001
<4>1 2016-10-04T00:35:42.141146+00:00 host kernel - - ffff88007d3ddb50 000001004bd2c008 ffff88007d3ddb48 ffffffffffffffff
<4>1 2016-10-04T00:35:42.141147+00:00 host kernel - - ffff88007d3ddc48 00ff88007d3ddd80 ffff88007d3deb10 0000000000000000
<4>1 2016-10-04T00:35:42.141148+00:00 host kernel - - Call Trace:
<4>1 2016-10-04T00:35:42.141149+00:00 host kernel - - [<ffffffff811634df>] ? __alloc_pages_nodemask+0x17f/0xbc0
<4>1 2016-10-04T00:35:42.141151+00:00 host kernel - - [<ffffffff810b6a7f>] ? __wake_up_sync_key+0x5f/0x80
<4>1 2016-10-04T00:35:42.141152+00:00 host kernel - - [<ffffffff811ea0f9>] ? address_space_init_once+0x39/0x70
<4>1 2016-10-04T00:35:42.141153+00:00 host kernel - - [<ffffffff811ea196>] ? inode_init_once+0x66/0x80
<4>1 2016-10-04T00:35:42.141154+00:00 host kernel - - [<ffffffff81681e20>] schedule+0x40/0x90
<4>1 2016-10-04T00:35:42.141155+00:00 host kernel - - [<ffffffff816845ad>] rwsem_down_write_failed+0x1fd/0x360
<4>1 2016-10-04T00:35:42.141157+00:00 host kernel - - [<ffffffff811ead01>] ? inode_sb_list_add+0x21/0x70
<4>1 2016-10-04T00:35:42.141158+00:00 host kernel - - [<ffffffff810b6bef>] ? wake_up_bit+0x2f/0x40
<4>1 2016-10-04T00:35:42.141159+00:00 host kernel - - [<ffffffff811e720d>] ? d_rehash+0x4d/0x60
<4>1 2016-10-04T00:35:42.141161+00:00 host kernel - - [<ffffffff8133ff53>] call_rwsem_down_write_failed+0x13/0x20
<4>1 2016-10-04T00:35:42.141162+00:00 host kernel - - [<ffffffff81683d84>] ? down_write+0x24/0x40
<4>1 2016-10-04T00:35:42.141163+00:00 host kernel - - [<ffffffffa02ef25f>] snapshot_status+0x2f/0x190 [dm_snapshot]
<4>1 2016-10-04T00:35:42.141165+00:00 host kernel - - [<ffffffffa0009193>] retrieve_status+0xb3/0x1d0 [dm_mod]
<4>1 2016-10-04T00:35:42.141166+00:00 host kernel - - [<ffffffffa00092b0>] ? retrieve_status+0x1d0/0x1d0 [dm_mod]
<4>1 2016-10-04T00:35:42.141168+00:00 host kernel - - [<ffffffffa000930e>] table_status+0x5e/0x90 [dm_mod]
<4>1 2016-10-04T00:35:42.141169+00:00 host kernel - - [<ffffffffa000aa03>] ctl_ioctl+0x1d3/0x410 [dm_mod]
<4>1 2016-10-04T00:35:42.141391+00:00 host kernel - - [<ffffffff811d2f59>] ? __fput+0x149/0x1f0
<4>1 2016-10-04T00:35:42.141394+00:00 host kernel - - [<ffffffffa000ac53>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
<4>1 2016-10-04T00:35:42.141395+00:00 host kernel - - [<ffffffff811e3048>] do_vfs_ioctl+0x88/0x4b0
<4>1 2016-10-04T00:35:42.141397+00:00 host kernel - - [<ffffffff8100325b>] ? exit_to_usermode_loop+0x7b/0xd0
<4>1 2016-10-04T00:35:42.141398+00:00 host kernel - - [<ffffffff811e3502>] SyS_ioctl+0x92/0xa0
<4>1 2016-10-04T00:35:42.141400+00:00 host kernel - - [<ffffffff81003615>] ? syscall_return_slowpath+0x65/0x70
<4>1 2016-10-04T00:35:42.141401+00:00 host kernel - - [<ffffffff816856ee>] entry_SYSCALL_64_fastpath+0x12/0x71
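
To make traces like the ones above easier to compare between hosts, a rough sketch along the following lines could reduce a "Show Blocked State" dump to one entry per task. It keeps only the frames not marked with "?" (the kernel flags those as unreliable). The regular expression and the summarise name are my own assumptions, based only on the log format shown in this post. Other kernels or syslog formats may need adjustments.

```python
import re

# One pattern, two alternatives:
#  - a task header: "<name> D <16 hex digits> <free stack> <pid> <ppid> <flags>"
#  - a stack frame: "[<addr>] [? ]symbol+0x..." ("?" marks an unreliable frame)
TOKEN_RE = re.compile(
    r"- - (?P<task>\S+)\s+D\s+[0-9a-f]{16}\s+\d+\s+(?P<pid>\d+)\s+\d+\s+0x[0-9a-f]+"
    r"|\[<[0-9a-f]+>\]\s+(?P<unreliable>\?\s+)?(?P<sym>[\w.]+)\+0x"
)

def summarise(log_text):
    """Return [(task, pid, [reliable frames, innermost first]), ...]."""
    tasks = []
    for m in TOKEN_RE.finditer(log_text):
        if m.group("task"):
            # Start a new task entry when a header line is seen.
            tasks.append((m.group("task"), int(m.group("pid")), []))
        elif tasks and not m.group("unreliable"):
            # Attach reliable frames to the most recent task.
            tasks[-1][2].append(m.group("sym"))
    return tasks

if __name__ == "__main__":
    import sys
    for task, pid, frames in summarise(sys.stdin.read()):
        # Print the first few frames, enough to see what the task waits on.
        print(f"{task} ({pid}): {' <- '.join(frames[:5])}")
```

Reading the traces by hand, dmeventd, lvs and cp are all waiting in rwsem_down_write_failed beneath snapshot_status or snapshot_map, while the kcopyd worker sleeps in __check_for_conflicting_io. A summary like this would make it quicker to see whether other incidents show the same pattern.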

Thanks
--Glenn
