[Xen-bugs] [Bug 1659] New: Dom0 'looses' BIOs
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1659

           Summary: Dom0 'looses' BIOs
           Product: Xen
           Version: unspecified
          Platform: x86-64
        OS/Version: Linux-2.6
            Status: NEW
          Severity: major
          Priority: P2
         Component: Unspecified
        AssignedTo: xen-bugs@xxxxxxxxxxxxxxxxxxx
        ReportedBy: xenbugsx6vp3m@xxxxxxxxx

Dom0 setup: blkback -> DRBD -> LVM2 -> MD RAID1 -> SATA

Symptom: the md raid1 device hangs during a RAID resync. The resync stalls and all accesses to the md raid1 device hang, while accesses to the underlying SATA devices still work.

There is a deadlock in the *_barrier functions of raid1.c. The resync process waits for a pending request to finish, but that request either never finishes or at least 'forgets' to decrease the pending count used by the resync barrier handling in raid1.c. While the resync process waits for pending regular I/O to complete, it has already raised the resync barrier, so all further normal I/O waits for the resync operation to lower its barrier (see the call traces below).

The bug has been tested and verified on two entirely different x86-64 platforms (AMD Opteron 1214HE with the MCP55 chipset, and an Intel Core2Duo notebook with the ICH9M chipset), so it is unlikely to be a hardware issue. It has been verified with openSUSE 11.2 (2.6.31.12-0.2-xen) and 11.3 (2.6.34) dom0 kernels, running on Xen hypervisor 3.4.1, 3.4.2 and 3.4.3.

I could not reproduce the bug with kernels that lack the Xen dom0 patches. The hang seems to occur most readily after the hardware node crashes and the VMs therefore start a file system journal replay (ext3). That may also be the reason why I could not reproduce the bug with regular kernels (i.e. without Xen patches), so I cannot say for certain whether this is a Xen-specific problem.

[ 603.229215] INFO: task md1_resync:1441 blocked for more than 120 seconds.
[ 603.229294] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 603.229413] md1_resync D 0000000000000000 0 1441 2 0x00000000
[ 603.229505] ffff88003d967bb0 0000000000000246 ffff88003d967b10 ffff88003d967b30
[ 603.229627] 0000000000000000 ffff88003d967b78 000000000000a380 ffff88003dba8be8
[ 603.229753] 000000000000a380 000000000000a380 000000000000a380 0000000000007d00
[ 603.229880] Call Trace:
[ 603.229964] [<ffffffffa002c63e>] raise_barrier+0xde/0x2e0 [raid1]
[ 603.230037] [<ffffffffa002d5cb>] sync_request+0x12b/0x680 [raid1]
[ 603.230112] [<ffffffff80399de9>] md_do_sync+0x669/0xc40
[ 603.230180] [<ffffffff8039ac54>] md_thread+0x54/0x150
[ 603.230249] [<ffffffff8006fac6>] kthread+0xb6/0xc0
[ 603.230318] [<ffffffff8000d38a>] child_rip+0xa/0x20
[ 603.230401] INFO: task python:5365 blocked for more than 120 seconds.
[ 603.230467] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 603.230584] python D 00000000c7c9e55f 0 5365 5348 0x00000000
[ 603.230657] ffff8800382595b8 0000000000000282 ffff880038259518 ffff880038259538
[ 603.230783] ffff8800382594e8 ffff880038259580 000000000000a380 ffff8800381e88e8
[ 603.230909] 000000000000a380 000000000000a380 000000000000a380 0000000000007d00
[ 603.231035] Call Trace:
[ 603.231098] [<ffffffffa002c99d>] wait_barrier+0x15d/0x1f0 [raid1]
[ 603.231169] [<ffffffffa002fef8>] make_request+0x58/0x690 [raid1]
[ 603.231239] [<ffffffff80399498>] md_make_request+0xc8/0x140
[ 603.231309] [<ffffffff802224db>] generic_make_request+0x19b/0x4c0
[ 603.231380] [<ffffffff8022287d>] submit_bio+0x7d/0x110
[ 603.231448] [<ffffffff80153535>] mpage_bio_submit+0x35/0x50
[ 603.231517] [<ffffffff80153aa3>] do_mpage_readpage+0x383/0x710
[ 603.231595] [<ffffffff80153fb3>] mpage_readpages+0xf3/0x150
[ 603.231664] [<ffffffff801b8ccb>] ext2_readpages+0x2b/0x50
[ 603.231733] [<ffffffff800e0353>] read_pages+0x43/0x110
[ 603.231801] [<ffffffff800e05ac>] __do_page_cache_readahead+0x18c/0x1b0
[ 603.231871] [<ffffffff800e05ff>] ra_submit+0x2f/0x50
[ 603.231936] [<ffffffff800e087d>] ondemand_readahead+0x11d/0x260
[ 603.232005] [<ffffffff800e0a60>] page_cache_async_readahead+0xa0/0xc0
[ 603.232082] [<ffffffff800d7311>] T.731+0x1f1/0x440
[ 603.232149] [<ffffffff800d7626>] generic_file_aio_read+0xc6/0x1f0
[ 603.232218] [<ffffffff80118da2>] do_sync_read+0x102/0x160
[ 603.232286] [<ffffffff801192d5>] vfs_read+0xd5/0x1c0
[ 603.232352] [<ffffffff801199fb>] sys_read+0x5b/0xa0
[ 603.233212] [<ffffffff8000c868>] system_call_fastpath+0x16/0x1b
[ 603.233279] [<00007fab60821a90>] 0x7fab60821a90
[ 603.233340] INFO: task blkid:5393 blocked for more than 120 seconds.
[ 603.233402] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 603.233508] blkid D 0000000000000001 0 5393 1 0x00000000
[ 603.233573] ffff8800406d96e8 0000000000000282 ffff8800406d9648 ffff8800406d9668
[ 603.233685] ffff8800406d9618 ffff8800406d96b0 000000000000a380 ffff8800382fe4a8
[ 603.233798] 000000000000a380 000000000000a380 000000000000a380 0000000000007d00
[ 603.233914] Call Trace:
[ 603.233969] [<ffffffffa002c99d>] wait_barrier+0x15d/0x1f0 [raid1]
[ 603.234037] [<ffffffffa002fef8>] make_request+0x58/0x690 [raid1]
[ 603.234106] [<ffffffff80399498>] md_make_request+0xc8/0x140
[ 603.234176] [<ffffffff802224db>] generic_make_request+0x19b/0x4c0
[ 603.234244] [<ffffffff8022287d>] submit_bio+0x7d/0x110
[ 603.234250] [<ffffffff80147c12>] submit_bh+0x102/0x150
[ 603.234257] [<ffffffff8014adac>] block_read_full_page+0x23c/0x3b0
[ 603.234262] [<ffffffff801509b6>] blkdev_readpage+0x26/0x50
[ 603.234268] [<ffffffff800e03f6>] read_pages+0xe6/0x110
[ 603.234273] [<ffffffff800e05ac>] __do_page_cache_readahead+0x18c/0x1b0
[ 603.234278] [<ffffffff800e0839>] ondemand_readahead+0xd9/0x260
[ 603.234284] [<ffffffff800e0aad>] page_cache_sync_readahead+0x2d/0x50
[ 603.234288] [<ffffffff800d73d6>] T.731+0x2b6/0x440
[ 603.234293] [<ffffffff800d7626>] generic_file_aio_read+0xc6/0x1f0
[ 603.234300] [<ffffffff80118da2>] do_sync_read+0x102/0x160
[ 603.234305] [<ffffffff801192d5>] vfs_read+0xd5/0x1c0
[ 603.234310] [<ffffffff801199fb>] sys_read+0x5b/0xa0
[ 603.234315] [<ffffffff8000c868>] system_call_fastpath+0x16/0x1b
[ 603.234320] [<00007fbbf8f1ea90>] 0x7fbbf8f1ea90
[ 603.234323] INFO: task blkid:5397 blocked for more than 120 seconds.
[ 603.234324] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 603.234326] blkid D 0000000098d5b5f5 0 5397 1 0x00000000
[ 603.234330] ffff8800381656c8 0000000000000286 ffff880038165628 ffff880038165648
[ 603.234333] 0000000000000000 ffff880038165690 000000000000a380 ffff8800382f8be8
[ 603.234336] 000000000000a380 000000000000a380 000000000000a380 0000000000007d00
[ 603.234339] Call Trace:
[ 603.234345] [<ffffffffa002c99d>] wait_barrier+0x15d/0x1f0 [raid1]
[ 603.234351] [<ffffffffa002fef8>] make_request+0x58/0x690 [raid1]
[ 603.234358] [<ffffffff80399498>] md_make_request+0xc8/0x140
[ 603.234363] [<ffffffff802224db>] generic_make_request+0x19b/0x4c0
[ 603.234369] [<ffffffff8022287d>] submit_bio+0x7d/0x110
[ 603.234373] [<ffffffff80147c12>] submit_bh+0x102/0x150
[ 603.234379] [<ffffffff8014adac>] block_read_full_page+0x23c/0x3b0
[ 603.234383] [<ffffffff801509b6>] blkdev_readpage+0x26/0x50
[ 603.234388] [<ffffffff800e03f6>] read_pages+0xe6/0x110
[ 603.234393] [<ffffffff800e05ac>] __do_page_cache_readahead+0x18c/0x1b0
[ 603.234398] [<ffffffff800e05ff>] ra_submit+0x2f/0x50
[ 603.234403] [<ffffffff800e087d>] ondemand_readahead+0x11d/0x260
[ 603.234408] [<ffffffff800e0aad>] page_cache_sync_readahead+0x2d/0x50
[ 603.234412] [<ffffffff800d73d6>] T.731+0x2b6/0x440
[ 603.234417] [<ffffffff800d7626>] generic_file_aio_read+0xc6/0x1f0
[ 603.234422] [<ffffffff80118da2>] do_sync_read+0x102/0x160
[ 603.234427] [<ffffffff801192d5>] vfs_read+0xd5/0x1c0
[ 603.234432] [<ffffffff801199fb>] sys_read+0x5b/0xa0
[ 603.234437] [<ffffffff8000c868>] system_call_fastpath+0x16/0x1b
[ 603.234442] [<00007fb60ac89a90>] 0x7fb60ac89a90
[ 723.225805] INFO: task md1_resync:1441 blocked for more than 120 seconds.
[ 723.225892] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 723.226010] md1_resync D 0000000000000000 0 1441 2 0x00000000
[ 723.226108] ffff88003d967bb0 0000000000000246 ffff88003d967b10 ffff88003d967b30
[ 723.226239] 0000000000000000 ffff88003d967b78 000000000000a380 ffff88003dba8be8
[ 723.226372] 000000000000a380 000000000000a380 000000000000a380 0000000000007d00
[ 723.226504] Call Trace:
[ 723.226600] [<ffffffffa002c63e>] raise_barrier+0xde/0x2e0 [raid1]
[ 723.226684] [<ffffffffa002d5cb>] sync_request+0x12b/0x680 [raid1]
[ 723.226766] [<ffffffff80399de9>] md_do_sync+0x669/0xc40
[ 723.226841] [<ffffffff8039ac54>] md_thread+0x54/0x150
[ 723.226913] [<ffffffff8006fac6>] kthread+0xb6/0xc0
[ 723.226987] [<ffffffff8000d38a>] child_rip+0xa/0x20
[ 723.227078] INFO: task python:5365 blocked for more than 120 seconds.
[ 723.227147] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 723.227265] python D 00000000c7c9e55f 0 5365 5348 0x00000000
[ 723.227346] ffff8800382595b8 0000000000000282 ffff880038259518 ffff880038259538
[ 723.227482] ffff8800382594e8 ffff880038259580 000000000000a380 ffff8800381e88e8
[ 723.227616] 000000000000a380 000000000000a380 000000000000a380 0000000000007d00
[ 723.227750] Call Trace:
[ 723.227820] [<ffffffffa002c99d>] wait_barrier+0x15d/0x1f0 [raid1]
[ 723.227904] [<ffffffffa002fef8>] make_request+0x58/0x690 [raid1]
[ 723.227983] [<ffffffff80399498>] md_make_request+0xc8/0x140
[ 723.228059] [<ffffffff802224db>] generic_make_request+0x19b/0x4c0
[ 723.228135] [<ffffffff8022287d>] submit_bio+0x7d/0x110
[ 723.228209] [<ffffffff80153535>] mpage_bio_submit+0x35/0x50
[ 723.228284] [<ffffffff80153aa3>] do_mpage_readpage+0x383/0x710
[ 723.228362] [<ffffffff80153fb3>] mpage_readpages+0xf3/0x150
[ 723.228437] [<ffffffff801b8ccb>] ext2_readpages+0x2b/0x50
[ 723.228512] [<ffffffff800e0353>] read_pages+0x43/0x110
[ 723.228586] [<ffffffff800e05ac>] __do_page_cache_readahead+0x18c/0x1b0
[ 723.228667] [<ffffffff800e05ff>] ra_submit+0x2f/0x50
[ 723.228746] [<ffffffff800e087d>] ondemand_readahead+0x11d/0x260
[ 723.228828] [<ffffffff800e0a60>] page_cache_async_readahead+0xa0/0xc0
[ 723.228906] [<ffffffff800d7311>] T.731+0x1f1/0x440
[ 723.228978] [<ffffffff800d7626>] generic_file_aio_read+0xc6/0x1f0
[ 723.229059] [<ffffffff80118da2>] do_sync_read+0x102/0x160
[ 723.229133] [<ffffffff801192d5>] vfs_read+0xd5/0x1c0
[ 723.229209] [<ffffffff801199fb>] sys_read+0x5b/0xa0
[ 723.229285] [<ffffffff8000c868>] system_call_fastpath+0x16/0x1b
[ 723.229362] [<00007fab60821a90>] 0x7fab60821a90
[ 723.229431] INFO: task blkid:5393 blocked for more than 120 seconds.
[ 723.229500] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 723.229619] blkid D 0000000000000001 0 5393 1 0x00000000
[ 723.229700] ffff8800406d96e8 0000000000000282 ffff8800406d9648 ffff8800406d9668
[ 723.229857] ffff8800406d9618 ffff8800406d96b0 000000000000a380 ffff8800382fe4a8
[ 723.229998] 000000000000a380 000000000000a380 000000000000a380 0000000000007d00
[ 723.230083] Call Trace:
[ 723.230097] [<ffffffffa002c99d>] wait_barrier+0x15d/0x1f0 [raid1]
[ 723.230109] [<ffffffffa002fef8>] make_request+0x58/0x690 [raid1]
[ 723.230126] [<ffffffff80399498>] md_make_request+0xc8/0x140
[ 723.230142] [<ffffffff802224db>] generic_make_request+0x19b/0x4c0
[ 723.230152] [<ffffffff8022287d>] submit_bio+0x7d/0x110
[ 723.230162] [<ffffffff80147c12>] submit_bh+0x102/0x150
[ 723.230173] [<ffffffff8014adac>] block_read_full_page+0x23c/0x3b0
[ 723.230184] [<ffffffff801509b6>] blkdev_readpage+0x26/0x50
[ 723.230193] [<ffffffff800e03f6>] read_pages+0xe6/0x110
[ 723.230203] [<ffffffff800e05ac>] __do_page_cache_readahead+0x18c/0x1b0
[ 723.230213] [<ffffffff800e0839>] ondemand_readahead+0xd9/0x260
[ 723.230223] [<ffffffff800e0aad>] page_cache_sync_readahead+0x2d/0x50
[ 723.230235] [<ffffffff800d73d6>] T.731+0x2b6/0x440
[ 723.230244] [<ffffffff800d7626>] generic_file_aio_read+0xc6/0x1f0
[ 723.230254] [<ffffffff80118da2>] do_sync_read+0x102/0x160
[ 723.230263] [<ffffffff801192d5>] vfs_read+0xd5/0x1c0
[ 723.230278] [<ffffffff801199fb>] sys_read+0x5b/0xa0
[ 723.230288] [<ffffffff8000c868>] system_call_fastpath+0x16/0x1b
[ 723.230298] [<00007fbbf8f1ea90>] 0x7fbbf8f1ea90
[ 723.230303] INFO: task blkid:5397 blocked for more than 120 seconds.
[ 723.230306] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 723.230310] blkid D 0000000098d5b5f5 0 5397 1 0x00000000
[ 723.230316] ffff8800381656c8 0000000000000286 ffff880038165628 ffff880038165648
[ 723.230322] 0000000000000000 ffff880038165690 000000000000a380 ffff8800382f8be8
[ 723.230327] 000000000000a380 000000000000a380 000000000000a380 0000000000007d00
[ 723.230333] Call Trace:
[ 723.230344] [<ffffffffa002c99d>] wait_barrier+0x15d/0x1f0 [raid1]
[ 723.230355] [<ffffffffa002fef8>] make_request+0x58/0x690 [raid1]
[ 723.230381] [<ffffffff80399498>] md_make_request+0xc8/0x140
[ 723.230391] [<ffffffff802224db>] generic_make_request+0x19b/0x4c0
[ 723.230401] [<ffffffff8022287d>] submit_bio+0x7d/0x110
[ 723.230410] [<ffffffff80147c12>] submit_bh+0x102/0x150
[ 723.230423] [<ffffffff8014adac>] block_read_full_page+0x23c/0x3b0
[ 723.230434] [<ffffffff801509b6>] blkdev_readpage+0x26/0x50
[ 723.230443] [<ffffffff800e03f6>] read_pages+0xe6/0x110
[ 723.230453] [<ffffffff800e05ac>] __do_page_cache_readahead+0x18c/0x1b0
[ 723.230463] [<ffffffff800e05ff>] ra_submit+0x2f/0x50
[ 723.230475] [<ffffffff800e087d>] ondemand_readahead+0x11d/0x260
[ 723.230487] [<ffffffff800e0aad>] page_cache_sync_readahead+0x2d/0x50
[ 723.230495] [<ffffffff800d73d6>] T.731+0x2b6/0x440
[ 723.230504] [<ffffffff800d7626>] generic_file_aio_read+0xc6/0x1f0
[ 723.230514] [<ffffffff80118da2>] do_sync_read+0x102/0x160
[ 723.230523] [<ffffffff801192d5>] vfs_read+0xd5/0x1c0
[ 723.230535] [<ffffffff801199fb>] sys_read+0x5b/0xa0
[ 723.230547] [<ffffffff8000c868>] system_call_fastpath+0x16/0x1b
[ 723.230556] [<00007fb60ac89a90>] 0x7fb60ac89a90
[ 723.230563] INFO: task lvscan:5451 blocked for more than 120 seconds.
[ 723.230566] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 723.230570] lvscan D 000000003a1ec879 0 5451 5449 0x00000000
[ 723.230576] ffff880038207898 0000000000000282 ffff8800382077f8 ffff880038207818
[ 723.230581] ffff8800382077e8 ffff880038207860 000000000000a380 ffff8800407047e8
[ 723.230587] 000000000000a380 000000000000a380 000000000000a380 0000000000007d00
[ 723.230592] Call Trace:
[ 723.230606] [<ffffffffa002c99d>] wait_barrier+0x15d/0x1f0 [raid1]
[ 723.230618] [<ffffffffa002fef8>] make_request+0x58/0x690 [raid1]
[ 723.230631] [<ffffffff80399498>] md_make_request+0xc8/0x140
[ 723.230641] [<ffffffff802224db>] generic_make_request+0x19b/0x4c0
[ 723.230650] [<ffffffff8022287d>] submit_bio+0x7d/0x110
[ 723.230659] [<ffffffff8015210b>] dio_bio_submit+0x6b/0xc0
[ 723.230668] [<ffffffff80152d28>] direct_io_worker+0x258/0x3c0
[ 723.230678] [<ffffffff801530be>] __blockdev_direct_IO+0x22e/0x4d0
[ 723.230687] [<ffffffff80150838>] blkdev_direct_IO+0x58/0x80
[ 723.230695] [<ffffffff800d7737>] generic_file_aio_read+0x1d7/0x1f0
[ 723.230708] [<ffffffff80118da2>] do_sync_read+0x102/0x160
[ 723.230718] [<ffffffff801192d5>] vfs_read+0xd5/0x1c0
[ 723.230727] [<ffffffff801199fb>] sys_read+0x5b/0xa0
[ 723.230736] [<ffffffff8000c868>] system_call_fastpath+0x16/0x1b
[ 723.230745] [<00007f78d4b85a80>] 0x7f78d4b85a80
[ 843.222203] INFO: task md1_resync:1441 blocked for more than 120 seconds.
[ 843.222290] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 843.222406] md1_resync D 0000000000000000 0 1441 2 0x00000000
[ 843.222508] ffff88003d967bb0 0000000000000246 ffff88003d967b10 ffff88003d967b30
[ 843.222641] 0000000000000000 ffff88003d967b78 000000000000a380 ffff88003dba8be8
[ 843.222777] 000000000000a380 000000000000a380 000000000000a380 0000000000007d00
[ 843.222912] Call Trace:
[ 843.223009] [<ffffffffa002c63e>] raise_barrier+0xde/0x2e0 [raid1]
[ 843.223091] [<ffffffffa002d5cb>] sync_request+0x12b/0x680 [raid1]
[ 843.223175] [<ffffffff80399de9>] md_do_sync+0x669/0xc40
[ 843.223250] [<ffffffff8039ac54>] md_thread+0x54/0x150
[ 843.223325] [<ffffffff8006fac6>] kthread+0xb6/0xc0
[ 843.223400] [<ffffffff8000d38a>] child_rip+0xa/0x20

--
Configure bugmail: http://bugzilla.xensource.com/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

_______________________________________________
Xen-bugs mailing list
Xen-bugs@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-bugs
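
Illustration of the suspected deadlock: the following is a minimal sketch of the barrier interaction described in the report, not the actual raid1.c code. The class and function names (BarrierState, simulate) and the two counters are assumptions made for illustration. It models only the pending-I/O count and the resync barrier, and shows that a single regular I/O completion that never decrements the pending count leaves both the resync and all new regular I/O waiting on each other.

```python
from dataclasses import dataclass


@dataclass
class BarrierState:
    """Hypothetical, simplified model of the resync barrier bookkeeping."""
    nr_pending: int = 0  # regular I/O requests currently in flight
    barrier: int = 0     # nonzero while the resync holds its barrier

    def start_io(self) -> bool:
        # Regular I/O may only start while no barrier is raised.
        if self.barrier != 0:
            return False  # a real submitter would block here (wait_barrier)
        self.nr_pending += 1
        return True

    def complete_io(self) -> None:
        # A completion must decrement the pending count.
        self.nr_pending -= 1

    def raise_barrier(self) -> None:
        # The resync raises the barrier first, then waits for pending I/O.
        self.barrier += 1

    def resync_can_proceed(self) -> bool:
        return self.barrier > 0 and self.nr_pending == 0

    def lower_barrier(self) -> None:
        self.barrier -= 1


def simulate(lost_completions: int, submitted: int = 3) -> str:
    """Return "ok" if the resync can run, "deadlock" if everything is stuck."""
    state = BarrierState()
    for _ in range(submitted):
        assert state.start_io()
    state.raise_barrier()
    # Only some completions arrive; the rest are "lost".
    for _ in range(submitted - lost_completions):
        state.complete_io()
    new_io_blocked = not state.start_io()
    if state.resync_can_proceed():
        state.lower_barrier()
        return "ok"
    # The resync waits for nr_pending == 0 while new I/O waits for the barrier.
    return "deadlock" if new_io_blocked else "ok"


if __name__ == "__main__":
    print(simulate(lost_completions=0))
    print(simulate(lost_completions=1))
```

In this model the state with one lost completion matches the traces above: the resync thread is stuck in the raise-barrier step and every new reader is stuck waiting for the barrier to drop. Whether the missing decrement really originates in the Xen dom0 block path is not established by the sketch.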