|
|
|
|
|
|
|
|
|
|
xen-bugs
[Xen-bugs] [Bug 1659] New: Dom0 'looses' BIOs
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1659
Summary: Dom0 'looses' BIOs
Product: Xen
Version: unspecified
Platform: x86-64
OS/Version: Linux-2.6
Status: NEW
Severity: major
Priority: P2
Component: Unspecified
AssignedTo: xen-bugs@xxxxxxxxxxxxxxxxxxx
ReportedBy: xenbugsx6vp3m@xxxxxxxxx
DOM0 Setup: blkback -> DRBD -> LVM2 -> MD RAID1 -> SATA
Symptom: the md raid1 device hangs during raid resync (resync hangs, accesses
to the md raid1 device are hanging, accesses to the underlying SATA devices are
ok). There is a deadlock in the *_barrier functions of raid1.c. The resync
process is waiting for a pending request to finish (but which either never
finishes or at least 'forgets' to decrease the pending count related to the
resync barrier handling in raid1.c. While the resync process waits for pending
regular I/O to complete, it has already risen the resync barrier and all
further normal I/O is therefore waiting for the resync op to lower its barrier.
(see call trace below)
The bug has been tested and verified on totally different x86-64 platforms (AMD
Opterion 1214HE + MCP55 chipset, Intel Core2Duo Notebook ICH9M Chipset), so it
is unlikely to be a hardware issue.
It has been verified using OpenSUSE 11.2 (2.6.31.12-0.2-xen) and 11.3 (2.6.34)
dom0 kernels, running on xen hypervisor 3.4.1, 3.4.2 and 3.4.3.
I could not reproduce the bug with kernels without any xen dom0 patches.
The situation seems to occur preferably when crashing the hardware node and the
VMs therefore start a file system journal replay (ext3). That is also the
potential reason why I could not reproduce the bug with regular kernels (ie.
without xen patches) -- I have no definitive clue whether this is a
xen-specific problem.
[ 603.229215] INFO: task md1_resync:1441 blocked for more than 120 seconds.
[ 603.229294] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[ 603.229413] md1_resync D 0000000000000000 0 1441 2 0x00000000
[ 603.229505] ffff88003d967bb0 0000000000000246 ffff88003d967b10
ffff88003d967b30
[ 603.229627] 0000000000000000 ffff88003d967b78 000000000000a380
ffff88003dba8be8
[ 603.229753] 000000000000a380 000000000000a380 000000000000a380
0000000000007d00
[ 603.229880] Call Trace:
[ 603.229964] [<ffffffffa002c63e>] raise_barrier+0xde/0x2e0 [raid1]
[ 603.230037] [<ffffffffa002d5cb>] sync_request+0x12b/0x680 [raid1]
[ 603.230112] [<ffffffff80399de9>] md_do_sync+0x669/0xc40
[ 603.230180] [<ffffffff8039ac54>] md_thread+0x54/0x150
[ 603.230249] [<ffffffff8006fac6>] kthread+0xb6/0xc0
[ 603.230318] [<ffffffff8000d38a>] child_rip+0xa/0x20
[ 603.230401] INFO: task python:5365 blocked for more than 120 seconds.
[ 603.230467] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[ 603.230584] python D 00000000c7c9e55f 0 5365 5348 0x00000000
[ 603.230657] ffff8800382595b8 0000000000000282 ffff880038259518
ffff880038259538
[ 603.230783] ffff8800382594e8 ffff880038259580 000000000000a380
ffff8800381e88e8
[ 603.230909] 000000000000a380 000000000000a380 000000000000a380
0000000000007d00
[ 603.231035] Call Trace:
[ 603.231098] [<ffffffffa002c99d>] wait_barrier+0x15d/0x1f0 [raid1]
[ 603.231169] [<ffffffffa002fef8>] make_request+0x58/0x690 [raid1]
[ 603.231239] [<ffffffff80399498>] md_make_request+0xc8/0x140
[ 603.231309] [<ffffffff802224db>] generic_make_request+0x19b/0x4c0
[ 603.231380] [<ffffffff8022287d>] submit_bio+0x7d/0x110
[ 603.231448] [<ffffffff80153535>] mpage_bio_submit+0x35/0x50
[ 603.231517] [<ffffffff80153aa3>] do_mpage_readpage+0x383/0x710
[ 603.231595] [<ffffffff80153fb3>] mpage_readpages+0xf3/0x150
[ 603.231664] [<ffffffff801b8ccb>] ext2_readpages+0x2b/0x50
[ 603.231733] [<ffffffff800e0353>] read_pages+0x43/0x110
[ 603.231801] [<ffffffff800e05ac>] __do_page_cache_readahead+0x18c/0x1b0
[ 603.231871] [<ffffffff800e05ff>] ra_submit+0x2f/0x50
[ 603.231936] [<ffffffff800e087d>] ondemand_readahead+0x11d/0x260
[ 603.232005] [<ffffffff800e0a60>] page_cache_async_readahead+0xa0/0xc0
[ 603.232082] [<ffffffff800d7311>] T.731+0x1f1/0x440
[ 603.232149] [<ffffffff800d7626>] generic_file_aio_read+0xc6/0x1f0
[ 603.232218] [<ffffffff80118da2>] do_sync_read+0x102/0x160
[ 603.232286] [<ffffffff801192d5>] vfs_read+0xd5/0x1c0
[ 603.232352] [<ffffffff801199fb>] sys_read+0x5b/0xa0
[ 603.233212] [<ffffffff8000c868>] system_call_fastpath+0x16/0x1b
[ 603.233279] [<00007fab60821a90>] 0x7fab60821a90
[ 603.233340] INFO: task blkid:5393 blocked for more than 120 seconds.
[ 603.233402] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[ 603.233508] blkid D 0000000000000001 0 5393 1 0x00000000
[ 603.233573] ffff8800406d96e8 0000000000000282 ffff8800406d9648
ffff8800406d9668
[ 603.233685] ffff8800406d9618 ffff8800406d96b0 000000000000a380
ffff8800382fe4a8
[ 603.233798] 000000000000a380 000000000000a380 000000000000a380
0000000000007d00
[ 603.233914] Call Trace:
[ 603.233969] [<ffffffffa002c99d>] wait_barrier+0x15d/0x1f0 [raid1]
[ 603.234037] [<ffffffffa002fef8>] make_request+0x58/0x690 [raid1]
[ 603.234106] [<ffffffff80399498>] md_make_request+0xc8/0x140
[ 603.234176] [<ffffffff802224db>] generic_make_request+0x19b/0x4c0
[ 603.234244] [<ffffffff8022287d>] submit_bio+0x7d/0x110
[ 603.234250] [<ffffffff80147c12>] submit_bh+0x102/0x150
[ 603.234257] [<ffffffff8014adac>] block_read_full_page+0x23c/0x3b0
[ 603.234262] [<ffffffff801509b6>] blkdev_readpage+0x26/0x50
[ 603.234268] [<ffffffff800e03f6>] read_pages+0xe6/0x110
[ 603.234273] [<ffffffff800e05ac>] __do_page_cache_readahead+0x18c/0x1b0
[ 603.234278] [<ffffffff800e0839>] ondemand_readahead+0xd9/0x260
[ 603.234284] [<ffffffff800e0aad>] page_cache_sync_readahead+0x2d/0x50
[ 603.234288] [<ffffffff800d73d6>] T.731+0x2b6/0x440
[ 603.234293] [<ffffffff800d7626>] generic_file_aio_read+0xc6/0x1f0
[ 603.234300] [<ffffffff80118da2>] do_sync_read+0x102/0x160
[ 603.234305] [<ffffffff801192d5>] vfs_read+0xd5/0x1c0
[ 603.234310] [<ffffffff801199fb>] sys_read+0x5b/0xa0
[ 603.234315] [<ffffffff8000c868>] system_call_fastpath+0x16/0x1b
[ 603.234320] [<00007fbbf8f1ea90>] 0x7fbbf8f1ea90
[ 603.234323] INFO: task blkid:5397 blocked for more than 120 seconds.
[ 603.234324] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[ 603.234326] blkid D 0000000098d5b5f5 0 5397 1 0x00000000
[ 603.234330] ffff8800381656c8 0000000000000286 ffff880038165628
ffff880038165648
[ 603.234333] 0000000000000000 ffff880038165690 000000000000a380
ffff8800382f8be8
[ 603.234336] 000000000000a380 000000000000a380 000000000000a380
0000000000007d00
[ 603.234339] Call Trace:
[ 603.234345] [<ffffffffa002c99d>] wait_barrier+0x15d/0x1f0 [raid1]
[ 603.234351] [<ffffffffa002fef8>] make_request+0x58/0x690 [raid1]
[ 603.234358] [<ffffffff80399498>] md_make_request+0xc8/0x140
[ 603.234363] [<ffffffff802224db>] generic_make_request+0x19b/0x4c0
[ 603.234369] [<ffffffff8022287d>] submit_bio+0x7d/0x110
[ 603.234373] [<ffffffff80147c12>] submit_bh+0x102/0x150
[ 603.234379] [<ffffffff8014adac>] block_read_full_page+0x23c/0x3b0
[ 603.234383] [<ffffffff801509b6>] blkdev_readpage+0x26/0x50
[ 603.234388] [<ffffffff800e03f6>] read_pages+0xe6/0x110
[ 603.234393] [<ffffffff800e05ac>] __do_page_cache_readahead+0x18c/0x1b0
[ 603.234398] [<ffffffff800e05ff>] ra_submit+0x2f/0x50
[ 603.234403] [<ffffffff800e087d>] ondemand_readahead+0x11d/0x260
[ 603.234408] [<ffffffff800e0aad>] page_cache_sync_readahead+0x2d/0x50
[ 603.234412] [<ffffffff800d73d6>] T.731+0x2b6/0x440
[ 603.234417] [<ffffffff800d7626>] generic_file_aio_read+0xc6/0x1f0
[ 603.234422] [<ffffffff80118da2>] do_sync_read+0x102/0x160
[ 603.234427] [<ffffffff801192d5>] vfs_read+0xd5/0x1c0
[ 603.234432] [<ffffffff801199fb>] sys_read+0x5b/0xa0
[ 603.234437] [<ffffffff8000c868>] system_call_fastpath+0x16/0x1b
[ 603.234442] [<00007fb60ac89a90>] 0x7fb60ac89a90
[ 723.225805] INFO: task md1_resync:1441 blocked for more than 120 seconds.
[ 723.225892] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[ 723.226010] md1_resync D 0000000000000000 0 1441 2 0x00000000
[ 723.226108] ffff88003d967bb0 0000000000000246 ffff88003d967b10
ffff88003d967b30
[ 723.226239] 0000000000000000 ffff88003d967b78 000000000000a380
ffff88003dba8be8
[ 723.226372] 000000000000a380 000000000000a380 000000000000a380
0000000000007d00
[ 723.226504] Call Trace:
[ 723.226600] [<ffffffffa002c63e>] raise_barrier+0xde/0x2e0 [raid1]
[ 723.226684] [<ffffffffa002d5cb>] sync_request+0x12b/0x680 [raid1]
[ 723.226766] [<ffffffff80399de9>] md_do_sync+0x669/0xc40
[ 723.226841] [<ffffffff8039ac54>] md_thread+0x54/0x150
[ 723.226913] [<ffffffff8006fac6>] kthread+0xb6/0xc0
[ 723.226987] [<ffffffff8000d38a>] child_rip+0xa/0x20
[ 723.227078] INFO: task python:5365 blocked for more than 120 seconds.
[ 723.227147] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[ 723.227265] python D 00000000c7c9e55f 0 5365 5348 0x00000000
[ 723.227346] ffff8800382595b8 0000000000000282 ffff880038259518
ffff880038259538
[ 723.227482] ffff8800382594e8 ffff880038259580 000000000000a380
ffff8800381e88e8
[ 723.227616] 000000000000a380 000000000000a380 000000000000a380
0000000000007d00
[ 723.227750] Call Trace:
[ 723.227820] [<ffffffffa002c99d>] wait_barrier+0x15d/0x1f0 [raid1]
[ 723.227904] [<ffffffffa002fef8>] make_request+0x58/0x690 [raid1]
[ 723.227983] [<ffffffff80399498>] md_make_request+0xc8/0x140
[ 723.228059] [<ffffffff802224db>] generic_make_request+0x19b/0x4c0
[ 723.228135] [<ffffffff8022287d>] submit_bio+0x7d/0x110
[ 723.228209] [<ffffffff80153535>] mpage_bio_submit+0x35/0x50
[ 723.228284] [<ffffffff80153aa3>] do_mpage_readpage+0x383/0x710
[ 723.228362] [<ffffffff80153fb3>] mpage_readpages+0xf3/0x150
[ 723.228437] [<ffffffff801b8ccb>] ext2_readpages+0x2b/0x50
[ 723.228512] [<ffffffff800e0353>] read_pages+0x43/0x110
[ 723.228586] [<ffffffff800e05ac>] __do_page_cache_readahead+0x18c/0x1b0
[ 723.228667] [<ffffffff800e05ff>] ra_submit+0x2f/0x50
[ 723.228746] [<ffffffff800e087d>] ondemand_readahead+0x11d/0x260
[ 723.228828] [<ffffffff800e0a60>] page_cache_async_readahead+0xa0/0xc0
[ 723.228906] [<ffffffff800d7311>] T.731+0x1f1/0x440
[ 723.228978] [<ffffffff800d7626>] generic_file_aio_read+0xc6/0x1f0
[ 723.229059] [<ffffffff80118da2>] do_sync_read+0x102/0x160
[ 723.229133] [<ffffffff801192d5>] vfs_read+0xd5/0x1c0
[ 723.229209] [<ffffffff801199fb>] sys_read+0x5b/0xa0
[ 723.229285] [<ffffffff8000c868>] system_call_fastpath+0x16/0x1b
[ 723.229362] [<00007fab60821a90>] 0x7fab60821a90
[ 723.229431] INFO: task blkid:5393 blocked for more than 120 seconds.
[ 723.229500] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[ 723.229619] blkid D 0000000000000001 0 5393 1 0x00000000
[ 723.229700] ffff8800406d96e8 0000000000000282 ffff8800406d9648
ffff8800406d9668
[ 723.229857] ffff8800406d9618 ffff8800406d96b0 000000000000a380
ffff8800382fe4a8
[ 723.229998] 000000000000a380 000000000000a380 000000000000a380
0000000000007d00
[ 723.230083] Call Trace:
[ 723.230097] [<ffffffffa002c99d>] wait_barrier+0x15d/0x1f0 [raid1]
[ 723.230109] [<ffffffffa002fef8>] make_request+0x58/0x690 [raid1]
[ 723.230126] [<ffffffff80399498>] md_make_request+0xc8/0x140
[ 723.230142] [<ffffffff802224db>] generic_make_request+0x19b/0x4c0
[ 723.230152] [<ffffffff8022287d>] submit_bio+0x7d/0x110
[ 723.230162] [<ffffffff80147c12>] submit_bh+0x102/0x150
[ 723.230173] [<ffffffff8014adac>] block_read_full_page+0x23c/0x3b0
[ 723.230184] [<ffffffff801509b6>] blkdev_readpage+0x26/0x50
[ 723.230193] [<ffffffff800e03f6>] read_pages+0xe6/0x110
[ 723.230203] [<ffffffff800e05ac>] __do_page_cache_readahead+0x18c/0x1b0
[ 723.230213] [<ffffffff800e0839>] ondemand_readahead+0xd9/0x260
[ 723.230223] [<ffffffff800e0aad>] page_cache_sync_readahead+0x2d/0x50
[ 723.230235] [<ffffffff800d73d6>] T.731+0x2b6/0x440
[ 723.230244] [<ffffffff800d7626>] generic_file_aio_read+0xc6/0x1f0
[ 723.230254] [<ffffffff80118da2>] do_sync_read+0x102/0x160
[ 723.230263] [<ffffffff801192d5>] vfs_read+0xd5/0x1c0
[ 723.230278] [<ffffffff801199fb>] sys_read+0x5b/0xa0
[ 723.230288] [<ffffffff8000c868>] system_call_fastpath+0x16/0x1b
[ 723.230298] [<00007fbbf8f1ea90>] 0x7fbbf8f1ea90
[ 723.230303] INFO: task blkid:5397 blocked for more than 120 seconds.
[ 723.230306] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[ 723.230310] blkid D 0000000098d5b5f5 0 5397 1 0x00000000
[ 723.230316] ffff8800381656c8 0000000000000286 ffff880038165628
ffff880038165648
[ 723.230322] 0000000000000000 ffff880038165690 000000000000a380
ffff8800382f8be8
[ 723.230327] 000000000000a380 000000000000a380 000000000000a380
0000000000007d00
[ 723.230333] Call Trace:
[ 723.230344] [<ffffffffa002c99d>] wait_barrier+0x15d/0x1f0 [raid1]
[ 723.230355] [<ffffffffa002fef8>] make_request+0x58/0x690 [raid1]
[ 723.230381] [<ffffffff80399498>] md_make_request+0xc8/0x140
[ 723.230391] [<ffffffff802224db>] generic_make_request+0x19b/0x4c0
[ 723.230401] [<ffffffff8022287d>] submit_bio+0x7d/0x110
[ 723.230410] [<ffffffff80147c12>] submit_bh+0x102/0x150
[ 723.230423] [<ffffffff8014adac>] block_read_full_page+0x23c/0x3b0
[ 723.230434] [<ffffffff801509b6>] blkdev_readpage+0x26/0x50
[ 723.230443] [<ffffffff800e03f6>] read_pages+0xe6/0x110
[ 723.230453] [<ffffffff800e05ac>] __do_page_cache_readahead+0x18c/0x1b0
[ 723.230463] [<ffffffff800e05ff>] ra_submit+0x2f/0x50
[ 723.230475] [<ffffffff800e087d>] ondemand_readahead+0x11d/0x260
[ 723.230487] [<ffffffff800e0aad>] page_cache_sync_readahead+0x2d/0x50
[ 723.230495] [<ffffffff800d73d6>] T.731+0x2b6/0x440
[ 723.230504] [<ffffffff800d7626>] generic_file_aio_read+0xc6/0x1f0
[ 723.230514] [<ffffffff80118da2>] do_sync_read+0x102/0x160
[ 723.230523] [<ffffffff801192d5>] vfs_read+0xd5/0x1c0
[ 723.230535] [<ffffffff801199fb>] sys_read+0x5b/0xa0
[ 723.230547] [<ffffffff8000c868>] system_call_fastpath+0x16/0x1b
[ 723.230556] [<00007fb60ac89a90>] 0x7fb60ac89a90
[ 723.230563] INFO: task lvscan:5451 blocked for more than 120 seconds.
[ 723.230566] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[ 723.230570] lvscan D 000000003a1ec879 0 5451 5449 0x00000000
[ 723.230576] ffff880038207898 0000000000000282 ffff8800382077f8
ffff880038207818
[ 723.230581] ffff8800382077e8 ffff880038207860 000000000000a380
ffff8800407047e8
[ 723.230587] 000000000000a380 000000000000a380 000000000000a380
0000000000007d00
[ 723.230592] Call Trace:
[ 723.230606] [<ffffffffa002c99d>] wait_barrier+0x15d/0x1f0 [raid1]
[ 723.230618] [<ffffffffa002fef8>] make_request+0x58/0x690 [raid1]
[ 723.230631] [<ffffffff80399498>] md_make_request+0xc8/0x140
[ 723.230641] [<ffffffff802224db>] generic_make_request+0x19b/0x4c0
[ 723.230650] [<ffffffff8022287d>] submit_bio+0x7d/0x110
[ 723.230659] [<ffffffff8015210b>] dio_bio_submit+0x6b/0xc0
[ 723.230668] [<ffffffff80152d28>] direct_io_worker+0x258/0x3c0
[ 723.230678] [<ffffffff801530be>] __blockdev_direct_IO+0x22e/0x4d0
[ 723.230687] [<ffffffff80150838>] blkdev_direct_IO+0x58/0x80
[ 723.230695] [<ffffffff800d7737>] generic_file_aio_read+0x1d7/0x1f0
[ 723.230708] [<ffffffff80118da2>] do_sync_read+0x102/0x160
[ 723.230718] [<ffffffff801192d5>] vfs_read+0xd5/0x1c0
[ 723.230727] [<ffffffff801199fb>] sys_read+0x5b/0xa0
[ 723.230736] [<ffffffff8000c868>] system_call_fastpath+0x16/0x1b
[ 723.230745] [<00007f78d4b85a80>] 0x7f78d4b85a80
[ 843.222203] INFO: task md1_resync:1441 blocked for more than 120 seconds.
[ 843.222290] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[ 843.222406] md1_resync D 0000000000000000 0 1441 2 0x00000000
[ 843.222508] ffff88003d967bb0 0000000000000246 ffff88003d967b10
ffff88003d967b30
[ 843.222641] 0000000000000000 ffff88003d967b78 000000000000a380
ffff88003dba8be8
[ 843.222777] 000000000000a380 000000000000a380 000000000000a380
0000000000007d00
[ 843.222912] Call Trace:
[ 843.223009] [<ffffffffa002c63e>] raise_barrier+0xde/0x2e0 [raid1]
[ 843.223091] [<ffffffffa002d5cb>] sync_request+0x12b/0x680 [raid1]
[ 843.223175] [<ffffffff80399de9>] md_do_sync+0x669/0xc40
[ 843.223250] [<ffffffff8039ac54>] md_thread+0x54/0x150
[ 843.223325] [<ffffffff8006fac6>] kthread+0xb6/0xc0
[ 843.223400] [<ffffffff8000d38a>] child_rip+0xa/0x20
--
Configure bugmail:
http://bugzilla.xensource.com/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
_______________________________________________
Xen-bugs mailing list
Xen-bugs@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-bugs
|
<Prev in Thread] |
Current Thread |
[Next in Thread>
|
- [Xen-bugs] [Bug 1659] New: Dom0 'looses' BIOs,
bugzilla-daemon <=
|
|
|
|
|