
[Xen-devel] Strange Block Device Issue



Has anyone seen anything remotely like this before?  For background, I'm running the initial Xen 3.0.4 release, although I experienced the same problem with 3.0.3.

The problem occurs when I create a new DomU with the following setup:

kernel = '/boot/vmlinuz-2.6.16.33-xenU-domU'
memory = 384
name = 'some_name'
vif = [ 'bridge=xenbr0' ]
disk = [
        'phy:/dev/working/lvm/device,sda1,w',
        'phy:/dev/working/lvm/device2,sda2,w',
        'phy:/dev/etherd/e2.5,sdc,w',
        'phy:/dev/etherd/e2.6,sdd,w',
       ]
root = '/dev/sda1 ro'
vcpus = 1

The /dev/etherd entries are Coraid ATA-over-Ethernet (AoE) devices (model SR1521).  Without these two devices the DomU boots fine.  With them, badness.
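For anyone unfamiliar with AoE: the devices come from the aoe kernel module (it shows up in the module lists in the traces below), and assuming the stock aoetools userspace, bringing them up looks roughly like this:

    modprobe aoe    # AoE initiator; device nodes appear under /dev/etherd/
    aoe-discover    # broadcast a discovery query on all interfaces
    aoe-stat        # list the shelves found, e.g. e2.5 and e2.6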

As soon as I boot it, the DomU itself appears to come up fine, but two xvd kernel threads immediately appear in Dom0 and spike to 100% processor usage, as shown in this "top" snapshot:

top - 02:37:14 up 9 days,  1:17,  1 user,  load average: 10.86, 8.05, 4.04
Tasks: 174 total,  10 running, 164 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0% us, 49.4% sy,  0.0% ni, 50.5% id,  0.0% wa,  0.0% hi,  0.2% si
Mem:    524288k total,   447200k used,    77088k free,    56028k buffers
Swap:        0k total,        0k used,        0k free,   140492k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
31832 root      11  -5     0    0    0 R   97  0.0   7:04.73 xvd 40 98:260     
31833 root      11  -5     0    0    0 R   95  0.0   7:04.74 xvd 40 98:250     
5406 root      16   0 21616  920  680 S    1  0.2  15:08.71 collectd          
7295 root      10  -5     0    0    0 S    0  0.0   4:57.45 xvd 2 fe:16b      
    1 root      16   0  2612  564  480 S    0  0.1   0:04.63 init              
    2 root      RT   0     0    0    0 S    0  0.0   0:00.84 migration/0       
    3 root      34  19     0    0    0 S    0  0.0   0:27.89 ksoftirqd/0       
    4 root      RT   0     0    0    0 S    0  0.0   0:00.04 watchdog/0        
    5 root      RT   0     0    0    0 R    0  0.0   0:00.59 migration/1       
    6 root      34  19     0    0    0 S    0  0.0   0:00.01 ksoftirqd/1       
    7 root      RT   0     0    0    0 R    0  0.0   0:00.00 watchdog/1        
    8 root      RT   0     0    0    0 R    0  0.0   0:01.14 migration/2       
    9 root      34  19     0    0    0 S    0  0.0   0:00.02 ksoftirqd/2       
   10 root      RT   0     0    0    0 R    0  0.0   0:00.01 watchdog/2        
   11 root      RT   0     0    0    0 S    0  0.0   0:00.67 migration/3       
   12 root      34  19     0    0    0 S    0  0.0   0:00.00 ksoftirqd/3       
   13 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/3         

Looking at the dmesg log, each block device I create appears to produce a kernel BUG like the following:

[781896.696277] BUG: soft lockup detected on CPU#1!
[781896.696286] CPU 1:
[781896.696289] Modules linked in: raid1 md_mod dlm cman netconsole dummy dm_mod aoe e1000 tg3
[781896.696299] Pid: 31833, comm: xvd 40 98:250 Tainted: GF     2.6.16.33-xen0 #1
[781896.696303] RIP: e030:[<ffffffff8013d3a2>] <ffffffff8013d3a2>{lock_timer_base+34}
[781896.696313] RSP: e02b:ffff8800093d9d88  EFLAGS: 00000246
[781896.696316] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000
[781896.696321] RDX: ffffffffff578000 RSI: ffff8800093d9dc0 RDI: ffff880015ae6190
[781896.696325] RBP: ffff8800093d9da8 R08: 0000000000240004 R09: 0000000000000000
[781896.696329] R10: 0000000000000001 R11: ffff88001f442cc0 R12: ffff880015ae6190
[781896.696334] R13: ffff8800093d9dc0 R14: ffff880014a09dd0 R15: ffff8800093d9e68
[781896.696341] FS:  00002ab3cc642080(0000) GS:ffffffff804f3080(0000) knlGS:0000000000000000
[781896.696344] CS:  e033 DS: 0000 ES: 0000
[781896.696346]
[781896.696347] Call Trace: <ffffffff8013d3fd>{try_to_del_timer_sync+29}
[781896.696355]        <ffffffff8013d468>{del_timer_sync+24} <ffffffff8025e980>{blk_sync_queue+16}
[781896.696365]        <ffffffff8025e9c2>{blk_cleanup_queue+50} <ffffffff802e96c2>{unplug_queue+50}
[781896.696375]        <ffffffff802ea0f8>{blkif_schedule+888} <ffffffff802e9d80>{blkif_schedule+0}
[781896.696384]        <ffffffff8014931a>{kthread+218} <ffffffff8012ddcd>{schedule_tail+77}
[781896.696394]        <ffffffff8010c16a>{child_rip+8} <ffffffff80149240>{kthread+0}
[781896.696403]        <ffffffff8010c162>{child_rip+0}
[781897.148121] xenbr0: topology change detected, sending tcn bpdu
[781897.148145] xenbr0: port 31(vif40.0) entering forwarding state
[781897.532118] BUG: soft lockup detected on CPU#2!
[781897.532132] CPU 2:
[781897.532135] Modules linked in: raid1 md_mod dlm cman netconsole dummy dm_mod aoe e1000 tg3
[781897.532148] Pid: 31832, comm: xvd 40 98:260 Tainted: GF     2.6.16.33-xen0 #1
[781897.532152] RIP: e030:[<ffffffff8013d3c4>] <ffffffff8013d3c4>{lock_timer_base+68}
[781897.532164] RSP: e02b:ffff88000f28dd88  EFLAGS: 00000246
[781897.532167] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000
[781897.532172] RDX: ffffffffff578000 RSI: ffff88000f28ddc0 RDI: ffff880006f5b190
[781897.532176] RBP: ffff88000f28dda8 R08: 0000000002000000 R09: 0000000000000000
[781897.532180] R10: 0000000000000001 R11: ffff88001f442cc0 R12: ffff880006f5b190
[781897.532185] R13: ffff88000f28ddc0 R14: ffff88001c628e80 R15: ffff88000f28de68
[781897.532192] FS:  00002b04f44c7080(0000) GS:ffffffff804f3100(0000) knlGS:0000000000000000
[781897.532195] CS:  e033 DS: 0000 ES: 0000
[781897.532197]
[781897.532198] Call Trace: <ffffffff8013d3fd>{try_to_del_timer_sync+29}
[781897.532207]        <ffffffff8013d468>{del_timer_sync+24} <ffffffff8025e980>{blk_sync_queue+16}
[781897.532217]        <ffffffff8025e9c2>{blk_cleanup_queue+50} <ffffffff802e96c2>{unplug_queue+50}
[781897.532228]        <ffffffff802ea0f8>{blkif_schedule+888} <ffffffff802e9d80>{blkif_schedule+0}
[781897.532237]        <ffffffff8014931a>{kthread+218} <ffffffff8012ddcd>{schedule_tail+77}
[781897.532248]        <ffffffff8010c16a>{child_rip+8} <ffffffff80149240>{kthread+0}
[781897.532257]        <ffffffff8010c162>{child_rip+0}

I can currently work around this by creating a simple device-mapper table that maps each AoE device one-to-one onto a new device and exporting that to the DomU instead.  Obviously tedious, but better than not working.
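In case it helps anyone else, here's a minimal sketch of that workaround; the mapper name aoe-e2-5 is just an illustration, and dm's linear target simply passes sectors straight through:

    # size of the backing device in 512-byte sectors
    # (use --getsize instead of --getsz on older util-linux)
    SIZE=$(blockdev --getsz /dev/etherd/e2.5)
    # map sectors 0..SIZE of the AoE device one-to-one
    echo "0 $SIZE linear /dev/etherd/e2.5 0" | dmsetup create aoe-e2-5

The disk line in the DomU config then points at the mapper node instead of the raw AoE device:

        'phy:/dev/mapper/aoe-e2-5,sdc,w',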

Since first hitting this specific issue, I've triggered it again.  Here's the slightly different trace it produced:

[49712.408105] BUG: soft lockup detected on CPU#2!
[49712.408115] CPU 2:
[49712.408118] Modules linked in: dm_mod dlm cman netconsole dummy raid1 md_mod aoe e1000 tg3
[49712.408131] Pid: 28043, comm: xvd 2 98:08 Tainted: GF     2.6.16.33-xen0 #1
[49712.408135] RIP: e030:[<ffffffff8013d3c4>] <ffffffff8013d3c4>{lock_timer_base+68}
[49712.408147] RSP: e02b:ffff88001332bd88  EFLAGS: 00000246
[49712.408150] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000
[49712.408155] RDX: ffffffffff578000 RSI: ffff88001332bdc0 RDI: ffff88001ec86990
[49712.408159] RBP: ffff88001332bda8 R08: 0000000000240004 R09: 0000000000000000
[49712.408163] R10: 0000000000000001 R11: ffff88001ffe2cc0 R12: ffff88001ec86990
[49712.408168] R13: ffff88001332bdc0 R14: ffff8800161c3560 R15: ffff88001332be68
[49712.408175] FS:  00002b5d3edaa6d0(0000) GS:ffffffff804f3100(0000) knlGS:0000000000000000
[49712.408178] CS:  e033 DS: 0000 ES: 0000
[49712.408180]
[49712.408181] Call Trace: <ffffffff8013d3fd>{try_to_del_timer_sync+29}
[49712.408191]        <ffffffff8013d468>{del_timer_sync+24} <ffffffff8025e980>{blk_sync_queue+16}
[49712.408201]        <ffffffff8025e9c2>{blk_cleanup_queue+50} <ffffffff802e96c2>{unplug_queue+50}
[49712.408212]        <ffffffff802ea0f8>{blkif_schedule+888} <ffffffff80121c15>{sys32_adjtimex+741}
[49712.408221]        <ffffffff802e9d80>{blkif_schedule+0} <ffffffff8014931a>{kthread+218}
[49712.408230]        <ffffffff8012ddcd>{schedule_tail+77} <ffffffff8010c16a>{child_rip+8}
[49712.408241]        <ffffffff80149240>{kthread+0} <ffffffff8010c162>{child_rip+0}

Does anyone have any idea what is behind this?  It is 100% reproducible for me.
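For reference, reproducing it takes nothing more than starting the DomU with the config above and watching top in Dom0 for the runaway xvd threads (config path hypothetical):

    xm create /etc/xen/some_name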

-- 
Jayson Vantuyl
Systems Architect


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

 

