
RE: [Xen-devel] domU's crashing Dom0 (Xen + iSCS = timebomb)


  • To: "Luis Vinay" <luisvinay@xxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxx>
  • From: "Ian Pratt" <m+Ian.Pratt@xxxxxxxxxxxx>
  • Date: Thu, 7 Dec 2006 23:07:57 -0000
  • Delivery-date: Thu, 07 Dec 2006 15:08:31 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>
  • Thread-index: AccaEWdD/01kT4NURhuctoDDGttusQAAQHfQ
  • Thread-topic: [Xen-devel] domU's crashing Dom0 (Xen + iSCS = timebomb)

> I'm experimenting with xen + iscsi, and I found that under heavy stress
> domU's can crash the entire system. I've reproduced this many, many times.

Are all the crashes the same? Have you got a collection of the oops
messages?

> Xen 3.0.2-2 Dom0 kernel 2.6.16.29-xen0 (stressed ext2 filesystem) over iscsi
>    Result: 15min and crashed

The fact that this crashed without any domU's or blkback/front suggests
that this may be a native problem rather than just specifically Xen
(though it could be an event channel interaction). 

How are you stressing the filesystem? It might be worth finding out if
the bug is more easily triggered by either read or write workloads.
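
One way to separate the two cases, as a minimal sketch only: run a write-only phase and a read-only phase as distinct steps and note which one is running when the oops appears. The function names, block size and file path here are hypothetical and not part of bonnie++ or any existing tool.

```python
import os

BLOCK = 1 << 20  # 1 MiB per I/O; an arbitrary choice for this sketch


def write_phase(path, blocks, block_size=BLOCK):
    """Write-only load: stream `blocks` blocks to `path`, then fsync."""
    buf = os.urandom(block_size)
    written = 0
    with open(path, "wb") as f:
        for _ in range(blocks):
            written += f.write(buf)
        f.flush()
        os.fsync(f.fileno())  # push the data out to the iSCSI device
    return written


def read_phase(path, block_size=BLOCK):
    """Read-only load: read the file back sequentially, return bytes read."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    return total
```

For the read phase to generate real network traffic, the file would need to be larger than RAM (or the page cache dropped first); otherwise the reads may be served from memory and never reach the NIC.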

It would also be very interesting to know whether it can be repro'ed on
32-bit. I'm not sure whether it's possible to dynamically turn off NAPI
support with ethtool, but this would be interesting too.

Thanks,
Ian

> My system is like this
> 
> Software:
> - iSCSI Enterprise Target v0.4.13
> - RedHat AS4 update 4 64bit + Xen 3.0.3-0 Kernel 2.6.16.29 + Open iSCSI
>   v2.0.730 (Initiator)
> - Bonnie++ v1.03a
> 
> VM:
> - Debian 3.1r3 + Open iSCSI v2.0.730 (Initiator)
> - RedHat AS4 update 4 + Open iSCSI v2.0.730 (Initiator)
> 
> Tests:
> Four instances of bonnie++ with root uid on the filesystem to be stressed
> 
> hda (OS and swap) + hdb (stressed ext3 filesystem) over iscsi,
> both debian and RH
>    Result: crash
> hda (OS and swap) local + hdb (stressed ext3 filesystem) over iscsi,
> both debian and RH
>    Result: crash
> 
> hda (OS and swap) + hdb (stressed ext3-writeback mode filesystem) over
> iscsi, both debian and RH
>    Result: crash
> hda (OS and swap) local + hdb (stressed ext3-writeback mode filesystem)
> over iscsi, both debian and RH
>    Result: crash
> 
> hda (OS and swap) + hdb (stressed ext2 filesystem) over iscsi,
> both debian and RH
>    Result: crash
> hda (OS and swap) local + hdb (stressed ext2 filesystem) over iscsi,
> both debian and RH
>    Result: crash
> 
> hda (OS and swap) + hdb (stressed xfs filesystem) over iscsi, both
> debian and RH
>    Result: crash
> hda (OS and swap) local + hdb (stressed xfs filesystem) over iscsi,
> both debian and RH
>    Result: crash
> 
> Also tested:
> 
> Xen 3.0.2-2 Dom0 kernel 2.6.16-xen0 (stressed ext2 filesystem) over iscsi
>    Result: ~60hs of testing with no problems (then stopped the tests)
> 
> kernel 2.6.16.29 (stressed ext2 filesystem) over iscsi
>    Result: ~24.30hs of testing with no problems (then stopped the tests)
> 
> Xen 3.0.2-2 Dom0 kernel 2.6.16.29-xen0 (stressed ext2 filesystem) over iscsi
>    Result: 15min and crashed
> 
> I managed to capture the error:
> Unable to handle kernel NULL pointer dereference at 00000000000000e8 RIP:
> <ffffffff88009a1e>{:bnx2:bnx2_poll+231}
> PGD 1f4d7067 PUD 1f613067 PMD 0
> Oops: 0000 [1] SMP
> CPU 0
> Modules linked in: xt_physdev iptable_filter ip_tables x_tables bridge
> 8021q netloop ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc
> crc32c libcrc32c iscsi_tcp libiscsi scsi_transport_iscsi joydev tsdev
> binfmt_misc dm_mirror dm_mod usb_storage video thermal processor usbmouse
> usbhid usbkbd fan container button battery ac uhci_hcd ehci_hcd usbcore
> hw_random e1000 bnx2 piix ide_generic
> Pid: 0, comm: swapper Not tainted 2.6.16.29-xen0 #3
> RIP: e030:[<ffffffff88009a1e>] <ffffffff88009a1e>{:bnx2:bnx2_poll+231}
> RSP: e02b:ffffffff80503de8  EFLAGS: 00010286
> RAX: 000000000000c9f8 RBX: ffff880017778e30 RCX: ffff880014eee000
> RDX: 0000000000000001 RSI: 000000000000c9f7 RDI: 00000000000000e3
> RBP: 0000000000000000 R08: 0000000100215d2c R09: 000000000000002c
> R10: 0000000000000200 R11: 0000000000000246 R12: 000000000000c9e3
> R13: 0000000100215d29 R14: ffff88001e15ed00 R15: 0000000000000000
> FS:  00002b02608bf360(0000) GS:ffffffff804b3000(0000) knlGS:0000000000000000
> CS:  e033 DS: 0000 ES: 0000
> Process swapper (pid: 0, threadinfo ffffffff804ca000, task ffffffff80428bc0)
> Stack: 0000000000000001 0000000000000001 0000000000000bf0 ffff88001877abf0
>        00000000000000d0 ffffffff80111c93 0000000000000bf0 ffffffff8800d0c3
>        ffff880000000002 ffffffff00000000
> Call Trace: <IRQ> <ffffffff80111c93>{dma_map_page+43}
>        <ffffffff8800d0c3>{:bnx2:bnx2_start_xmit+801} <ffffffff803548be>{net_rx_action+230}
>        <ffffffff801325d6>{__do_softirq+114} <ffffffff8010bac6>{call_softirq+30}
>        <ffffffff8010d575>{do_softirq+71} <ffffffff8010d3ed>{do_IRQ+63}
>        <ffffffff802f6b82>{evtchn_do_upcall+192} <ffffffff8010b5f6>{do_hypervisor_callback+30} <EOI>
>        <ffffffff801073aa>{hypercall_page+938} <ffffffff801073aa>{hypercall_page+938}
>        <ffffffff8010f702>{safe_halt+132} <ffffffff80108d77>{xen_idle+106}
>        <ffffffff80108e36>{cpu_idle+171} <ffffffff804cd77a>{start_kernel+488}
>        <ffffffff804cd223>{_sinittext+547}
> 
> Code: 48 8b 85 e8 00 00 00 66 83 78 06 00 74 25 0f b7 40 04 41 8d
> RIP <ffffffff88009a1e>{:bnx2:bnx2_poll+231} RSP <ffffffff80503de8>
> CR2: 00000000000000e8
>  <0>Kernel panic - not syncing: Aiee, killing interrupt handler!
>  (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
> 
> Luis Vinay
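
As a rough cross-check of the oops above, the first bytes of the Code: line can be decoded by hand. The sketch below assumes that the Code: dump starts at the faulting RIP, which is consistent with the register dump (RBP is 0 and CR2 is 0xe8). The helper `decode_mov_load` is hypothetical and handles only this one instruction form; it is not a general disassembler.

```python
import struct

# Register names for a ModRM field when no REX.R/REX.B extension bit is set.
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_load(code):
    """Decode `REX.W 8B /r` in its [base + disp32] form (mod == 2, no SIB)."""
    rex, opcode, modrm = code[0], code[1], code[2]
    if rex != 0x48 or opcode != 0x8B:
        raise ValueError("not a plain 64-bit MOV r64, r/m64")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 2 or rm == 4:
        raise ValueError("only the [base + disp32] form is handled")
    (disp,) = struct.unpack_from("<i", code, 3)  # little-endian signed disp32
    return REG64[reg], REG64[rm], disp


CODE = bytes.fromhex("488b85e8000000")  # first 7 bytes of the Code: line
RBP = 0x0                               # RBP value from the register dump

dest, base, disp = decode_mov_load(CODE)
fault_addr = RBP + disp
print(f"mov {disp:#x}(%{base}),%{dest} -> faults at {fault_addr:#x}")
```

The computed address equals CR2 (00000000000000e8), which suggests that bnx2_poll is loading a field at offset 0xe8 through a pointer in RBP that is NULL.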


