[Xen-users] Re: Stability
I guess I clicked send on that a bit earlier than I intended to...

The ping flood was just part of some testing we were doing, trying to find a consistent means of crashing the boxes, and it does appear to be that. They crash plenty on their own though, especially with lots of GFS file access, but sometimes from lots of access via Xen VMs, or other things we've not really got a handle on yet.

Feb 13 13:44:51 serenity1 openais[4266]: [TOTEM] The token was lost in the OPERATIONAL state.

That is followed by the node being fenced. But again, this is not entirely Xen related, though maybe some or all of it is.

Thanks...

----- "Wendell Dingus" <wendell@xxxxxxxxxxxxx> wrote:
> I'm wondering if anyone might have any suggestions...
>
> We've installed CentOS5.2 on 3 servers and have an FC switch, dual-port cards, and FC RAID device. We configured the RAID as two separate logical RAIDs. One is RAID-10 for speed and it hosts Xen VM images as LVs. The rest of the drive is RAID-5 to maximize space and houses a lot of lighter access data on a GFS2 filesystem. The switch is "VLAN'ed" such that one FC port sees the one logical RAID and only the other port sees the other.
>
> Everything is stock CentOS except the Intel IGB NIC drivers. The stock ones had issues; there appear to be bug reports on this, and 5.3 *probably* resolves it. In the meantime though, I got the latest GPL driver tarball from intel.com and installed that, and those issues we were seeing went away.
>
> Anyway, we're seeing terrible stability issues and I'm asking for pointers because I've yet to get a good handle on what the cause could be and/or where to concentrate efforts. This isn't specifically Xen related, but it might be amusing to you, or it might be recognized by someone who knows the solution. We just did a large-packet broadcast ping flood and it causes a USB-attached drive to get disconnected, followed by FC card driver errors, followed by full system crashes. This time we stopped the ping after the FC errors and the cluster recovered.
>
> # ping 192.168.1.10 -b -s1472 -f
>
> Feb 27 13:56:56 servername kernel: usb 1-7: USB disconnect, address 2
> Feb 27 13:59:02 servername kernel: qla2xxx 0000:06:00.0: scsi(2:0:0): Abort command issued -- 1 1ae2f 2002.
> Feb 27 13:59:02 servername kernel: qla2xxx 0000:06:00.0: scsi(2:0:0): Abort command issued -- 1 1ae30 2002.
> Feb 27 13:59:02 servername kernel: qla2xxx 0000:06:00.0: scsi(2:0:0): Abort command issued -- 1 1ae31 2002.
> Feb 27 13:59:02 servername kernel: qla2xxx 0000:06:00.0: scsi(2:0:0): Abort command issued -- 1 1ae32 2002.
> .. ping flood stopped here ..
> Feb 27 14:00:03 servername kernel: usb 1-7: new high speed USB device using ehci_hcd and address 3
> Feb 27 14:00:03 servername kernel: usb 1-7: configuration #1 chosen from 1 choice
> Feb 27 14:00:03 servername kernel: input: Peppercon AG Multidevice as /class/input/input3
> Feb 27 14:00:03 servername kernel: input: USB HID v1.01 Mouse [Peppercon AG Multidevice] on usb-0000:00:1d.7-7
> Feb 27 14:00:03 servername kernel: input: Peppercon AG Multidevice as /class/input/input4
> Feb 27 14:00:03 servername kernel: input: USB HID v1.01 Keyboard [Peppercon AG Multidevice] on usb-0000:00:1d.7-7
>
> Suggestions on directions to go? CentOS 5.3 might resolve some or all of this, but it's still quite a few days out, looks like. I've seen newer Xen packages built for RHEL/CentOS and discussed on here; is what comes stock stable enough, or are newer versions primarily to add more features?
>
> Grasping at straws....
> Thanks.
>
> PS.
> 08:00.0 Ethernet controller: Intel Corporation 82575EB Gigabit Network Connection (rev 02)
>

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users