   
 
 
 
   
 

xen-users

[Xen-users] Stability

I'm wondering if anyone might have any suggestions... 

We've installed CentOS 5.2 on 3 servers and have an FC switch, dual-port cards,
and an FC RAID device. We configured the RAID as two separate logical RAIDs. One
is RAID-10 for speed and hosts the Xen VM images as LVs. The rest is RAID-5 to
maximize space and houses a lot of lighter-access data on a GFS2 filesystem.
The switch is "VLAN'ed" such that one FC port sees only one logical RAID and
the other port sees only the other.

Everything is stock CentOS except the Intel IGB NIC drivers. The stock ones had
issues; there appear to be bug reports on this, and 5.3 *probably* resolves it.
In the meantime I got the latest GPL driver tarball from intel.com and
installed it, and the issues we were seeing went away.

Anyway, we're seeing terrible stability issues, and I'm asking for pointers
because I've yet to get a good handle on what the cause could be or where
to concentrate efforts. This isn't specifically Xen related, but it might be
amusing to you, or someone who knows the solution might recognize it. We
just ran a large-packet broadcast ping flood, and it caused a USB-attached
drive to get disconnected, followed by FC card driver errors, followed by full
system crashes. This time we stopped the ping after the FC errors and the
cluster recovered.

# ping 192.168.1.10 -b -s1472 -f 

Feb 27 13:56:56 servername kernel: usb 1-7: USB disconnect, address 2
Feb 27 13:59:02 servername kernel: qla2xxx 0000:06:00.0: scsi(2:0:0): Abort command issued -- 1 1ae2f 2002.
Feb 27 13:59:02 servername kernel: qla2xxx 0000:06:00.0: scsi(2:0:0): Abort command issued -- 1 1ae30 2002.
Feb 27 13:59:02 servername kernel: qla2xxx 0000:06:00.0: scsi(2:0:0): Abort command issued -- 1 1ae31 2002.
Feb 27 13:59:02 servername kernel: qla2xxx 0000:06:00.0: scsi(2:0:0): Abort command issued -- 1 1ae32 2002.
.. ping flood stopped here ..
Feb 27 14:00:03 servername kernel: usb 1-7: new high speed USB device using ehci_hcd and address 3
Feb 27 14:00:03 servername kernel: usb 1-7: configuration #1 chosen from 1 choice
Feb 27 14:00:03 servername kernel: input: Peppercon AG Multidevice as /class/input/input3
Feb 27 14:00:03 servername kernel: input: USB HID v1.01 Mouse [Peppercon AG Multidevice] on usb-0000:00:1d.7-7
Feb 27 14:00:03 servername kernel: input: Peppercon AG Multidevice as /class/input/input4
Feb 27 14:00:03 servername kernel: input: USB HID v1.01 Keyboard [Peppercon AG Multidevice] on usb-0000:00:1d.7-7
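
To see at a glance which kernel subsystem complained first, an excerpt like
the one above can be summarized with a short script. This is only a minimal
sketch: the function name first_seen and the regular expression are
illustrative assumptions, the sample lines are copied from the log above, and
it assumes each log entry sits on a single line (as it would in the syslog
file itself, not as wrapped by a mail client).

```python
import re
from collections import OrderedDict

# Matches classic syslog kernel lines such as
# "Feb 27 13:59:02 servername kernel: qla2xxx 0000:06:00.0: ..."
# The pattern is an assumption about the log layout, not a syslog standard.
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+kernel:\s+(?P<source>\S+)"
)


def first_seen(lines):
    """Return an ordered mapping {source: timestamp of its first message}."""
    seen = OrderedDict()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a kernel line (e.g. a hand-written note); skip it
        source = match.group("source").rstrip(":")
        seen.setdefault(source, match.group("stamp"))
    return seen


# Sample lines taken from the excerpt above.
sample = [
    "Feb 27 13:56:56 servername kernel: usb 1-7: USB disconnect, address 2",
    "Feb 27 13:59:02 servername kernel: qla2xxx 0000:06:00.0: scsi(2:0:0): "
    "Abort command issued -- 1 1ae2f 2002.",
    ".. ping flood stopped here ..",
    "Feb 27 14:00:03 servername kernel: input: Peppercon AG Multidevice as "
    "/class/input/input3",
]

for source, stamp in first_seen(sample).items():
    print(stamp, source)
```

For the sample lines this lists usb first, then qla2xxx, then input, which
matches the order of failures described above: the USB disconnect precedes
the FC driver aborts by about two minutes.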

Suggestions on directions to go? CentOS 5.3 might resolve some or all of this,
but it looks like it's still quite a few days out. I've seen newer Xen packages
built for RHEL/CentOS discussed on here. Is what comes stock stable enough, or
do the newer versions primarily add more features?

Grasping at straws....
Thanks.

PS. 
08:00.0 Ethernet controller: Intel Corporation 82575EB Gigabit Network Connection (rev 02)

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users
