[Xen-users] Additional details (was Re: XEN - Broadcom issue: survey)
I'm happy to see that something is moving on this issue...
Here is a summary of where things stand with the problem.
DomU packets get dropped when transferring large quantities of data on
high-speed links (gigabit for me; I don't know if it's the same on 100 Mb).
This causes various problems: ftp and ssh sessions get interrupted, samba
shares get corrupted, etc...
* server: HP ProLiant DL380 G5 (6 GB RAM, 2 dual-core CPUs)
* Xen: Xen 3.1 compiled from stable source, x86 with PAE enabled
* Dom0: CentOS 5
* DomU: Win2k3
* ethernet card: Broadcom NetXtreme II 5708
* Xen networking configuration: tried all the standard (basically, bridge
and route); also tried to assign an IP to the bridge instead of using the
Actions taken (with no success...)
* recompiled the kernel with the latest (1.5.10c) drivers from Broadcom
* disable the managed mode via uxdiag (as described somewhere on this list)
* played a bit with the "txqueuelen" parameter on VIFxxx, xenbr and tap0
interfaces (usually, you should have a value of around 1000 for this on
"normal" ethernet interfaces, while VIF are showing "strange" low values)
* played a bit with ethtool to disable SG and TCO
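For anyone who wants to repeat the last two steps, here is a rough sketch of the commands involved, wrapped in Python so they can be reviewed before being run. The interface names (vif1.0, eth0) and the queue length of 1000 are only examples, and it assumes that "TCO" corresponds to ethtool's `tso` (TCP segmentation offload) flag; adjust to your own setup.

```python
import subprocess


def txqueuelen_cmd(iface, length):
    """Command that sets the transmit queue length of an interface."""
    return ["ip", "link", "set", "dev", iface, "txqueuelen", str(length)]


def offload_off_cmd(iface, features=("sg", "tso")):
    """Command that switches off the given offload features via ethtool -K.

    Assumption: the "TCO" flag mentioned above is ethtool's "tso".
    """
    cmd = ["ethtool", "-K", iface]
    for feature in features:
        cmd += [feature, "off"]
    return cmd


def apply(commands, dry_run=True):
    """Print each command; only execute it when dry_run is False (needs root)."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Example interface names only; substitute the real VIF and NIC names.
    apply([txqueuelen_cmd("vif1.0", 1000), offload_off_cmd("eth0")])
```

With dry_run left at True the script only prints the commands, which makes it easy to check them before touching a live Dom0.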
Weird and odd
I also have another server (a Dell machine) equipped with the same network
interface: on this machine I run XenExpress (the one running Xen 3.0.4). I
tried to run the exact same DomU I use for testing on the HP (HVM, no PV
drivers), and...it works fine!
Differences I could find:
* some sysctl differences in ipv4/conf/all --> tried to migrate them to
the HP with no luck
* different kernel version (Dell is running 188.8.131.52-xs184.108.40.2061.3960xen,
while HP is running 2.6.18 manually compiled during Xen rebuild)
* different Broadcom drivers: Dell is running 1.5.1c, HP 1.5.10c (latest)
* different Broadcom firmware: Dell is running 2.9.1, HP 1.9.3
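The sysctl comparison in the first item can be done mechanically. A minimal sketch, assuming the output of `sysctl -a` has been saved on each machine in its usual "key = value" form; the sample values in the demo are made up and do not come from either server.

```python
def parse_sysctl(text):
    """Turn `sysctl -a` style output ("key = value" lines) into a dict."""
    settings = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def diff_sysctl(text_a, text_b):
    """Return {key: (value_a, value_b)} for keys that differ.

    A key present on one side only shows up with None for the other side.
    """
    a, b = parse_sysctl(text_a), parse_sysctl(text_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    dell = "net.ipv4.conf.all.rp_filter = 1\nnet.ipv4.conf.all.forwarding = 1\n"
    hp = "net.ipv4.conf.all.rp_filter = 0\nnet.ipv4.conf.all.forwarding = 1\n"
    for key, (d, h) in diff_sysctl(dell, hp).items():
        print(f"{key}: dell={d} hp={h}")
```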
Now, forgetting about hard to determine differences (changes in the kernel
added by the XenExpress package? other "mysterious" configuration
differences?), the only serious thing I'd like to try is to update the
firmware on the HP. Does anybody know how to do this?
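This does not answer the question, but the driver and firmware versions compared above can be read back with `ethtool -i`, which is handy for confirming what a card is really running before and after any change. A small sketch; the interface name eth0 is an assumption, and the sample text in the comment only mirrors the usual layout of that output using the HP versions quoted above.

```python
import subprocess


def parse_ethtool_info(text):
    """Parse `ethtool -i` output ("key: value" lines) into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, since bus-info values contain colons.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def driver_info(iface):
    """Run `ethtool -i <iface>` and return the parsed fields."""
    result = subprocess.run(
        ["ethtool", "-i", iface], capture_output=True, text=True, check=True
    )
    return parse_ethtool_info(result.stdout)


if __name__ == "__main__":
    # Typical layout of the output (illustrative values):
    #   driver: bnx2
    #   version: 1.5.10c
    #   firmware-version: 1.9.3
    info = driver_info("eth0")  # eth0 is an example name
    for field in ("driver", "version", "firmware-version"):
        print(field, "=", info.get(field))
```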
The only thing I was able to find out during my testing is that raising the
value of txqueuelen on the VIFxxx interface and/or disabling the SG and TCO
flags on the card somehow "lowers" the problem (I can see fewer packets
dropped); it's hard to quantify this, but maybe it's important...
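One way to put a number on this would be to compare the per-interface drop counters in /proc/net/dev before and after a test transfer. The sketch below is only an approximation: it treats the drop rate as dropped / (dropped + counted packets), and the 10 second interval is an arbitrary choice.

```python
import time


def parse_proc_net_dev(text):
    """Return {iface: packet and drop counters} from /proc/net/dev content."""
    stats = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or len(fields) < 16:
            continue
        stats[name.strip()] = {
            "rx_packets": int(fields[1]),
            "rx_drop": int(fields[3]),
            "tx_packets": int(fields[9]),
            "tx_drop": int(fields[11]),
        }
    return stats


def drop_rate(before, after, iface, direction="tx"):
    """Approximate fraction of packets dropped between two snapshots."""
    b, a = before[iface], after[iface]
    dropped = a[direction + "_drop"] - b[direction + "_drop"]
    passed = a[direction + "_packets"] - b[direction + "_packets"]
    total = dropped + passed
    return dropped / total if total else 0.0


def snapshot():
    """Read the current counters from the running system."""
    with open("/proc/net/dev") as f:
        return parse_proc_net_dev(f.read())


if __name__ == "__main__":
    before = snapshot()
    time.sleep(10)  # run the test transfer during this window
    after = snapshot()
    for iface in sorted(set(before) & set(after)):
        print(iface, "tx drop rate:", drop_rate(before, after, iface, "tx"),
              "rx drop rate:", drop_rate(before, after, iface, "rx"))
```

Running it once with the default settings and once after changing txqueuelen or the offload flags would give two figures that can be compared directly.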
Any other ideas? I have other HP servers I'd like to use in our lab here to
virtualize workstations, and everything is blocked by this bug...you can
imagine how badly I need to fix this ;-)
PS: Some references for similar issues I've found around: