I’ve got a bit of a problem here with my network. I’m using CentOS 5 as a dom0
with many different domUs. I can reproduce this bug at will, so I guess it’s
something serious. I’ve also read a reply on this list saying that it
is a known bug, but I can’t find any trace of it.

I share a file via any protocol on any domU. The file has to be big enough
that the transfer takes a while; 1 GB usually does the trick. Then I
log on to another machine (a physical machine, that is…) and start retrieving
the file from the domU instance. After some random number of seconds, the network
stalls, the link goes down, and packets start being dropped.

Luckily, I can log on locally to the dom0 and confirm that the link is down and that
packets are dropped. Bringing the peth0 interface down and up again doesn’t
fix anything, nor does a physical disconnect/reconnect. I have to reboot the
whole machine physically to make it work again. It’s not a DoS problem,
because I can confirm that I don’t end up with thousands of TIME_WAIT connections.
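
In case it helps anyone trying to reproduce this, here is a minimal sketch of how the drop, error and carrier counters could be watched on the dom0 while the transfer runs. It only assumes the standard /proc/net/dev layout (two header lines, then one line per interface with 8 receive and 8 transmit counters). The function names (parse_proc_net_dev, read_counters, counter_deltas, watch) and the 5-second interval are made up for illustration. It avoids newer syntax, but it is untested on the Python 2.4 that ships with CentOS 5.

```python
import time

# Column order in /proc/net/dev after the "iface:" prefix:
# 8 receive counters followed by 8 transmit counters.
FIELDS = ("rx_bytes", "rx_packets", "rx_errs", "rx_drop",
          "rx_fifo", "rx_frame", "rx_compressed", "rx_multicast",
          "tx_bytes", "tx_packets", "tx_errs", "tx_drop",
          "tx_fifo", "tx_colls", "tx_carrier", "tx_compressed")

# Counters that seem most relevant to a stalled link (an assumption).
WATCHED = ("rx_errs", "rx_drop", "tx_errs", "tx_drop", "tx_carrier")


def parse_proc_net_dev(text):
    """Parse the text of /proc/net/dev into {iface: {counter: int}}."""
    stats = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, values = line.split(":", 1)
        numbers = [int(v) for v in values.split()]
        stats[name.strip()] = dict(zip(FIELDS, numbers))
    return stats


def read_counters(path="/proc/net/dev"):
    """Read and parse the live counters from the kernel."""
    f = open(path)
    try:
        return parse_proc_net_dev(f.read())
    finally:
        f.close()


def counter_deltas(before, after, iface, keys=WATCHED):
    """Return how much each watched counter grew between two samples."""
    return dict((k, after[iface][k] - before[iface][k]) for k in keys)


def watch(iface="peth0", interval=5, rounds=12):
    """Print the counter growth for one interface every `interval` seconds."""
    previous = read_counters()
    for _ in range(rounds):
        time.sleep(interval)
        current = read_counters()
        deltas = counter_deltas(previous, current, iface)
        print("%s %s %s" % (time.strftime("%H:%M:%S"), iface, deltas))
        previous = current
```

Calling watch("peth0") on the dom0 during the download, and again for the bridge and the domU's vif interface, should show which interface starts dropping first when the stall happens.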

Now, how do I diagnose / fix the problem? I’m out of ideas and have been
looking around for 3 days now… sadly.

Xen: 3.0.3-25
Kernel: 2.6.18-8.1.8
Network: bridged

name = "XXXXX"
memory = "1500"
disk = [ 'phy:/dev/XenGuests0/XXXXX,xvda,w', ]
vif = [ 'mac=00:04:75:4f:XX:XX, bridge=xenbr0', ]
uuid = "c3463377-ba64-caf7-87b1-bc34294274b7"
bootloader="/usr/bin/pygrub"
vcpus=1
cpus="2"
on_reboot = 'restart'