
RE: [Xen-devel] disk io errors possibly caused by high network load?



> 
> We rebooted the machines really quickly because it was a production
> system, so I didn't have the time to copy the logs, and on the disks I
> see nothing about this in the logfiles, probably because the IO was
> already down.
> 
> The machines are Supermicro, Intel Xeon Quad or Dual-Quadcore, 8 to 32
> GB RAM, and some have an mdraid setup with two SATA drives on the
> onboard SATA controller (Intel ICH), others have a dedicated 3ware / AMCC
> 9660 or similar.
> 
> The machines that crashed were on different power lines and connected to
> different switches, although on the same network segment. Also there
> were no physical interferences.
> 
> The error was reported by domU and dom0 - both saying the disk would
> give an I/O error, but no specific information.
> 
> The network card is an Intel e1000.

The error wasn't a timeout, was it? We had a similar problem under Windows (no 
Xen involved at all) where the switch the server was plugged into was looped 
back to itself one evening. Any broadcast packet sent to the switch would just 
circulate around the switch indefinitely, until there were enough broadcast 
packets looping around that everything ground to a halt.

The server was an HP DL380, so a more than capable machine, but there were 
enough interrupts occurring due to a completely saturated network that 
everything was reporting timeouts. In this case the server didn't require a 
reboot. It sat in that state the whole night, reporting disk timeouts etc., but 
the moment we rectified the cabling fault in the morning it instantly bounced 
back to life.
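
If you want to check whether a Linux box is being swamped by interrupts in the same way, one rough approach is to take two snapshots of /proc/interrupts and compare the counts. The following is only a minimal sketch under assumptions: the function names (parse_interrupts, interrupt_rates, watch) are made up for illustration, and it assumes the usual /proc/interrupts layout of a CPU header line followed by one "label: counts... description" line per interrupt source.

```python
import time


def parse_interrupts(text):
    """Return {irq_label: (total_count_across_cpus, description)}.

    Assumes the usual /proc/interrupts layout: a header line naming the
    CPUs, then one line per interrupt source.
    """
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header looks like "CPU0 CPU1 ..."
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        tokens = rest.split()
        counts = []
        # Some rows (e.g. ERR) carry fewer counters than there are CPUs.
        for tok in tokens[:ncpu]:
            if not tok.isdigit():
                break
            counts.append(int(tok))
        desc = " ".join(tokens[len(counts):])
        totals[label.strip()] = (sum(counts), desc)
    return totals


def interrupt_rates(before, after, seconds):
    """Interrupts per second for each source between two snapshots."""
    rates = {}
    for irq, (count, desc) in after.items():
        prev = before.get(irq, (0, desc))[0]
        rates[irq] = ((count - prev) / seconds, desc)
    return rates


def watch(interval=5.0, path="/proc/interrupts"):
    """Hypothetical helper: sample twice and print the busiest sources."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_interrupts(f.read())
    rates = interrupt_rates(before, after, interval)
    busiest = sorted(rates.items(), key=lambda kv: kv[1][0], reverse=True)
    for irq, (rate, desc) in busiest[:10]:
        print(f"{irq:>5} {rate:12.1f}/s  {desc}")
```

Calling watch() on the affected machine while the network is saturated would show whether the NIC's interrupt line dominates. What rate counts as "too high" depends on the hardware and driver, so compare against a reading taken under normal load.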

It could be that Linux treats timeout errors a little more severely?

Can anyone say if the layer above blkfront in the Linux kernel will report 
timeouts? Or would the errors have been coming through from Dom0?

Anyway, do you have a test environment you can reproduce the problem on? If the 
problem is as simple as a looped switch then it shouldn't be too hard to 
reproduce...

James




 

