WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-devel

RE: [Xen-devel] disk io errors possibly caused by high network load?

To: Moritz Möller <m.moeller@xxxxxxxxxxxx>, "Ian Pratt" <Ian.Pratt@xxxxxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: RE: [Xen-devel] disk io errors possibly caused by high network load?
From: "James Harper" <james.harper@xxxxxxxxxxxxxxxx>
Date: Fri, 19 Sep 2008 23:21:32 +1000
In-reply-to: <122AA196D7CE4E4DBB92911EF4AB5AB38460E5@xxxxxxxxxxxxxxxxxxxxxxxxxx>
References: <122AA196D7CE4E4DBB92911EF4AB5AB3846045@xxxxxxxxxxxxxxxxxxxxxxxxxx><DD74FBB8EE28D441903D56487861CD9D362C58F1@xxxxxxxxxxxxxxxxxxxxxx> <122AA196D7CE4E4DBB92911EF4AB5AB38460E5@xxxxxxxxxxxxxxxxxxxxxxxxxx>
> 
> We rebooted the machines very quickly because it was a production
> system, so I didn't have time to copy the logs, and the log files on
> the disks show nothing about this, probably because the I/O was
> already down.
> 
> The machines are Supermicro, with single quad-core or dual quad-core
> Intel Xeons and 8 to 32 GB RAM. Some have an md RAID setup with two
> SATA drives on the onboard SATA controller (Intel ICH); others have a
> dedicated 3ware / AMCC 9660 or similar.
> 
> The machines that crashed were on different power lines and connected to
> different switches, although on the same network segment. There was
> also no physical interference.
> 
> The error was reported by both domU and dom0, each saying the disk had
> returned an I/O error, but with no specific information.
> 
> The network card is an Intel e1000.

The error wasn't a timeout, was it? We had a similar problem under Windows (no 
Xen involved at all) where the switch the server was plugged into was looped 
back to itself one evening. Any broadcast packet sent to the switch would just 
circulate around the loop indefinitely, until there were so many broadcast 
packets looping around that everything ground to a halt.
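To illustrate why a loop makes things steadily worse rather than just busier, here is a toy model, purely illustrative: the fixed number of new broadcasts per tick, the fixed link capacity, and the rule that frames beyond capacity are simply dropped are all simplifying assumptions, not a model of any real switch. The function name is hypothetical.

```python
def simulate_frames_in_flight(ticks, new_per_tick, looped, capacity):
    """Toy model: number of broadcast frames on the segment after each tick.

    Assumptions (illustrative only): new_per_tick broadcasts arrive every
    tick, the link carries at most `capacity` frames per tick, and any
    excess is dropped.
    """
    in_flight = 0
    history = []
    for _ in range(ticks):
        if looped:
            # With a loop, frames from earlier ticks are re-flooded and never leave.
            in_flight += new_per_tick
        else:
            # Without a loop, each broadcast is flooded once and then leaves.
            in_flight = new_per_tick
        # Cap at link capacity: once reached, the link is saturated.
        in_flight = min(in_flight, capacity)
        history.append(in_flight)
    return history
```

Without a loop the load stays flat at the arrival rate; with a loop it climbs every tick until the link is saturated and then stays pinned there, which matches the "ground to a halt" behaviour described above.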

The server was an HP DL380, so a more-than-capable machine, but there were 
enough interrupts occurring due to a completely saturated network that 
everything was reporting timeouts. In this case the server didn't require a 
reboot. It sat in that state the whole night, reporting disk timeouts etc., but 
the moment we fixed the cabling fault in the morning it instantly bounced 
back to life.
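One way to check whether the NIC is generating an interrupt storm on a Linux box is to sample /proc/interrupts twice and compute per-second rates. This is a minimal sketch, assuming the usual layout (a header row of CPU columns, then one line per interrupt source with per-CPU counts followed by the controller type and device name); the exact columns vary between kernel versions, and the function names are hypothetical.

```python
import time


def parse_interrupts(text):
    """Sum per-CPU counts for each line of /proc/interrupts-style text.

    Returns a dict mapping the label before the colon (an IRQ number such
    as "16", or a name such as "NMI") to its total across all CPUs.
    """
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        total = 0
        # Per-CPU counters come first; stop at the first non-numeric field
        # (the interrupt controller type / device name).
        for field in rest.split()[:ncpus]:
            if not field.isdigit():
                break
            total += int(field)
        counts[label.strip()] = total
    return counts


def interrupt_rates(before, after, interval_s):
    """Interrupts per second for each line, from two samples interval_s apart."""
    a = parse_interrupts(before)
    b = parse_interrupts(after)
    return {k: (b[k] - a.get(k, 0)) / interval_s for k in b}


def sample_rates(interval_s=1.0, path="/proc/interrupts"):
    """Read the file twice, interval_s apart, and return per-line rates (Linux only)."""
    with open(path) as f:
        before = f.read()
    time.sleep(interval_s)
    with open(path) as f:
        after = f.read()
    return interrupt_rates(before, after, interval_s)
```

On the affected machine you would look up the IRQ line whose device name is the NIC (e.g. eth0) and compare its rate under normal load and under a saturated network.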

Could it be that Linux treats timeout errors more severely than Windows does?

Can anyone say whether the layer above blkfront in the Linux kernel will report 
timeouts? Or would the errors have been passed through from Dom0?

Anyway, do you have a test environment where you can reproduce the problem? If the 
problem is as simple as a looped switch, it shouldn't be too hard to 
reproduce...

James


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel