This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/

Re: [Xen-users] lots of cycles in i/o wait state

To: Miles Fidelman <mfidelman@xxxxxxxxxxxxxxxx>
Subject: Re: [Xen-users] lots of cycles in i/o wait state
From: Pasi Kärkkäinen <pasik@xxxxxx>
Date: Mon, 7 Jun 2010 11:08:30 +0300
Cc: "xen-users@xxxxxxxxxxxxxxxxxxx" <xen-users@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Mon, 07 Jun 2010 01:10:52 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4C0AD6E7.1000809@xxxxxxxxxxxxxxxx>
List-help: <mailto:xen-users-request@lists.xensource.com?subject=help>
List-id: Xen user discussion <xen-users.lists.xensource.com>
List-post: <mailto:xen-users@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=unsubscribe>
References: <4C0AD6E7.1000809@xxxxxxxxxxxxxxxx>
Sender: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mutt/1.5.18 (2008-05-17)
On Sat, Jun 05, 2010 at 06:59:51PM -0400, Miles Fidelman wrote:
> Hi Folks,
> I've been doing some experimenting to see how far I can push some old  
> hardware into a virtualized environment - partially to see how much use  
> I can get out of the hardware, and partially to learn more about the  
> behavior of, and interactions between, software RAID, LVM, DRBD, and Xen.
> Basic configuration:
> - two machines, 4 disk drives each, two 1G ethernet ports (1 each to the  
> outside world, 1 each as a cross-connect)
> - each machine runs Xen 3 on top of Debian Lenny (the basic install)
> - very basic Dom0s - just running the hypervisor and i/o (including disk  
> management)
> ---- software RAID6 (md)

Software RAID6 performs really badly on random IO, and the combined
IO pattern from running multiple VMs will be random!
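
To put a rough number on that, here is a back-of-the-envelope sketch. It is not a measurement of this setup. The write penalty of 6 for RAID6 is the commonly quoted rule of thumb (each small random write turns into roughly six disk operations), and the figure of 80 random IOPS per disk is an assumed value for an older 7200 rpm drive. The function name is made up for illustration.

```python
# Rule-of-thumb estimate only; every constant here is an assumption.
RAID6_WRITE_PENALTY = 6   # commonly quoted cost of one small random write
ASSUMED_DISK_IOPS = 80    # assumed random IOPS of one older 7200 rpm disk


def raid_random_iops(disks, iops_per_disk, write_fraction, write_penalty):
    """Estimate the random IOPS an array can deliver to its clients.

    Reads are counted as one back-end operation each, writes as
    `write_penalty` back-end operations each.
    """
    raw_backend_iops = disks * iops_per_disk
    cost_per_io = (1.0 - write_fraction) + write_fraction * write_penalty
    return raw_backend_iops / cost_per_io


if __name__ == "__main__":
    # Four disks per machine, as in the configuration described above.
    for write_fraction in (0.0, 0.5, 1.0):
        estimate = raid_random_iops(
            4, ASSUMED_DISK_IOPS, write_fraction, RAID6_WRITE_PENALTY
        )
        print(f"write fraction {write_fraction:.0%}: ~{estimate:.0f} IOPS")
```

Under these assumptions a write-heavy random workload gets only a fraction of what the four disks could deliver for reads, which is the point being made here.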

> ---- LVM
> ---- DRBD
> ---- heartbeat to provide some failure migration
> - dom0, on each machine, runs directly on md RAID volumes (RAID1 for  
> boot, RAID6 for root and swap)
> - each Xen VM uses 2 DRBD volumes - one for root, one for swap
> - one of the VMs has a third volume, used for backup copies of files
> One domU, on one machine, runs a medium volume mail/list server.  This  
> used to run non-virtualized on one of the machines, and I moved it into  
> a domU.  Before virtualization, everything just hummed along (98% idle  
> time as reported by top).  Virtualized, the machine is mostly idle, but  
> now top reports a lot of i/o wait time (usually in the 20-25% range).

Is your disk/partition alignment set up properly? Getting it wrong can
cause bad performance, and it's easy to mess up with VMs.
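
A minimal sketch of what "aligned" means here, assuming 512-byte sectors and a 64 KiB RAID chunk size. Both values are assumptions for illustration; check the real ones on your system. The helper names are hypothetical. The start sector would come from the partition table, or from the offset of the LV or DRBD device handed to the guest.

```python
SECTOR_BYTES = 512  # assumed logical sector size


def is_aligned(start_sector, boundary_bytes, sector_bytes=SECTOR_BYTES):
    """Return True if the partition's byte offset is a multiple of the boundary."""
    return (start_sector * sector_bytes) % boundary_bytes == 0


def next_aligned_sector(start_sector, boundary_bytes, sector_bytes=SECTOR_BYTES):
    """Return the first sector at or after start_sector that sits on the boundary.

    Assumes boundary_bytes is a whole multiple of sector_bytes.
    """
    sectors_per_boundary = boundary_bytes // sector_bytes
    # Ceiling division, written with integer arithmetic only.
    return -(-start_sector // sectors_per_boundary) * sectors_per_boundary


if __name__ == "__main__":
    chunk = 64 * 1024  # assumed md chunk size
    for start in (63, 2048):  # 63 is the classic DOS partition start
        print(start, is_aligned(start, chunk), next_aligned_sector(start, chunk))
```

A partition that starts at sector 63 straddles chunk boundaries, so a single guest IO can touch two chunks on the array underneath.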

> As I've started experimenting with adding additional domUs, in various  
> configurations, I've found that my mail server can get into a state  
> where it's spending almost all of its cycles in an i/o wait state (95%  
> and higher as reported by top).  This is particularly noticeable when I  
> run a backup job (essentially a large tar job that reads from the root  
> volume and writes to the backup volume).  The domU grinds to a halt.

Is that iowait measured in the guest, or in dom0?

> So I've been trying to track down the bottlenecks.
> At first, I thought this was probably a function of pushing my disk  
> stack beyond reasonable limits - what with multiple domUs on top of DRBD  
> volumes, on top of LVM volumes, on top of software RAID6 (md).  I  
> figured I was seeing a lot of disk churning.

Yeah, that setup will slow you down a lot.

RAID6 is bad for random IO performance, and DRBD doesn't really help there.

> But... after running some disk benchmarks, what I'm seeing is something  
> else:
> - I took one machine, turned off all the domUs, and turned off DRBD
> - I ran a disk benchmark (bonnie++) on dom0, which reported 50MB/sec to  
> 90MB/sec of throughput depending on the test (not exactly sure what this  
> means, but it's a baseline)
> - I then brought up DRBD and various combinations of domUs, and ran the  
> benchmark in various places
> - the most interesting result, running in the same domU as the mail  
> server: 34M-60M depending on the test (not much degradation from running  
> directly on the RAID volume)
> - but.... while running the benchmark, the baseline i/o wait percentage  
> jumps from 25% to the 70-90% range

Again, run "iostat 1" in both the domU and dom0, and compare the results.
Also run "xm top" in dom0 to monitor the overall CPU usage.
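
If you want to log the same iowait figure in both places for a side-by-side comparison, here is a small sketch that derives it from the aggregate "cpu" line of /proc/stat, where iowait is the fifth counter. The function names are made up for illustration, and this is only a rough equivalent of what top and iostat report, not their actual code.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def iowait_percent(before, after):
    """Return the share of CPU time spent in iowait between two samples.

    Counter order: user, nice, system, idle, iowait, irq, softirq, steal, ...
    Guest time (fields 9 and 10, where present) is already included in
    user/nice, so only the first eight counters are summed.
    """
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


def read_cpu_counters(path="/proc/stat"):
    """Read the current counters; the aggregate line is the first line of the file."""
    with open(path) as stat_file:
        return parse_cpu_line(stat_file.readline())
```

To use it, call read_cpu_counters() twice a second or so apart and pass both samples to iowait_percent(). Run it in the domU and in dom0 at the same time while the backup job is going, and see which side is actually waiting.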

> So... the question becomes, if it's not disk churning, what's causing  
> all those i/o wait cycles?  I'm starting to think it might involve  
> buffering or other interactions in the hypervisor.
> Any thoughts or suggestions regarding diagnostics and/or tuning?  (Other  
> than "throw hardware at it" of course :-).

Remember, your storage cannot do many random IOs.

-- Pasi
