Re: [Xen-users] lots of cycles in i/o wait state

On Sat, Jun 05, 2010 at 06:59:51PM -0400, Miles Fidelman wrote:
> Hi Folks,
>
> I've been doing some experimenting to see how far I can push some old  
> hardware into a virtualized environment - partially to see how much use  
> I can get out of the hardware, and partially to learn more about the  
> behavior of, and interactions between, software RAID, LVM, DRBD, and Xen.
>
> Basic configuration:
>
> - two machines, 4 disk drives each, two 1G ethernet ports (1 each to the  
> outside world, 1 each as a cross-connect)
> - each machine runs Xen 3 on top of Debian Lenny (the basic install)
> - very basic Dom0s - just running the hypervisor and i/o (including disk  
> management)
> ---- software RAID6 (md)
>

Software RAID6 performs poorly on random IO, small random writes in particular.
And the IO pattern from running multiple VMs will be random!
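
To put a rough number on that, here is a back-of-the-envelope sketch. The 4 disks come from your description; the 80 IOPS per disk is my assumption for an older 7200 rpm drive, and the penalty of 6 is the textbook read-modify-write cost for RAID6 (read data, P and Q, then write all three back). md may behave differently on a 4-disk array, so treat this as an estimate only:

```python
# Rough estimate of small random-write IOPS on a parity RAID array.
# The per-disk figure is an illustrative assumption, not a measurement.

def raid_random_write_iops(disks: int, iops_per_disk: float, write_penalty: int) -> float:
    """Aggregate random-write IOPS when each logical write costs `write_penalty` disk IOs."""
    return disks * iops_per_disk / write_penalty

DISKS = 4            # from the original post
IOPS_PER_DISK = 80   # assumed: older 7200 rpm SATA drive
RAID6_PENALTY = 6    # textbook: read data, P, Q; write data, P, Q

raw = DISKS * IOPS_PER_DISK
raid6 = raid_random_write_iops(DISKS, IOPS_PER_DISK, RAID6_PENALTY)

print(f"raw aggregate: ~{raw:.0f} IOPS")
print(f"RAID6 writes:  ~{raid6:.0f} IOPS")
```

With those assumptions the whole array handles only a few dozen small random writes per second, shared between all the domUs.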

> ---- LVM
> ---- DRBD
> ---- heartbeat to provide some failure migration
> - dom0, on each machine, runs directly on md RAID volumes (RAID1 for  
> boot, RAID6 for root and swap)
> - each Xen VM uses 2 DRBD volumes - one for root, one for swap
> - one of the VMs has a third volume, used for backup copies of files
>
> One domU, on one machine, runs a medium volume mail/list server.  This  
> used to run non-virtualized on one of the machines, and I moved it into  
> a domU.  Before virtualization, everything just hummed along (98% idle  
> time as reported by top).  Virtualized, the machine is mostly idle, but  
> now top reports a lot of i/o wait time (usually in the 20-25% range).
>

Is your disk/partition alignment set up properly? Getting it wrong can
cause bad performance, and it's easy to get wrong with VMs.
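
A minimal sketch of the check I mean: a partition is aligned when its starting byte offset is a multiple of the RAID chunk size. The 64 KiB chunk and 512-byte sectors below are assumptions; use the chunk size that mdadm reports for your array and the start sector from your partition table.

```python
# Check whether a partition start lines up with the RAID chunk size.
# Chunk size and sector size are assumed defaults; substitute your own values.

def is_aligned(start_sector: int, chunk_kib: int = 64, sector_bytes: int = 512) -> bool:
    """True if the partition's starting byte offset is a multiple of the chunk size."""
    return (start_sector * sector_bytes) % (chunk_kib * 1024) == 0

# 63 is the traditional DOS-style first partition start; 2048 is a 1 MiB offset.
for start in (63, 2048):
    state = "aligned" if is_aligned(start) else "MISALIGNED"
    print(f"start sector {start}: {state}")
```

The same check applies inside the guest: a domU that partitions its virtual disk starting at sector 63 can end up misaligned even when dom0 is fine.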

> As I've started experimenting with adding additional domUs, in various  
> configurations, I've found that my mail server can get into a state  
> where it's spending almost all of its cycles in an i/o wait state (95%  
> and higher as reported by top).  This is particularly noticeable when I  
> run a backup job (essentially a large tar job that reads from the root  
> volume and writes to the backup volume).  The domU grinds to a halt.
>

Is that iowait measured in the guest, or in dom0?

> So I've been trying to track down the bottlenecks.
>
> At first, I thought this was probably a function of pushing my disk  
> stack beyond reasonable limits - what with multiple domUs on top of DRBD  
> volumes, on top of LVM volumes, on top of software RAID6 (md).  I  
> figured I was seeing a lot of disk churning.
>

Yeah, that setup will slow you down a lot.

RAID6 is bad for random IO performance, and DRBD doesn't really help there.

> But... after running some disk benchmarks, what I'm seeing is something  
> else:
>
> - I took one machine, turned off all the domUs, and turned off DRBD
> - I ran a disk benchmark (bonnie++) on dom0, which reported 50MB/sec to  
> 90MB/sec of throughput depending on the test (not exactly sure what this  
> means, but it's a baseline)
>
> - I then brought up DRBD and various combinations of domUs, and ran the  
> benchmark in various places
> - the most interesting result, running in the same domU as the mail  
> server: 34M-60M depending on the test (not much degradation from running  
> directly on the RAID volume)
> - but.... while running the benchmark, the baseline i/o wait percentage  
> jumps from 25% to the 70-90% range
>

Again, run "iostat 1" in both the domU and dom0, and compare the results.
Also run "xm top" in dom0 to monitor the overall CPU usage.
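
If you want to log the same figure from a script and compare guest against dom0 over time, the iowait percentage can be derived from two samples of the "cpu" line in /proc/stat. A minimal sketch; the two sample lines are made-up values for illustration, not output from your machines:

```python
# Compute the iowait share of CPU time between two samples of the
# aggregate "cpu" line from /proc/stat. Sample values below are invented.

def iowait_percent(before: str, after: str) -> float:
    """Percentage of CPU time spent in iowait between two /proc/stat 'cpu' lines."""
    a = [int(x) for x in before.split()[1:]]
    b = [int(x) for x in after.split()[1:]]
    # Fields: user nice system idle iowait irq softirq steal [guest ...].
    # Only the first eight are summed, since guest time is already part of user.
    deltas = [y - x for x, y in zip(a, b)][:8]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0

sample_1 = "cpu  100 0 50 800 50 0 0 0"
sample_2 = "cpu  110 0 60 850 130 0 0 0"
print(f"iowait: {iowait_percent(sample_1, sample_2):.1f}%")
```

Take the two samples a second or so apart, in the domU and in dom0 at the same moment, and compare the two percentages.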

> So... the question becomes, if it's not disk churning, what's causing  
> all those i/o wait cycles?  I'm starting to think it might involve  
> buffering or other interactions in the hypervisor.
>
> Any thoughts or suggestions regarding diagnostics and/or tuning?  (Other  
> than "throw hardware at it" of course :-).
>

Remember that your storage cannot sustain many random IOs.

-- Pasi

