On Sat, Jun 05, 2010 at 06:59:51PM -0400, Miles Fidelman wrote:
> Hi Folks,
> I've been doing some experimenting to see how far I can push some old
> hardware into a virtualized environment - partially to see how much use
> I can get out of the hardware, and partially to learn more about the
> behavior of, and interactions between, software RAID, LVM, DRBD, and Xen.
> Basic configuration:
> - two machines, 4 disk drives each, two 1G ethernet ports (1 each to the
> outside world, 1 each as a cross-connect)
> - each machine runs Xen 3 on top of Debian Lenny (the basic install)
> - very basic Dom0s - just running the hypervisor and i/o (including disk
> ---- software RAID6 (md)
Software RAID6 will really suck for random IO performance.
The IO pattern from running multiple VMs will be random!
> ---- LVM
> ---- DRBD
> ---- heartbeat to provide some failure migration
> - dom0, on each machine, runs directly on md RAID volumes (RAID1 for
> boot, RAID6 for root and swap)
> - each Xen VM uses 2 DRBD volumes - one for root, one for swap
> - one of the VMs has a third volume, used for backup copies of files
> One domU, on one machine, runs a medium volume mail/list server. This
> used to run non-virtualized on one of the machines, and I moved it into
> a domU. Before virtualization, everything just hummed along (98% idle
> time as reported by top). Virtualized, the machine is mostly idle, but
> now top reports a lot of i/o wait time, usually in the 20-25% range.
Is your disk/partition alignment properly set up? Getting it wrong can
cause bad performance, and it's easy to mess up with VMs.
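A quick way to check: the old DOS-style fdisk default started the first partition at sector 63, which is not aligned to 4 KiB or to any RAID chunk size. The Python sketch below reads partition start sectors from sysfs (assuming the Linux /sys/block/<disk>/<part>/start layout, which counts in 512-byte sectors) and tests them against a boundary. 1 MiB is a common choice, not a Xen requirement, and "sda" is only a placeholder. The same arithmetic applies to every layer in the stack (md chunk, LVM physical extent start, DRBD), which this sketch does not inspect.

```python
# Check whether partitions start on an alignment boundary.
# ASSUMPTION: Linux sysfs layout /sys/block/<disk>/<part>/start (512-byte units).
from pathlib import Path

SECTOR_BYTES = 512  # sysfs "start" is counted in 512-byte sectors


def is_aligned(start_sector: int, boundary_bytes: int) -> bool:
    """True if the byte offset of start_sector is a multiple of boundary_bytes."""
    return (start_sector * SECTOR_BYTES) % boundary_bytes == 0


def partition_starts(disk: str) -> dict[str, int]:
    """Map partition name -> start sector for one disk, read from sysfs."""
    base = Path("/sys/block") / disk
    starts = {}
    for part in sorted(base.glob(f"{disk}*")):  # e.g. sda1, sda2, ...
        start_file = part / "start"
        if start_file.exists():
            starts[part.name] = int(start_file.read_text())
    return starts


if __name__ == "__main__":
    for name, start in partition_starts("sda").items():  # "sda" is a placeholder
        status = "aligned" if is_aligned(start, 1 << 20) else "MISALIGNED"
        print(f"{name}: start sector {start} -> {status} (1 MiB boundary)")
```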
> As I've started experimenting with adding additional domUs, in various
> configurations, I've found that my mail server can get into a state
> where it's spending almost all of its cycles in an i/o wait state (95%
> and higher as reported by top). This is particularly noticeable when I
> run a backup job (essentially a large tar job that reads from the root
> volume and writes to the backup volume). The domU grinds to a halt.
Is that iowait measure in the guest, or in dom0?
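If it helps, here is a small Python sketch that samples /proc/stat (the same counters top reads) one second apart and prints the iowait share of CPU time. Run it in dom0 and inside the domU at the same time and compare. It assumes the standard field order on the aggregate "cpu" line: user, nice, system, idle, iowait, irq, softirq, steal; inside a guest the steal value (index 7) is also worth a look.

```python
# Sample /proc/stat twice and report the iowait percentage over the interval.
# ASSUMPTION: standard /proc/stat "cpu" line field order (see proc(5)).
import time


def read_cpu_times(path="/proc/stat"):
    """Return the aggregate CPU counters (in jiffies) from the first 'cpu' line."""
    with open(path) as f:
        fields = f.readline().split()
    return [int(v) for v in fields[1:]]


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples."""
    deltas = [a - b for a, b in zip(after, before)]
    # Only sum user..steal: guest/guest_nice are already included in user/nice.
    total = sum(deltas[:8])
    return 100.0 * deltas[4] / total if total else 0.0


if __name__ == "__main__":
    first = read_cpu_times()
    time.sleep(1)
    print(f"iowait: {iowait_percent(first, read_cpu_times()):.1f}%")
```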
> So I've been trying to track down the bottlenecks.
> At first, I thought this was probably a function of pushing my disk
> stack beyond reasonable limits - what with multiple domUs on top of DRBD
> volumes, on top of LVM volumes, on top of software RAID6 (md). I
> figured I was seeing a lot of disk churning.
Yeah, that setup will slow you down a lot.
RAID6 is bad for random IO performance, and DRBD doesn't really help there.
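To put rough numbers on it: with RAID6 every small random write becomes a read-modify-write of the data block plus both parity blocks (P and Q), i.e. about 6 disk operations. Here is a back-of-envelope Python sketch. The 75 IOPS per disk figure is an assumed placeholder for a 7200 rpm SATA drive, and the model ignores caching and full-stripe writes, so treat the output as an order-of-magnitude guide only.

```python
# Back-of-envelope random-IO capacity of a small md RAID6 array.
# ASSUMPTION: the per-disk IOPS figure is a placeholder; measure your own disks.

RAID6_WRITE_PENALTY = 6  # small write: read data + P + Q, then write data + P + Q


def raid6_random_iops(n_disks: int, iops_per_disk: float, read_fraction: float) -> float:
    """Estimate array-level random IOPS for a read/write mix (no cache, no full-stripe writes)."""
    if n_disks < 4:
        raise ValueError("RAID6 needs at least 4 member disks")
    raw = n_disks * iops_per_disk  # total spindle operations per second
    # A random read costs 1 disk op; a small random write costs 6.
    cost_per_io = read_fraction * 1 + (1 - read_fraction) * RAID6_WRITE_PENALTY
    return raw / cost_per_io


if __name__ == "__main__":
    # Hypothetical: 4 disks at ~75 random IOPS each.
    for mix in (1.0, 0.5, 0.0):
        print(f"read fraction {mix:.1f}: ~{raid6_random_iops(4, 75, mix):.0f} IOPS")
```

With those assumed numbers a 50/50 mix comes out around 86 IOPS for the whole array, and pure random writes around 50, shared between dom0 and every domU.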
> But... after running some disk benchmarks, what I'm seeing is something
> - I took one machine, turned off all the domUs, and turned off DRBD
> - I ran a disk benchmark (bonnie++) on dom0, which reported 50MB/sec to
> 90MB/sec of throughput depending on the test (not exactly sure what this
> means, but it's a baseline)
> - I then brought up DRBD and various combinations of domUs, and ran the
> benchmark in various places
> - the most interesting result, running in the same domU as the mail
> server: 34M-60M depending on the test (not much degradation from running
> directly on the RAID volume)
> - but... while running the benchmark, the baseline i/o wait percentage
> jumps from 25% to the 70-90% range
Again, run "iostat 1" in both the domU and dom0, and compare the results.
Also run "xm top" in dom0 to monitor the overall CPU usage.
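If iostat isn't installed in a guest, the same counters are in /proc/diskstats. This Python sketch (assuming the usual 14-column layout described in the kernel's iostats documentation) samples twice, one second apart, and prints per-device utilisation and average wait per completed request, roughly like the %util and await columns of "iostat -x 1". Comparing the DRBD, md and raw disk devices in dom0 should show which layer the requests are queuing in.

```python
# Per-device %util and average wait from /proc/diskstats.
# ASSUMPTION: 14-column layout (major, minor, name, then 11 counters).
import time


def read_diskstats(path="/proc/diskstats"):
    stats = {}
    with open(path) as f:
        for line in f:
            p = line.split()
            if len(p) < 14:
                continue  # older kernels print shorter lines for partitions
            stats[p[2]] = {
                "ios": int(p[3]) + int(p[7]),     # reads + writes completed
                "io_ms": int(p[6]) + int(p[10]),  # ms spent on reads + writes
                "busy_ms": int(p[12]),            # ms the device had IO in flight
            }
    return stats


def device_metrics(before, after, interval_s):
    """Return {device: (util_percent, await_ms)} over the sampling interval."""
    out = {}
    for dev, b in before.items():
        a = after.get(dev)
        if a is None:
            continue
        ios = a["ios"] - b["ios"]
        await_ms = (a["io_ms"] - b["io_ms"]) / ios if ios else 0.0
        util = 100.0 * (a["busy_ms"] - b["busy_ms"]) / (interval_s * 1000)
        out[dev] = (round(util, 1), round(await_ms, 2))
    return out


if __name__ == "__main__":
    first = read_diskstats()
    time.sleep(1)
    for dev, (util, wait) in sorted(device_metrics(first, read_diskstats(), 1.0).items()):
        print(f"{dev:10s} util={util:5.1f}%  await={wait:.2f} ms")
```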
> So... the question becomes, if it's not disk churning, what's causing
> all those i/o wait cycles? I'm starting to think it might involve
> buffering or other interactions in the hypervisor.
> Any thoughts or suggestions regarding diagnostics and/or tuning? (Other
> than "throw hardware at it" of course :-).
Remember your storage cannot do many random IOs.
Xen-users mailing list