Re: [Xen-users] disk I/O problems under load? (Xen-3.4.1/x86_64)

On Thu, Oct 01, 2009 at 01:33:57AM +0200, Luca Lesinigo wrote:
> I'm getting problems whenever the load on a system increases, but IMHO
> it should be well within hardware capabilities.
> 
> My configuration:
> - HP Proliant DL160G5, with a single quadcore E5405, 14GiB RAM, 2x1TB  
> sata disks Hitachi 7K1000.B on the onboard sata controller (intel  
> chipset)
> - Xen-3.4.1 64bit hypervisor, compiled from gentoo portage, with  
> default commandline settings (I just specify the serial console and  
> nothing else)
> - Domain-0 with gentoo's xen-sources 2.6.21 (the xen 2.6.18 tarball  
> didn't have networking, I think the HP Tigon3 gigabit driver is too  
> old but hadn't time to look into that)
> - Domain-0 is using the CFQ i/o scheduler, and works from a software  
> raid-1, no tickless kernel, HZ=100. It has all the free ram (currently  
> some 5.x GiB)
> - the rest of the disks is also mirrored in a raid-1 device, and I use  
> LVM2 on top of that
> - 6x paravirt 64bit DomU with 2.6.29-gentoo-r5 kernel, with NOOP i/o  
> scheduler, tickless kernel. 1 - 1.5GiB of ram each.
> - 1x HVM 32bit Windows XP DomU, without any paravirt driver, 512MiB RAM
> - I use logical volumes as storage space for DomU's, the linux ones  
> also have 0.5GiB of swap space (unused, no DomU is swapping)
> - all the linux DomU are on ext3 (noatime), and all DomU are
> single-cpu (just one vcpu each)
> - network is bridged (one lan and one wan interface on the physical  
> system and the same for each domU), no jumbo frames
> 
> Usually load on the system is very low. But when there is some I/O  
> related load (I can easily trigger it by rsync'ing lots of stuff  
> between domU's or from a different system to one of the domU or to the  
> dom0) load gets very high and I often see domU's spending all their  
> cpu time in "wait" [for I/O] state. When that happens, load on  
> Domain-0 gets high (jumps from <1 to >5) and loads on DomU's get high  
> too probably because of processes waiting for I/O to happen. Sometimes  
> iostat will even show exactly 0.00 tps on all the dm-X devices (domU  
> storage backends) and some activity on the physical devices, like all  
> domU I/O activity froze up while dom0 is busy flushing caches or doing  
> some other stuff.
> 
> vmstat in Dom0 shows one or two cores (25% or 50% cpu time) busy in  
> 'iowait' state, and context switches go in the thousands, but not in  
> the hundreds of thousands that http://wiki.xensource.com/xenwiki/KnownIssues
> talks about.
> 

You have only 2x 7200 rpm disks for 7 virtual machines and you're
wondering why there's a lot of iowait? :)

> I tried pinning cpus: Domain-0 had its four VCPUs pinned to CPUs 0 and  
> 1, some domU's pinned to CPU 2, and some domU's pinned to CPU 3. As  
> far as I can tell it did not make any difference.
> I also (briefly) tested with all linux DomU's running with the CFQ  
> scheduler, while it didn't seem to make any difference it also was too  
> short of a test to trust it much.
> 
> What's worse, sometimes I get qemu-dm processes (for the HVM domU) in
> zombie state. It also happened that the HVM domU crashed and I wasn't
> able to restart it: I got the "hotplug scripts not working" error from
> xm create, and looking in xenstore-ls I saw instances of the crashed
> domU with all its resources (which probably was the cause of the
> error?). I had to reboot the whole system to be able to start that
> domain again.
> 
> Normally iostat in Domain-0 shows more or less high tps (200~300 under  
> normal load, even higher if I play around with rsync to artificially  
> trigger the problems) on the md device where all the DomU reside, and  
> much less (usually just 10-20% of the previous value) on the two  
> physical disks sda and sdb that compose the mirror. I guess I see less  
> tps because the scheduler/elevator in Dom-0 is doing its job.
> 
> I don't know if the load problems and the HVM problem are linked or  
> not, but I also don't know where to look to solve any one of them.
> 
> Any help would be appreciated, thank you very much. Also, what are  
> ideal/recommended settings in dom0 and domU regarding i/o schedulers  
> and tickless or not?
> Is there any reason to leave the hypervisor some extra free ram, or is
> it ok to just let xend shrink dom0 when needed and leave free just the
> minimum? If I sum up memory (currently) used by domains, I get  
> 14146MiB. xm info says 14335MiB total_memory and 10MiB free_memory.
> 

A single 7200 rpm SATA disk can do around 120 random IOPS,
i.e. 120 I/O operations per second.

120 IOPS / 7 VMs = about 17 IOPS available per VM.

That's not much..
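
The same back-of-the-envelope arithmetic as a small Python sketch. The 120 IOPS figure and the 7 VMs come from this thread. The function name per_vm_iops and its read_spindles parameter are made up for illustration. It also assumes the load is spread evenly across the VMs, that the RAID-1 mirror behaves like a single disk for writes (every write goes to both disks), and that reads can at best be split across the two disks. Real numbers will depend on the workload.

```python
def per_vm_iops(disk_iops: float, n_vms: int, read_spindles: int = 1) -> float:
    """Even share of random IOPS per VM (hypothetical helper, rough estimate only)."""
    if n_vms <= 0:
        raise ValueError("n_vms must be positive")
    # read_spindles=1 models writes on a mirror: both disks do the same work.
    # read_spindles=2 models the best case for reads on a two-disk RAID-1.
    return disk_iops * read_spindles / n_vms


DISK_IOPS = 120  # rough figure for one 7200 rpm SATA disk, as quoted above
N_VMS = 7        # 6 paravirt Linux domUs + 1 HVM Windows domU

write_share = per_vm_iops(DISK_IOPS, N_VMS)
read_share = per_vm_iops(DISK_IOPS, N_VMS, read_spindles=2)

print(f"writes: ~{write_share:.0f} IOPS per VM")
print(f"reads (best case): ~{read_share:.0f} IOPS per VM")
```

Even in the best case for reads, each VM only gets a few dozen random I/O operations per second, and dom0's own I/O comes out of the same budget.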

-- Pasi

