[Xen-devel] I/O-bound dom0 lock-ups


I'm using Xen 2.0-testing.  I use an LVM volume for swap partitions and
loop files for the OS filesystems.

I am using the program stress
(http://weather.ou.edu/~apw/projects/stress) to simulate different kinds
of workloads.  As far as the CPU (sqrt()), IO (sync()), and VM (malloc())
tests go, everything is fine: the domains seem fair and domain0 is
responsive.

The problem begins when I use the HDD (write()) test inside a domU.  As
soon as it starts, dom0 freezes, and it stays that way until the stress
program exits.  The domU running stress, along with the other domUs,
remains responsive.

I can get a snapshot of what is going on in dom0 by looking at its stats
immediately after the stress program exits.  What I notice is that
xenblkd and the loop device for the domain are each using some CPU,
about 5%.  The rest of the CPU time is counted as "wa" (time spent
waiting for I/O).
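
The "wa" figure is derived from the kernel's cumulative CPU counters.
The sketch below computes the iowait share between two samples of the
aggregate "cpu" line in /proc/stat, assuming a 2.6-style layout where
iowait is the fifth counter (2.4 kernels do not report it, in which
case the helper returns None).  The function names are hypothetical.

```python
def read_cpu_times(path="/proc/stat"):
    """Return the aggregate 'cpu' counters from /proc/stat as ints
    (user, nice, system, idle, iowait, ...), in clock ticks."""
    with open(path) as f:
        for line in f:
            fields = line.split()
            if fields and fields[0] == "cpu":
                return [int(x) for x in fields[1:]]
    raise ValueError("no aggregate cpu line in " + path)


def iowait_percent(before, after):
    """Percentage of elapsed CPU time spent in iowait between two samples.

    Returns None if the samples have no iowait counter (fewer than five
    fields), and 0.0 if no time elapsed."""
    if len(before) < 5 or len(after) < 5:
        return None
    # Only sum user..steal (at most eight counters): on newer kernels the
    # guest counters are already included in user/nice.
    deltas = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(deltas)
    if total == 0:
        return 0.0
    return 100.0 * deltas[4] / total
```

Taking one sample with read_cpu_times(), sleeping for a second, taking
another and passing both to iowait_percent() gives roughly the "wa"
value that top would show for that interval, without having to keep an
interactive tool running in a dom0 that is about to stop responding.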

I have tried this with dom0 and the domU on the same processor and on
different processors.  I have also used information from this list to
tune the BVT scheduler so that dom0 warps and has a high priority (low
MCU) compared to the domUs.  I read that BVT has problems with I/O, so
as a test I tried the round-robin scheduler and saw the same behavior.
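
To make the tuning above concrete, here is a deliberately simplified
model of Borrowed Virtual Time scheduling (after Duda and Cheriton's
BVT design), assuming every domain is always runnable.  Each domain's
actual virtual time advances by its MCU advance (an inverse weight) for
every tick it runs; a warping domain's effective virtual time is
reduced by its warp value; and the domain with the smallest effective
virtual time runs next.  Warp time limits, blocking on I/O and the
context-switch allowance are omitted, and the field names are
hypothetical rather than Xen's.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    name: str
    mcu_advance: int      # inverse weight: a smaller value gives a larger CPU share
    warp: int = 0         # virtual-time bonus applied while warping
    warping: bool = False
    avt: int = 0          # actual virtual time
    ran: int = 0          # ticks of CPU received so far

    def evt(self):
        # Effective virtual time: a warping domain appears "earlier" than it is.
        return self.avt - (self.warp if self.warping else 0)


def run(domains, ticks):
    """Each tick, run the domain with the smallest effective virtual time
    (ties go to the earliest in the list) and charge it for the tick."""
    for _ in range(ticks):
        current = min(domains, key=lambda d: d.evt())
        current.avt += current.mcu_advance
        current.ran += 1
    return {d.name: d.ran for d in domains}
```

With run([Domain("dom0", 1), Domain("domU", 10)], 110), dom0 gets
about ten ticks for every one the domU gets, which is the effect a low
MCU advance is meant to have.  Because the model assumes both domains
are always runnable, it says nothing about a dom0 that is blocked
waiting for I/O.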

Now, on to how to solve this...

1) Is this most likely caused by how the schedulers work, or is it due
to using loop files?
2) Could opening the loop files with the O_DIRECT flag bring any
performance benefit?  dom0 on my system has only 128MB (it runs very
3) If the problem is with the schedulers, is there any possibility of a
fix, or at least a way to tune them so that dom0 doesn't lock up?  It is
very hard to log in and diagnose which domain might be causing the high
I/O when you can't even connect to dom0.
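
Regarding question 2, the sketch below only illustrates what O_DIRECT
means from userspace on Linux: writes bypass dom0's page cache, and the
buffer, offset and length must be suitably aligned.  It does not show
whether or how the loop driver could be made to open its backing file
this way.  The 4096-byte alignment is an assumption (the real
requirement depends on the device and filesystem), and direct_write is
a hypothetical helper.

```python
import errno
import mmap
import os

BLOCK = 4096  # assumed alignment and I/O size


def direct_write(path, nblocks=16):
    """Write nblocks * BLOCK bytes to `path` with O_DIRECT.

    Returns True on success, or False if O_DIRECT is unavailable or the
    filesystem rejects it with EINVAL."""
    if not hasattr(os, "O_DIRECT"):
        return False
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_DIRECT
    try:
        fd = os.open(path, flags, 0o644)
    except OSError as e:
        if e.errno == errno.EINVAL:
            return False
        raise
    # An anonymous mmap is page-aligned, which satisfies the usual
    # O_DIRECT buffer-alignment requirement.
    buf = mmap.mmap(-1, BLOCK)
    buf.write(b"Z" * BLOCK)
    try:
        for _ in range(nblocks):
            if os.write(fd, buf) != BLOCK:
                raise OSError(errno.EIO, "short O_DIRECT write")
    except OSError as e:
        if e.errno == errno.EINVAL:
            return False
        raise
    finally:
        os.close(fd)
        buf.close()
    return True
```

Since these writes skip the page cache, they do not fill dom0's limited
memory with dirty pages.  The trade-off is that every write waits for
the underlying device.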


