xen-users
[Xen-users] strange scheduling / io problem
Hello there.
I'm trying to find the cause of a strange problem I'm seeing with
Xen, but I'm not making much progress. A little background:
I have two Dell SC1425 servers (dual Xeon "Nocona" CPUs with
Hyper-Threading, no VT-x) running Xen and Gentoo Linux. The current
setup (everything installed from Gentoo Portage) is:
- Xen 3.0.4-1
- Gentoo linux domains
- everything pure x86_64, no 32-bit code
- kernel 2.6.20-xen-r2, with separate kernel builds for Dom0 and DomU
- drbd 8.0.5
I have drbd doing "network RAID" over a crossover gigabit link between
the two servers. On top of the drbd devices I use LVM2 to create
logical volumes for the DomUs. The Dom0s are on plain software RAID1,
with no drbd or LVM. This setup has proven stable and has been in
service for a long time; it started in early 2006 with earlier
versions of Xen and Linux.
I am now upgrading one of the two nodes, and I rebuilt its system from
scratch with Xen-3.2.1, Linux-2.6.21-xen and drbd-8.0.12.
Everything works well except for one thing: I have an rsyncd server in
Dom0 serving a local copy of the Portage tree. If I run "emerge
--sync" from a DomU against this local rsyncd server, the DomU nearly
freezes. Looking at xm top / xm list, it seems that the DomU simply
doesn't get CPU time scheduled, or that it is blocked waiting for
something: the "CPU time" consumed by the DomU does not increase for a
while. After some time the rsync client in the DomU times out, and
then everything works normally again. Curious facts:
1) the whole system is otherwise idle and doing nothing
2) just the Dom0 and one DomU for the tests
3) if I put some disk load on both dom0 and domU (I use tar cf
tmpfile.tar /usr) everything is ok
4) if I put some cpu load on both dom0 and domU (I use
distributed.net's client to bring cpu use to 100% on both physical
cpus) everything is ok
5) if I put some simple network load ("nc -l -p 1999 > /dev/null" in
Dom0 and "dd if=/dev/zero bs=1024k count=10240 | nc 172.16.0.2 1999 -q1"
in DomU) everything is ok
6) if I do the dnetc + tar + netcat things all together both in dom0
and domU, everything is ok and both domains are still responsive
7) if I run "emerge --sync" in domU against an rsyncd on another
machine (a gentoo official mirror or even my other node connected via
crossover gigabit) everything is ok
.....but if I run "emerge --sync" in the DomU against the rsyncd server
in Dom0 on the same hardware, Dom0 keeps running fine and stays
responsive while the DomU becomes sluggish: on "xm console", hitting
Enter at the empty login prompt, without typing a username, takes 40
seconds before another login prompt appears. The rsync client
transfers a little data before this "freezing" occurs.
I really can't figure out what the rsync (DomU) + rsyncd (Dom0)
combination does that makes the system behave like this, and I can't
reproduce the problem with any other test.
I tried using my "old" kernel for the domU (the linux-2.6.20-xen-r2
from the other running node) and it behaves exactly the same.
I use the default scheduler (credit) with default settings. Dom0 has
access to all CPUs (4 logical CPUs: two Hyper-Threading Xeons), and the
DomU runs a single-processor kernel with just one VCPU.
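To put timestamps on the "CPU time does not increase" observation while "emerge --sync" runs, something like the following could poll "xm list" once a second and report intervals in which the DomU gained no CPU time. This is only a minimal sketch under stated assumptions: it assumes the Xen 3.x "xm list" layout with Time(s) as the last column, it needs Python 3.7+ in Dom0, and the helper names (parse_xm_list, stalled, watch) and the domain name "domU1" are hypothetical. Since an idle DomU also accumulates almost no CPU time, a report is only meaningful while the DomU should be busy.

```python
import subprocess
import time


def parse_xm_list(text):
    """Map domain name -> consumed CPU seconds from `xm list` output.

    Assumption: Xen 3.x layout "Name ID Mem VCPUs State Time(s)",
    with the cumulative CPU time in the last column.
    """
    times = {}
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue
        try:
            times[fields[0]] = float(fields[-1])
        except ValueError:
            continue  # ignore lines that don't end in a number
    return times


def stalled(prev, curr, domain, min_delta=0.05):
    """True if `domain` gained less than min_delta CPU seconds between samples."""
    if domain not in prev or domain not in curr:
        return False
    return curr[domain] - prev[domain] < min_delta


def sample():
    """Run `xm list` once (hypothetical helper; must run in Dom0)."""
    out = subprocess.run(["xm", "list"], capture_output=True,
                         text=True, check=True).stdout
    return parse_xm_list(out)


def watch(domain, interval=1.0, samples=60):
    """Print a timestamped line for every interval in which `domain` got no CPU."""
    prev = sample()
    for _ in range(samples):
        time.sleep(interval)
        curr = sample()
        if stalled(prev, curr, domain):
            print(f"{time.strftime('%H:%M:%S')} {domain}: "
                  f"no CPU time gained in {interval}s")
        prev = curr
```

Usage would be "watch("domU1")" in Dom0 while the sync runs in the DomU. The 0.05 s threshold is half the 0.1 s resolution that "xm list" normally prints, so any visible increase counts as progress.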
Any help would be appreciated. Thanks.
--
Luca Lesinigo
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users