[Xen-users] strange scheduling / io problem

Hello there.

I'm trying to find the cause of a strange problem I'm seeing with Xen, but I'm not making much progress. A little background:

I have two Dell SC1425 servers (dual P4/Xeon (Nocona), HTT, no VT-x) running Xen and Gentoo Linux. The current setup is (everything installed from Gentoo Portage):

- Xen 3.0.4-1
- Gentoo Linux domains
- everything pure x86_64, no 32-bit code
- Kernel 2.6.20-xen-r2, with different kernels for Dom0 and DomU
- drbd 8.0.5

I have drbd doing "network RAID" over a crossover gigabit link between the two servers. On top of the drbd disks I use LVM2 to create logical volumes for the DomUs. The Dom0s are on good old software RAID1, with no drbd or LVM. This setup has proven stable and has been working for a long time; it actually started in early 2006 with earlier versions of Xen and Linux.

I am now in the process of upgrading one of the two nodes, and I rebuilt the system from scratch with Xen 3.2.1, Linux 2.6.21-xen and drbd 8.0.12.

Everything is working well except for one thing: I have an rsyncd server in the Dom0 serving a local copy of the Portage tree. If I run "emerge --sync" from a DomU against my local rsyncd server, the DomU nearly freezes. Looking with xm top / xm list, it seems that it just doesn't get CPU time scheduled, or it is blocked waiting for something: the "cpu time" consumed by the DomU does not increase for a while. After some time the rsync client in the DomU times out, and then everything works normally again. Curious facts:

1) the whole system is otherwise idle and doing nothing
2) only Dom0 and one DomU are running for the tests
3) if I put some disk load on both dom0 and domU (I use "tar cf tmpfile.tar /usr"), everything is ok
4) if I put some CPU load on both dom0 and domU (I used distributed.net's client to bring CPU use to 100% on both physical CPUs), everything is ok
5) if I put some simple network load ("nc -l -p 1999 > /dev/null" in Dom0 and "dd if=/dev/zero bs=1024k count=10240 | nc 172.16.0.2 1999 -q1" in DomU), everything is ok
6) if I do the dnetc + tar + netcat things all together in both dom0 and domU, everything is ok and both domains stay responsive
7) if I run "emerge --sync" in domU against an rsyncd on another machine (an official Gentoo mirror, or even my other node connected via the crossover gigabit link), everything is ok

...but if I run "emerge --sync" in the domU against the rsyncd server on the dom0 of the same hardware, the dom0 runs fine and stays responsive while the domU becomes sluggish: hitting Enter at the empty login prompt on "xm console", without typing a username, takes 40 seconds before I get another login prompt. The rsync client transfers some (little) data before this "freezing" occurs.

I really can't figure out what the rsync (domU) + rsyncd (dom0) combination does that makes it behave like this, and I can't reproduce the problem with any other test.

I tried using my "old" kernel for the domU (linux-2.6.20-xen-r2 from the other, still-running node) and it behaves exactly the same.

I use the default scheduler (credit) with default settings. Dom0 has access to all CPUs (4 logical CPUs: they're two hyperthreading Xeons), and the domU runs a single-processor kernel with just one vcpu.
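The symptom described above (the DomU's cumulative CPU time in xm list not increasing while rsync should be busy) can be checked mechanically by comparing successive snapshots. Below is a minimal sketch, assuming the classic `xm list` table layout (Name, ID, Mem(MiB), VCPUs, State, Time(s)) with Time(s) as the last column. The helper names `parse_xm_list` and `stalled_domains`, the 0.1 s threshold, and the sample figures are hypothetical, for illustration only; they are not part of Xen.

```python
# Hypothetical helper, not part of Xen: flag domains whose cumulative CPU time
# stops advancing between successive `xm list` snapshots.
# Assumes the classic layout: Name ID Mem(MiB) VCPUs State Time(s),
# with Time(s) as the last column.


def parse_xm_list(text):
    """Return {domain_name: cpu_seconds} parsed from `xm list` output."""
    times = {}
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # ignore blank or malformed rows
        try:
            times[fields[0]] = float(fields[-1])
        except ValueError:
            continue  # last column is not a number; skip the row
    return times


def stalled_domains(samples, min_delta=0.1):
    """Return names whose CPU time grew by less than min_delta seconds
    between every pair of consecutive samples. Only domains present in
    all samples are considered."""
    if len(samples) < 2:
        return []
    common = set(samples[0])
    for sample in samples[1:]:
        common &= set(sample)
    stalled = []
    for name in sorted(common):
        deltas = [later[name] - earlier[name]
                  for earlier, later in zip(samples, samples[1:])]
        if all(d < min_delta for d in deltas):
            stalled.append(name)
    return stalled


# Made-up snapshots for illustration, imagined a few seconds apart.
before = """Name ID Mem(MiB) VCPUs State Time(s)
Domain-0 0 1024 4 r----- 812.4
domu1 3 512 1 -b---- 95.2
"""
after = """Name ID Mem(MiB) VCPUs State Time(s)
Domain-0 0 1024 4 r----- 815.9
domu1 3 512 1 -b---- 95.2
"""

print(stalled_domains([parse_xm_list(before), parse_xm_list(after)]))  # ['domu1']
```

In practice the snapshots would come from running `xm list` in Dom0 every few seconds during `emerge --sync`. An idle domain is flagged as well, so the check only means something while the DomU is expected to be busy.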


Any help would be appreciated. Thanks.

--
Luca Lesinigo


_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users