You nailed it, Keir.
On Thu, Aug 03, 2006 at 09:03:18AM +0100, Keir Fraser wrote:
> Also older versions using sedf scheduler (which has now been patched to
> avoid this) could end up with domain0 consuming all CPU and starving
> other guests, leading to softlockup errors. We haven't seen any such
> errors on our own test machines since this was fixed. Of course, that
> doesn't mean there aren't problems with other test scenarios!
That is exactly what was happening. I did more testing yesterday and
last night (-testing changeset 9732), and realized that I was only
seeing soft lockups on the second of two domU guests, and only when
running a heavy load in dom0. According to 'xm vcpu-list' the second
guest was on CPU 0, as was the workload in dom0... I added more
workload processes to consume both CPUs in dom0, and of course when I
did that, the first guest ground to a halt and started showing soft
lockups as well.
I was usually able to trigger the soft lockups in a few seconds simply
by running one or more of these in dom0:
    cat /dev/zero > /dev/null
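
A minimal sketch of starting several of these at once and cleaning up
afterwards. The function names start_burners and stop_burners are
hypothetical, and the default of two copies assumes the two-CPU dom0
described above:

```shell
#!/bin/sh
# Hypothetical helpers (not part of Xen or any distribution).

# Start N background copies of the CPU burner and remember their PIDs.
# N defaults to 2 (assumption: one burner per CPU on a two-CPU dom0).
start_burners() {
    n=${1:-2}
    BURNER_PIDS=""
    i=0
    while [ "$i" -lt "$n" ]; do
        cat /dev/zero > /dev/null &
        BURNER_PIDS="$BURNER_PIDS $!"
        i=$((i + 1))
    done
}

# Kill every burner started above and reap the background jobs.
stop_burners() {
    for pid in $BURNER_PIDS; do
        kill "$pid" 2>/dev/null
    done
    wait 2>/dev/null
    BURNER_PIDS=""
}
```

For example, 'start_burners 2; sleep 60; stop_burners' would load both
CPUs for a minute and then clean up.
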
Variants of 'nc -ub 255.255.255.255 10000 < /dev/zero' and
'nc -u -l -p 10000 > /dev/null' in dom0 or domU also made things
interesting, though I'm not sure that the network traffic is a factor.
(Kids, don't do this on a production net...)
So I built -unstable changeset 10868 and ran an even heavier workload
(the above, plus 'bonnie' in the guests) on dom0 and two guests
overnight. Under the credit scheduler, none of them experienced a soft
lockup. This same workload would have
caused soft lockups within seconds in -testing changeset 9732 using
the sedf scheduler; I may not have been able to get it started at all.
Response time remained subsecond under -unstable; -testing would have
been on its knees.
Stephen G. Traugott (KG6HDQ)
UNIX/Linux Infrastructure Architect, TerraLuna LLC
http://www.stevegt.com -- http://Infrastructures.Org
Xen-devel mailing list