This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


Re: [Xen-devel] blocking Xen 3.X production use: soft lockup bugs

To: Keir Fraser <Keir.Fraser@xxxxxxxxxxxx>
Subject: Re: [Xen-devel] blocking Xen 3.X production use: soft lockup bugs
From: Steve Traugott <stevegt@xxxxxxxxxxxxx>
Date: Fri, 4 Aug 2006 13:21:21 -0700
Cc: Ian Pratt <m+Ian.Pratt@xxxxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <1efc279e1b6db92d9564c61b25c06df8@xxxxxxxxxxxx>
References: <A95E2296287EAD4EB592B5DEEFCE0E9D572305@xxxxxxxxxxxxxxxxxxxxxxxxxxx> <1efc279e1b6db92d9564c61b25c06df8@xxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mutt/1.5.9i

You nailed it, Keir.

On Thu, Aug 03, 2006 at 09:03:18AM +0100, Keir Fraser wrote:
> Also older versions using sedf scheduler (which has now been patched to 
> avoid this) could end up with domain0 consuming all CPU and starving 
> other guests, leading to softlockup errors. We haven't seen any such 
> errors on our own test machines since this was fixed. Of course, that 
> doesn't mean there aren't problems with other test scenarios!

That is exactly what was happening.  I did more testing yesterday and
last night (-testing changeset 9732), and realized that I was only
seeing soft lockups on the second of two domU guests, and only when
running a heavy load in dom0.  According to 'xm vcpu-list' the second
guest was on CPU 0, as was the workload in dom0...  I added more
workload processes to consume both CPUs in dom0, and of course when I
did that, the first guest ground to a halt and started showing soft
lockups as well.

I was usually able to trigger the soft lockups in a few seconds simply
by running one or more of these in dom0:

    cat /dev/zero > /dev/null

Variants of 'nc -ub 10000 < /dev/zero' and 
'nc -u -l -p 10000 > /dev/null' in dom0 or domU also made things 
interesting, though I'm not sure that the network traffic is a factor.  
(Kids, don't do this on a production net...)  
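
The CPU-bound part of that reproduction can be sketched as a small shell
helper. This is only an illustration: the function name 'burn_cpus' and its
two arguments (number of busy loops, run time in seconds) are hypothetical
and do not come from the original test setup. It starts the same
'cat /dev/zero > /dev/null' loop several times in the background and stops
the loops after the given time.

```shell
#!/bin/sh
# Hypothetical helper (not from the original message):
#   burn_cpus COUNT SECONDS
# Starts COUNT copies of the busy loop quoted above, lets them run for
# SECONDS, then terminates them.
burn_cpus() {
    count=$1
    seconds=$2
    pids=""
    i=0
    while [ "$i" -lt "$count" ]; do
        # Each copy keeps one CPU busy until it is killed.
        cat /dev/zero > /dev/null &
        pids="$pids $!"
        i=$((i + 1))
    done
    sleep "$seconds"
    # $pids is intentionally unquoted so each PID is a separate argument.
    kill $pids
    # Reap the killed processes; their non-zero exit status is expected.
    wait $pids 2>/dev/null || true
    echo "stopped $count burner(s)"
}
```

For example, 'burn_cpus 2 60' run in dom0 would keep two CPUs busy for a
minute, roughly matching the two-CPU case described above.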

So I built -unstable changeset 10868 and, using the credit scheduler,
ran an even heavier workload (the above, plus 'bonnie' in the guests)
on dom0 and two guests overnight.  They experienced no soft lockups.
This same workload would have caused soft lockups within seconds in
-testing changeset 9732 using the sedf scheduler; I may not have been
able to get it started at all.  Response time remained subsecond under
-unstable; -testing would have been on its knees.

Stephen G. Traugott  (KG6HDQ)
UNIX/Linux Infrastructure Architect, TerraLuna LLC
http://www.stevegt.com -- http://Infrastructures.Org
