Re: [Xen-devel] [PATCH] Avoid race when moving cpu between cpupools

To: Andre Przywara <andre.przywara@xxxxxxx>
Subject: Re: [Xen-devel] [PATCH] Avoid race when moving cpu between cpupools
From: Juergen Gross <juergen.gross@xxxxxxxxxxxxxx>
Date: Mon, 28 Feb 2011 10:29:28 +0100
Cc: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Keir Fraser <keir@xxxxxxx>, "Diestelhorst, Stephan" <Stephan.Diestelhorst@xxxxxxx>
In-reply-to: <4D67BBDA.5070603@xxxxxxx>
Organization: Fujitsu Technology Solutions
References: <5485071c8b0a6a49f65b.1298541625@nehalem1> <4D666678.1000301@xxxxxxx> <AANLkTikSiJKLH=ginoEgO4Tx0-Z1AC2bwP4qBDjVSfAg@xxxxxxxxxxxxxx> <4D67BBDA.5070603@xxxxxxx>
On 02/25/11 15:25, Andre Przywara wrote:
> George Dunlap wrote:
>> Looks good -- thanks Juergen.
>>
>> Acked-by: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
>>
>> -George
>>
>> On Thu, Feb 24, 2011 at 2:08 PM, Andre Przywara
>> <andre.przywara@xxxxxxx> wrote:
>>> Juergen Gross wrote:
>>>> Moving cpus between cpupools is done under the schedule lock of the
>>>> moved cpu. When checking whether a cpu is a member of a cpupool, this
>>>> must be done with the lock of that cpu held.
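
Sketched roughly in C11 atomics, the rule reads like this; this is purely
illustrative, with NR_CPUS, schedule_lock[], cpu_pool[] and cpu_in_pool()
as made-up stand-ins for Xen's per-cpu schedule lock and cpupool
membership, not the actual Xen API:

    #include <stdatomic.h>
    #include <stdbool.h>

    #define NR_CPUS 64

    static atomic_int schedule_lock[NR_CPUS]; /* per-cpu lock, 0 = free */
    static int cpu_pool[NR_CPUS];             /* cpupool id of each cpu */

    static void lock_cpu(int cpu)
    {
        while (atomic_exchange_explicit(&schedule_lock[cpu], 1,
                                        memory_order_acquire))
            ;                                 /* spin until free */
    }

    static void unlock_cpu(int cpu)
    {
        atomic_store_explicit(&schedule_lock[cpu], 0, memory_order_release);
    }

    /* A pool move runs under the moved cpu's schedule lock, so the
     * membership check must take the lock of the cpu being checked,
     * not the lock of the cpu doing the checking. */
    static bool cpu_in_pool(int cpu, int pool)
    {
        bool in_pool;

        lock_cpu(cpu);
        in_pool = (cpu_pool[cpu] == pool);
        unlock_cpu(cpu);

        return in_pool;
    }
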
>>> I have reviewed and tested the patch. It fixes my problem. My script
>>> has been running for several hundred iterations without any Xen crash,
>>> whereas without the patch the hypervisor crashed mostly at the second
>>> iteration.

> Juergen,
>
> can you rule out that this code will be triggered on two CPUs trying to
> switch to each other? As Stephan pointed out, the code looks as if it
> could trigger a possible dead-lock condition, where:
> 1) CPU A grabs lock (a) while CPU B grabs lock (b)
> 2) CPU A tries to grab (b) and CPU B tries to grab (a)
> 3) both fail and loop to 1)

Good point. Not quite a dead-lock, but a possible live-lock :-)
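
The pattern in question, sketched with C11 atomics rather than Xen's
spinlock API (trylock(), unlock() and move() are invented names for
illustration only):

    #include <stdatomic.h>
    #include <stdbool.h>

    static atomic_int lock_a, lock_b;   /* stand-ins for two per-cpu locks */

    static bool trylock(atomic_int *l)
    {
        return atomic_exchange_explicit(l, 1, memory_order_acquire) == 0;
    }

    static void unlock(atomic_int *l)
    {
        atomic_store_explicit(l, 0, memory_order_release);
    }

    /* CPU A runs move(&lock_a, &lock_b); CPU B runs move(&lock_b, &lock_a).
     * If both keep winning step 1 and losing step 2 in lock-step, neither
     * ever blocks for good (so no dead-lock), but neither makes progress
     * either: a live-lock. */
    static void move(atomic_int *mine, atomic_int *theirs)
    {
        for (;;) {
            while (!trylock(mine))
                ;                       /* 1) grab own lock */
            if (trylock(theirs))
                break;                  /* 2) got both, do the move */
            unlock(mine);               /* 3) back off and loop to 1) */
        }
        /* ... move the cpu between pools under both locks ... */
        unlock(theirs);
        unlock(mine);
    }
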

> A possible fix would be to introduce some ordering for the locks (just
> the pointer address) and let the "bigger" pointer yield to the "smaller"
> one.

Done this and sent a patch.
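
For reference, the ordering idea in sketch form (again illustrative C11,
not the actual patch; lock() and lock_pair() are invented names).
Comparing the lock addresses imposes one global acquisition order, so two
CPUs can never each hold the lock the other still wants:

    #include <stdatomic.h>
    #include <stdint.h>

    static void lock(atomic_int *l)
    {
        while (atomic_exchange_explicit(l, 1, memory_order_acquire))
            ;                           /* spin until free */
    }

    static void lock_pair(atomic_int *l1, atomic_int *l2)
    {
        /* Take the lower-addressed ("smaller") lock first. */
        if ((uintptr_t)l1 > (uintptr_t)l2) {
            atomic_int *tmp = l1;
            l1 = l2;
            l2 = tmp;
        }
        lock(l1);
        if (l2 != l1)                   /* the two cpus may share a lock */
            lock(l2);
    }

With every path taking a pair in the same order, the retry loop above can
no longer starve: whichever CPU wins the "smaller" lock also gets to take
the "bigger" one.
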

> I am not sure if this is really necessary, but I now see strange hangs
> after running the script for a while (30min to 1hr). Sometimes Dom0
> hangs for a while, losing interrupts (sda or eth0) or getting spurious
> ones; on two occasions the machine totally locked up.
>
> I am not 100% sure whether this is CPUpools related, but I put some load
> on Dom0 (without messing with CPUpools) for the whole night and it ran
> fine.

Did you try to do this with all Dom0-vcpus pinned to 6 physical cpus?
I had the same problems when using only a few physical cpus for many vcpus.
And I'm pretty sure this was NOT the possible live-lock, as it happened
already without this change when I tried to reproduce your problem.


> Sorry for this :-(
> I will try to further isolate this.
>
> Anyway, it works much better with the fix than without and I will try to
> trigger this with the "reduce number of Dom0 vCPUs" patch.


Thanks, Juergen

--
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@xxxxxxxxxxxxxx
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel