xen-devel

Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split

To: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
Subject: Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
From: Juergen Gross <juergen.gross@xxxxxxxxxxxxxx>
Date: Wed, 16 Feb 2011 15:28:39 +0100
Cc: Andre Przywara <andre.przywara@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, "Diestelhorst, Stephan" <Stephan.Diestelhorst@xxxxxxx>
Organization: Fujitsu Technology Solutions

On 02/16/11 15:11, Juergen Gross wrote:
> On 02/16/11 14:54, George Dunlap wrote:
>> Andre (and Juergen), can you try again with the attached patch?
>>
>> What the patch basically does is try to make "cpu_disable_scheduler()"
>> do what it seems to say it does. :-) Namely, the various
>> scheduler-related interrupts (both per-cpu ticks and the master tick)
>> are part of the scheduler, so disable them before doing anything, and
>> don't enable them until the cpu is really ready to go again.
>>
>> To be precise:
>> * cpu_disable_scheduler() disables ticks
>> * scheduler_cpu_switch() only enables ticks if adding a cpu to a pool,
>>   and does it after inserting the idle vcpu
>> * Modify semantics, s.t., {alloc,free}_pdata() don't actually start or
>>   stop tickers
>>   + Call tick_{resume,suspend} in cpu_{up,down}, respectively
>
> I tried this before :-)
> It didn't work for Andre, but maybe there were some bits missing.
>
>> * Modify credit1's tick_{suspend,resume} to handle the master ticker
>>   as well.
>>
>> With this patch (if dom0 doesn't get wedged due to all 8 vcpus being
>> on one pcpu), I can perform thousands of operations successfully.
>
> Nice. I'll try later. At the moment I'm testing another patch (attached
> for review, if you like). I think I've identified two possible races.
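
For readers following the thread, the tick ordering George describes above can be sketched as a toy model. This is only an illustration under stated assumptions: the struct and every model_* function are hypothetical names invented here, not Xen's real data structures or code, and neither attached patch is reproduced. The sketch shows just the intended invariant: ticks are stopped before a cpu is taken out of a pool, and are started again only after the idle vcpu has been inserted.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one physical cpu. Hypothetical; NOT Xen's real structures. */
struct model_cpu {
    bool in_pool;        /* cpu currently belongs to a cpupool */
    bool idle_inserted;  /* idle vcpu has been inserted into the scheduler */
    bool ticks_enabled;  /* scheduler ticks are running on this cpu */
};

/* Models "cpu_disable_scheduler() disables ticks": stop ticks first. */
static void model_disable_scheduler(struct model_cpu *c)
{
    c->ticks_enabled = false;
}

/* Models taking a cpu out of a pool: ticks go off before anything else. */
static void model_remove_from_pool(struct model_cpu *c)
{
    model_disable_scheduler(c);
    c->idle_inserted = false;
    c->in_pool = false;
}

/* Models adding a cpu to a pool: ticks come on last, after the idle vcpu. */
static void model_add_to_pool(struct model_cpu *c)
{
    assert(!c->ticks_enabled);   /* must not tick while being switched */
    c->in_pool = true;
    c->idle_inserted = true;
    c->ticks_enabled = true;
}

/* Invariant: ticks may run only on a cpu that is fully set up in a pool. */
static bool model_invariant_holds(const struct model_cpu *c)
{
    return !c->ticks_enabled || (c->in_pool && c->idle_inserted);
}
```

Moving a cpu between pools in this model is model_remove_from_pool() followed by model_add_to_pool(); the invariant holds after each step, which is the property the patch description aims for.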

My patch works for me. I think I have to rework the locking for credit1, but
that shouldn't be too hard.

My machine survived 10000 iterations of your script with additional
consistency checks in the scheduler. Without my patch, the machine crashed
after fewer than 500 iterations.


Juergen

--
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@xxxxxxxxxxxxxx
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html
