xen-devel

Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split

To: Andre Przywara <andre.przywara@xxxxxxx>
Subject: Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
From: Juergen Gross <juergen.gross@xxxxxxxxxxxxxx>
Date: Wed, 02 Feb 2011 09:49:59 +0100
Cc: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, Keir Fraser <keir@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>, Stephan Diestelhorst <stephan.diestelhorst@xxxxxxx>
In-reply-to: <4D48F954.5000103@xxxxxxxxxxxxxx>
Organization: Fujitsu Technology Solutions
References: <4D41FD3A.5090506@xxxxxxx> <4D426673.7020200@xxxxxxxxxxxxxx> <4D42A35D.3050507@xxxxxxx> <4D42AC00.8050109@xxxxxxxxxxxxxx> <4D42C153.5050104@xxxxxxx> <4D465F0D.4010408@xxxxxxxxxxxxxx> <4D46CE4F.3090003@xxxxxxx> <AANLkTi=ppBtb1nhdfbhGZa0Rt6kVyopdS3iJPr5fVA1x@xxxxxxxxxxxxxx> <4D483599.1060807@xxxxxxx> <4D48F954.5000103@xxxxxxxxxxxxxx>
On 02/02/11 07:27, Juergen Gross wrote:
On 02/01/11 17:32, Andre Przywara wrote:
Hi folks,

I asked Stephan Diestelhorst for help, and after I convinced him that
removing credit and making SEDF the default again is not an option, he
worked together with me on this ;-) Many thanks for that!
We haven't come to a final solution yet, but we could gather some debug
data. I will simply dump some of it here; maybe somebody has got a clue.
We will work further on this tomorrow.

First I replaced the BUG_ON with some printks to get some insight:
(XEN) sdom->active_vcpu_count: 18
(XEN) sdom->weight: 256
(XEN) weight_left: 4096, weight_total: 4096
(XEN) credit_balance: 0, credit_xtra: 0, credit_cap: 0
(XEN) Xen BUG at sched_credit.c:591
(XEN) ----[ Xen-4.1.0-rc2-pre x86_64 debug=y Not tainted ]----

So that one shows that the number of VCPUs is not up to date with the
computed weight sum; we have seen a difference of one or two VCPUs (in
this case the weight sum corresponds to 16 VCPUs: 16 * 256 = 4096, while
active_vcpu_count says 18). It also shows that the assertion kicks in on
the first iteration of the loop, where weight_left and weight_total are
still equal.
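For reference, the instrumentation boils down to dumping the accounting
state just before the check fires. A reconstructed sketch, not the exact
debug patch (the names are sdom's fields and the locals of csched_acct()
in sched_credit.c; the check at line 591 appears to be the BUG_ON on
sdom->weight * sdom->active_vcpu_count, which matches the values printed
above):

/* csched_acct(): print the relevant state, then crash as before
 * (reconstructed sketch, not the exact debug patch) */
if ( (sdom->weight * sdom->active_vcpu_count) > weight_left )
{
    printk("sdom->active_vcpu_count: %d\n", sdom->active_vcpu_count);
    printk("sdom->weight: %d\n", sdom->weight);
    printk("weight_left: %u, weight_total: %u\n", weight_left, weight_total);
    printk("credit_balance: %d, credit_xtra: %d, credit_cap: %d\n",
           credit_balance, credit_xtra, credit_cap);
    BUG();  /* keep the "Xen BUG at sched_credit.c" report and stack trace */
}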

So I additionally instrumented alloc_pdata and free_pdata; the unprefixed
lines below come from a shell script mimicking the functionality of
cpupool-numa-split.
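The added printks are of roughly this shape (again a sketch;
csched_alloc_pdata()/csched_free_pdata() and the prv->ncpus counter are
from Xen 4.1's sched_credit.c):

/* In csched_alloc_pdata(), after the pool's CPU count is raised
 * (sketch; prv is the csched_private of the pool's scheduler): */
printk("adding CPU %d, now %u CPUs\n", cpu, prv->ncpus);

/* In csched_free_pdata(), after the count is dropped: */
printk("removing CPU %d, remaining: %u\n", cpu, prv->ncpus);

Together this gave the following interleaved output: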
------------
Removing CPUs from Pool 0
Creating new pool
Using config file "cpupool.test"
cpupool name: Pool-node6
scheduler: credit
number of cpus: 1
(XEN) adding CPU 36, now 1 CPUs
(XEN) removing CPU 36, remaining: 17
Populating new pool
(XEN) sdom->active_vcpu_count: 9
(XEN) sdom->weight: 256
(XEN) weight_left: 2048, weight_total: 2048
(XEN) credit_balance: 0, credit_xtra: 0, credit_cap: 0
(XEN) adding CPU 37, now 2 CPUs
(XEN) removing CPU 37, remaining: 16
(XEN) adding CPU 38, now 3 CPUs
(XEN) removing CPU 38, remaining: 15
(XEN) adding CPU 39, now 4 CPUs
(XEN) removing CPU 39, remaining: 14
(XEN) adding CPU 40, now 5 CPUs
(XEN) removing CPU 40, remaining: 13
(XEN) sdom->active_vcpu_count: 17
(XEN) sdom->weight: 256
(XEN) weight_left: 4096, weight_total: 4096
(XEN) credit_balance: 0, credit_xtra: 0, credit_cap: 0
(XEN) adding CPU 41, now 6 CPUs
(XEN) removing CPU 41, remaining: 12
...
Two things startled me:
1) There is quite some time between the "Removing CPUs" message from the
script and the actual HV printk showing it is done. Why is that not
synchronous?

Removing CPUs from Pool-0 requires no switching of the scheduler, so you
see no calls to alloc/free_pdata here.

Looking at the code, it shows that __csched_vcpu_acct_start() is
eventually triggered by a timer; shouldn't that be triggered synchronously
by add/removal events?

The vcpus are not moved explicitly; they are migrated by the normal
scheduler mechanisms, the same as for vcpu-pin.

2) It clearly shows that each CPU gets added to the new pool _before_ it
gets removed from the old one (Pool-0); isn't that violating the "only
one pool per CPU" rule? Even if that is fine for a short period of time,
maybe the timer kicks in at this very moment, resulting in violated
invariants?

The sequence you are seeing seems to be okay: the alloc_pdata for the new
pool is called before the free_pdata for the old pool.

And the timer is not relevant, as only the idle vcpu should be running on
the moving CPU, and the accounting stuff is never called during idle.
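(In sketch form, the invariant being relied on here; simplified, not
verbatim Xen code: the per-CPU ticker only ever accounts the vcpu
currently running on that CPU, and the idle vcpu is skipped, so an idle,
moving CPU cannot change active_vcpu_count.)

/* Simplified sketch of the per-cpu tick path (not verbatim Xen code) */
if ( !is_idle_vcpu(current) )
    csched_vcpu_acct(prv, cpu);  /* may call __csched_vcpu_acct_start() */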

Uhh, the above could be wrong!
The normal ticker doesn't call accounting in idle, and it is stopped during
a CPU move. The master_ticker, however, is perhaps handled wrongly. I'll
check this and prepare a patch if necessary.
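If the master ticker is indeed the culprit, a plausible shape for a fix
would be to hand it over (or stop it) before its CPU leaves the pool. This
is just a sketch of the idea, not the eventual patch (prv->master and
prv->master_ticker are fields of csched_private; first_remaining_cpu is a
hypothetical placeholder):

/* Idea sketch only, not the actual patch: if the cpu leaving the pool
 * drives the master ticker, hand the ticker over to a cpu that stays
 * in the pool (first_remaining_cpu is a hypothetical placeholder). */
if ( prv->master == cpu )
{
    if ( prv->ncpus > 0 )
    {
        prv->master = first_remaining_cpu;
        migrate_timer(&prv->master_ticker, prv->master);
    }
    else
        kill_timer(&prv->master_ticker);
}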


Juergen

--
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@xxxxxxxxxxxxxx
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html
