xen-devel

Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split

To: Andre Przywara <andre.przywara@xxxxxxx>
Subject: Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
From: Juergen Gross <juergen.gross@xxxxxxxxxxxxxx>
Date: Fri, 28 Jan 2011 12:44:00 +0100
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
In-reply-to: <4D42A35D.3050507@xxxxxxx>
Organization: Fujitsu Technology Solutions
References: <4D41FD3A.5090506@xxxxxxx> <4D426673.7020200@xxxxxxxxxxxxxx> <4D42A35D.3050507@xxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.16) Gecko/20101226 Iceowl/1.0b1 Icedove/3.0.11
On 01/28/11 12:07, Andre Przywara wrote:
> Juergen Gross wrote:
>> On 01/28/11 00:18, Andre Przywara wrote:
>>> Hi,
>>>
>>> when I boot my machine without restricting Dom0 (dom0_mem=
>>> dom0_max_vcpus=) I get a _hypervisor_ crash when I run
>>> # xl cpupool-numa-split
>>> If Dom0's resources are limited on the Xen cmdline, everything works
>>> fine.
>>> The crashdump points to a scheduling problem with weights, so I assume
>>> the NUMA distribution algorithm somehow fools the hypervisor completely.
>>>
>>> I will investigate this further tomorrow, but maybe someone has some
>>> good idea.
>>
>> I've seen this once with an older cpupool version on a 24 processor
>> machine.
>> It was NOT related to NUMA, but did occur only on reboot after a Dom0
>> panic.
>> The machine had an init script creating a cpupool and populating it with
>> cpus. The machine was then in a panic loop due to the BUG in sched_acct
>> until it was reset manually. After the reset the problem was gone.
>>
>> As I was never able to reproduce the problem later (the same software is
>> running on dozens of machines!), I assumed there was a problem related to
>> the first Dom0 panic, maybe some destroyed BIOS tables.
>>
>> Can the crash be reproduced easily?
> Yes.
> If I don't specify dom0_max_vcpus= and dom0_mem= on the Xen cmdline, I
> can reliably trigger the crash with xl cpupool-numa-split.
> Omitting only dom0_max_vcpus does not suffice.

Do I understand correctly?
No crash with only dom0_max_vcpus= and no crash with only dom0_mem= ?

Could you try this patch?

diff -r b59f04eb8978 xen/common/schedule.c
--- a/xen/common/schedule.c     Fri Jan 21 18:06:23 2011 +0000
+++ b/xen/common/schedule.c     Fri Jan 28 12:42:46 2011 +0100
@@ -1301,7 +1301,9 @@ void schedule_cpu_switch(unsigned int cp

     idle = idle_vcpu[cpu];
     ppriv = SCHED_OP(new_ops, alloc_pdata, cpu);
+    BUG_ON(ppriv == NULL);
     vpriv = SCHED_OP(new_ops, alloc_vdata, idle, idle->domain->sched_priv);
+    BUG_ON(vpriv == NULL);

     pcpu_schedule_lock_irqsave(cpu, flags);
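
The patch above checks both allocation results before the per-CPU schedule lock is taken. As a rough user-space illustration of that ordering (not the hypervisor code), the sketch below allocates two private areas, verifies both, and only then installs them under a toy lock. It returns an error code where the patch uses BUG_ON, so that the failure path can be exercised. All names here (struct cpu_sched, cpu_switch, alloc_ok, alloc_fail) are hypothetical and do not come from the Xen tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-CPU scheduler state, for illustration only. */
struct cpu_sched {
    void *pdata;   /* stands in for the per-physical-CPU private data */
    void *vdata;   /* stands in for the idle vCPU's private data      */
    int   locked;  /* toy replacement for the per-CPU schedule lock   */
};

/* Hypothetical allocator hook; may return NULL, like any allocator. */
typedef void *(*alloc_fn)(unsigned int cpu);

static void *alloc_ok(unsigned int cpu)   { (void)cpu; return malloc(16); }
static void *alloc_fail(unsigned int cpu) { (void)cpu; return NULL; }

/*
 * Allocate both private areas and check them BEFORE taking the lock.
 * Returns 0 on success. Returns -1 if either allocation failed, in
 * which case the existing state in *cs is left untouched.
 */
static int cpu_switch(struct cpu_sched *cs, unsigned int cpu,
                      alloc_fn alloc_pdata, alloc_fn alloc_vdata)
{
    void *ppriv = alloc_pdata(cpu);
    if (ppriv == NULL)
        return -1;

    void *vpriv = alloc_vdata(cpu);
    if (vpriv == NULL) {
        free(ppriv);            /* do not leak the first allocation */
        return -1;
    }

    cs->locked = 1;             /* "take the lock" */
    free(cs->pdata);            /* release the old private data */
    free(cs->vdata);
    cs->pdata = ppriv;          /* install the verified pointers */
    cs->vdata = vpriv;
    cs->locked = 0;             /* "drop the lock" */
    return 0;
}
```

Calling cpu_switch(&cs, 0, alloc_ok, alloc_fail) on a zero-initialised struct cpu_sched returns -1 and leaves cs unchanged. In the patch, the same condition would instead stop the hypervisor at the BUG_ON, which shows whether a failed allocation is behind the crash.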



--
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@xxxxxxxxxxxxxx
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html
