xen-devel

Re: [Xen-devel] Kernel panic with 2.6.32-30 under network activity

To: Jan Beulich <JBeulich@xxxxxxxxxx>
Subject: Re: [Xen-devel] Kernel panic with 2.6.32-30 under network activity
From: Olivier Hanesse <olivier.hanesse@xxxxxxxxx>
Date: Thu, 17 Mar 2011 11:34:34 +0100
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Ian Campbell <Ian.Campbell@xxxxxxxxxx>, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>, Xen Users <xen-users@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Thu, 17 Mar 2011 03:35:11 -0700
In-reply-to: <4D80A1940200007800036D07@xxxxxxxxxxxxxxxxxx>
References: <AANLkTimVdAG6y+-9jNuQM78Bz+O7CuBteQdF1yK1YYCo@xxxxxxxxxxxxxx> <20110316032018.GC7905@xxxxxxxxxxxx> <4D8092240200007800036C9B@xxxxxxxxxxxxxxxxxx> <1300270268.17339.2417.camel@xxxxxxxxxxxxxxxxxxxxxx> <4D80A1940200007800036D07@xxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
It happened again a few minutes ago. The kernel stack trace is the same each time (alignment check: 0000 [#1] SMP, etc.).

The dom0 hosting all the faulty domUs is a dual Xeon 5420, so 8 real cores are available.
20 domUs are running on it, with 35 vcpus set up. Is that too many? The bug hits the domUs at random.
I was running the same config with Xen 3.2 without any issue.


It may be related to the kernel version: there was no issue with 2.6.24, but there is with 2.6.32.


2011/3/16 Jan Beulich <JBeulich@xxxxxxxxxx>
>>> On 16.03.11 at 11:11, Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:
> On Wed, 2011-03-16 at 09:34 +0000, Jan Beulich wrote:
>> >>> On 16.03.11 at 04:20, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote:
>> > On Thu, Mar 10, 2011 at 12:25:55PM +0100, Olivier Hanesse wrote:
>> >> [469390.126691] alignment check: 0000 [#1] SMP
>> >
>> > alignment check? Was there anything else in the log before this? Was there
>> > anything in the Dom0 log?
>>
>> This together with
>>
>> >> [469390.126795] RSP: e02b:ffff88001ec3f9b8  EFLAGS: 00050286
>>
>> makes me wonder if either eflags got restored from a corrupted
>> stack slot somewhere, or whether something in the kernel or one
>> of the modules intentionally played with EFLAGS.AC.
>
> Can a PV kernel running in ring-3 change AC?

Yes. We had this problem until we cleared the flag in
create_bounce_frame().

> The Intel manual says "They should not be modified by application
> programs" over a list including AC but the list also includes e.g. IOPL
> and IF so I suspect it meant "can not" rather than "should not"? In
> which case it can't happen by accident.

No, afaik "should not" is the correct term.

> The hypervisor appears to clear the guest's EFLAGS.AC on context switch
> to a guest and failsafe bounce but not in e.g. do_iret so it's not
> entirely clear what the policy is...

do_iret() isn't increasing privilege, and hence restoring whatever
the outer context of iret had in place is correct. The important
thing is that on the transition to kernel mode the flag must always
get cleared (which I think has been the case since the problem
in create_bounce_frame() was fixed).

Jan
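
As an aside on the EFLAGS value quoted above (00050286), here is a minimal sketch in Python that decodes it using the architectural flag bit positions documented in the Intel manual. The helper names decode_eflags and clear_ac are hypothetical and are not taken from Xen or Linux. clear_ac only models the masking that the thread says must happen on the transition to kernel mode; it is not the actual create_bounce_frame() code.

```python
# Architectural EFLAGS bit positions (Intel SDM, EFLAGS register layout).
# Bit 1 is reserved and always reads as 1, so it is not listed here.
EFLAGS_BITS = {
    0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF", 8: "TF", 9: "IF",
    10: "DF", 11: "OF", 14: "NT", 16: "RF", 17: "VM", 18: "AC",
    19: "VIF", 20: "VIP", 21: "ID",
}

X86_EFLAGS_AC = 1 << 18  # alignment check flag
IOPL_SHIFT = 12          # IOPL is the two-bit field in bits 12-13


def decode_eflags(value):
    """Return (names of set flags in ascending bit order, IOPL)."""
    names = [name for bit, name in sorted(EFLAGS_BITS.items())
             if value & (1 << bit)]
    iopl = (value >> IOPL_SHIFT) & 0x3
    return names, iopl


def clear_ac(value):
    """Hypothetical helper: mask out AC, as on entry to kernel mode."""
    return value & ~X86_EFLAGS_AC


if __name__ == "__main__":
    eflags = 0x00050286  # value from the RSP/EFLAGS line in the trace
    flags, iopl = decode_eflags(eflags)
    print("flags:", " ".join(flags), "IOPL:", iopl)
    print("with AC cleared: %08x" % clear_ac(eflags))
```

For the value in the trace, the decoded list contains AC alongside RF, IF, SF and PF, which is consistent with the "alignment check" fault reported by the guest kernel.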

