xen-devel

Re: [Xen-devel] Re: [PATCH] blkfront: Move blkif_interrupt into a tasklet

To: Daniel Stodden <daniel.stodden@xxxxxxxxxx>
Subject: Re: [Xen-devel] Re: [PATCH] blkfront: Move blkif_interrupt into a tasklet.
From: Andrew Jones <drjones@xxxxxxxxxx>
Date: Mon, 27 Sep 2010 12:21:13 +0200
Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>, Jeremy Fitzhardinge <jeremy@xxxxxxxx>, Xen <xen-devel@xxxxxxxxxxxxxxxxxxx>, Tom Kopec <tek@xxxxxxx>
Delivery-date: Mon, 27 Sep 2010 03:24:09 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <1285580789.4365.620.camel@xxxxxxxxxxxxxxxxxxx>
References: <1282546470-5547-1-git-send-email-daniel.stodden@xxxxxxxxxx> <1282546470-5547-2-git-send-email-daniel.stodden@xxxxxxxxxx> <4C802934.2000305@xxxxxxxx> <4C9B7B69.7080705@xxxxxxxxxx> <4C9B7F1A.2040302@xxxxxxxx> <4C9B826B.10302@xxxxxxxxxx> <4C9B9E1D.2040501@xxxxxxxx> <4C9C4FDA.1070907@xxxxxxxxxx> <4C9CF2F6.2070806@xxxxxxxx> <4CA04AC1.4060902@xxxxxxxxxx> <1285580789.4365.620.camel@xxxxxxxxxxxxxxxxxxx>
On 09/27/2010 11:46 AM, Daniel Stodden wrote:
> On Mon, 2010-09-27 at 03:41 -0400, Andrew Jones wrote:
>> On 09/24/2010 08:50 PM, Jeremy Fitzhardinge wrote:
>>>  On 09/24/2010 12:14 AM, Andrew Jones wrote:
>>>> On 09/23/2010 08:36 PM, Jeremy Fitzhardinge wrote:
>>>>>  On 09/23/2010 09:38 AM, Paolo Bonzini wrote:
>>>>>> On 09/23/2010 06:23 PM, Jeremy Fitzhardinge wrote:
>>>>>>>> Any developments with this? I've got a report of the exact same
>>>>>>>> warnings on a RHEL6 guest. See
>>>>>>>>
>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=632802
>>>>>>>>
>>>>>>>> RHEL6 doesn't have the 'Move blkif_interrupt into a tasklet' patch, so
>>>>>>>> that can be ruled out. Unfortunately I don't have this reproducing on a
>>>>>>>> test machine, so it's difficult to debug.  The report I have showed
>>>>>>>> that in at least one case it occurred on boot up, right after
>>>>>>>> initting the block device. I'm trying to get confirmation if that's
>>>>>>>> always the case.
>>>>>>>>
>>>>>>>> Thanks in advance for any pointers you might have.
>>>>>>> Yes, I see it even after reverting that change as well.  However I only
>>>>>>> see it on my domain with an XFS filesystem, but I haven't dug any deeper
>>>>>>> to see if that's relevant.
>>>>>>>
>>>>>>> Do you know when this appeared?  Is it recent?  What changes are in the
>>>>>>> rhel6 kernel in question?
>>>>>> It's got pretty much everything in stable-2.6.32.x, up to the 16 patch
>>>>>> blkfront series you posted last July.  There are some RHEL-specific
>>>>>> workarounds for PV-on-HVM, but for PV domains everything matches
>>>>>> upstream.
>>>>> Have you tried bisecting to see when this particular problem appeared? 
>>>>> It looks to me like something is accidentally re-enabling interrupts -
>>>>> perhaps a stack overrun is corrupting the "flags" argument between a
>>>>> spin_lock_irqsave()/restore pair. 
>>>>>
>>>> Unfortunately I don't have a test machine where I can do a bisection
>>>> (yet). I'm looking for one. I only have this one report so far, and it's
>>>> on a production machine.
>>>
>>> The report says that it's repeatedly killing the machine though?  In my
>>> testing, it seems to hit the warning once at boot, but is OK after that
>>> (not that I'm doing anything very stressful on the domain).
>>>
>>
>> It looks like the crash is from failing to read swap due to a bad page
>> map. It's possibly another issue, but I wanted to try and clean this
>> issue up first to see what happens.
> 
> Uh oh. Sure this was a frontend crash? If you see it again, a stack
> trace to look at would be great.
> 

Hi Daniel,

You can take a look at this bug:

https://bugzilla.redhat.com/show_bug.cgi?id=632802

There are stack traces for the swap issue in the comments, as well as
this attached dmesg:

https://bugzilla.redhat.com/attachment.cgi?id=447789


Thanks,
Drew



> Thanks,
> Daniel
> 
> 
> 
> 

