WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-devel

Re: [Xen-devel] Re: [PATCH] blkfront: Move blkif_interrupt into a tasklet.

To: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Subject: Re: [Xen-devel] Re: [PATCH] blkfront: Move blkif_interrupt into a tasklet.
From: Andrew Jones <drjones@xxxxxxxxxx>
Date: Mon, 27 Sep 2010 09:41:53 +0200
Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>, Xen <xen-devel@xxxxxxxxxxxxxxxxxxx>, Tom Kopec <tek@xxxxxxx>, Daniel Stodden <daniel.stodden@xxxxxxxxxx>
Delivery-date: Mon, 27 Sep 2010 00:42:47 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4C9CF2F6.2070806@xxxxxxxx>
References: <1282546470-5547-1-git-send-email-daniel.stodden@xxxxxxxxxx> <1282546470-5547-2-git-send-email-daniel.stodden@xxxxxxxxxx> <4C802934.2000305@xxxxxxxx> <4C9B7B69.7080705@xxxxxxxxxx> <4C9B7F1A.2040302@xxxxxxxx> <4C9B826B.10302@xxxxxxxxxx> <4C9B9E1D.2040501@xxxxxxxx> <4C9C4FDA.1070907@xxxxxxxxxx> <4C9CF2F6.2070806@xxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.12) Gecko/20100907 Fedora/3.0.7-1.fc12 Lightning/1.0b2pre Thunderbird/3.0.7

On 09/24/2010 08:50 PM, Jeremy Fitzhardinge wrote:
>  On 09/24/2010 12:14 AM, Andrew Jones wrote:
>> On 09/23/2010 08:36 PM, Jeremy Fitzhardinge wrote:
>>>  On 09/23/2010 09:38 AM, Paolo Bonzini wrote:
>>>> On 09/23/2010 06:23 PM, Jeremy Fitzhardinge wrote:
>>>>>> Any developments with this? I've got a report of the exact same
>>>>>> warnings on a RHEL6 guest. See
>>>>>>
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=632802
>>>>>>
>>>>>> RHEL6 doesn't have the 'Move blkif_interrupt into a tasklet' patch, so
>>>>>> that can be ruled out. Unfortunately I don't have this reproducing on a
>>>>>> test machine, so it's difficult to debug.  The report I have showed that
>>>>>> in at least one case it occurred on boot up, right after initializing the
>>>>>> block device. I'm trying to confirm whether that's always the case.
>>>>>>
>>>>>> Thanks in advance for any pointers you might have.
>>>>> Yes, I see it even after reverting that change as well.  However I only
>>>>> see it on my domain with an XFS filesystem, but I haven't dug any deeper
>>>>> to see if that's relevant.
>>>>>
>>>>> Do you know when this appeared?  Is it recent?  What changes are in the
>>>>> rhel6 kernel in question?
>>>> It's got pretty much everything in stable-2.6.32.x, up to the 16-patch
>>>> blkfront series you posted last July.  There are some RHEL-specific
>>>> workarounds for PV-on-HVM, but for PV domains everything matches
>>>> upstream.
>>> Have you tried bisecting to see when this particular problem appeared? 
>>> It looks to me like something is accidentally re-enabling interrupts -
>>> perhaps a stack overrun is corrupting the "flags" argument between a
>>> spin_lock_irqsave()/restore pair. 
>>>
>> Unfortunately I don't have a test machine where I can do a bisection
>> (yet). I'm looking for one. I only have this one report so far, and it's
>> on a production machine.
> 
> The report says that it's repeatedly killing the machine though?  In my
> testing, it seems to hit the warning once at boot, but is OK after that
> (not that I'm doing anything very stressful on the domain).
> 

It looks like the crash comes from a failure to read swap due to a bad
page map. That may be a separate issue, but I wanted to clear up this
warning first and see what happens.

>>> Is it only on 32-bit kernels?
>>>
>> The one report I have is for a 32-bit guest on a 64-bit host.
> 
> Is it using XFS by any chance?  So far I've traced the re-enable to
> xfs_buf_bio_end_io().  However, my suspicion is that it might be related
> to the barrier changes we did.
> 

I'll check whether the guest is using XFS and let you know.

>     J
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel

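For reference, the hypothesis quoted above (a corrupted "flags" word between spin_lock_irqsave() and the matching restore re-enabling interrupts) can be modeled in user space. What follows is only a toy sketch, not kernel code: toy_irq_save(), toy_irq_restore(), irqs_on_after_nested_section() and the irqs_enabled variable are hypothetical names invented for illustration, and the memset() stands in for whatever stray write a stack overrun might cause. It assumes the critical section is entered with interrupts already disabled, so a correct restore must leave them disabled.

```c
#include <stdbool.h>
#include <string.h>

/* Toy stand-in for the CPU interrupt-enable state (hypothetical). */
static bool irqs_enabled = true;

/* Model of an irqsave: remember the current state, then disable. */
static unsigned long toy_irq_save(void)
{
    unsigned long flags = irqs_enabled ? 1UL : 0UL;
    irqs_enabled = false;
    return flags;
}

/* Model of an irqrestore: blindly trusts the saved word. */
static void toy_irq_restore(unsigned long flags)
{
    irqs_enabled = (flags != 0);
}

/*
 * Enter a critical section with interrupts already off, optionally
 * clobber the saved flags word (simulating a stack overrun), and
 * report whether interrupts are on once the section has ended.
 */
static bool irqs_on_after_nested_section(bool corrupt_flags)
{
    irqs_enabled = false;                 /* caller already disabled them */
    unsigned long flags = toy_irq_save(); /* saved state says "disabled" */

    if (corrupt_flags)
        memset(&flags, 0xff, sizeof(flags)); /* stray write over the saved word */

    toy_irq_restore(flags);
    return irqs_enabled;
}
```

In this model, irqs_on_after_nested_section(false) leaves interrupts disabled, while irqs_on_after_nested_section(true) ends with them enabled even though the caller had turned them off, which is the kind of symptom the warning would flag. Whether that is what actually happens in the blkfront or XFS paths is not established in this thread.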
