WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-devel

Re: [Xen-devel] [PATCH] Re: Crash on blktap shutdown

On Thu, 2010-02-25 at 18:18 -0500, Jeremy Fitzhardinge wrote:
> On 02/24/2010 07:03 PM, Daniel Stodden wrote:
> > On Wed, 2010-02-24 at 20:47 -0500, Daniel Stodden wrote:
> >    
> >> On Wed, 2010-02-24 at 19:37 -0500, Jeremy Fitzhardinge wrote:
> >>      
> >>> On 02/24/2010 04:29 PM, Daniel Stodden wrote:
> >>>        
> >>>> On Wed, 2010-02-24 at 18:52 -0500, Jeremy Fitzhardinge wrote:
> >>>>
> >>>>          
> >>>>> On 02/24/2010 03:49 PM, Daniel Stodden wrote:
> >>>>>
> >>>>>            
> >>>>>> On Wed, 2010-02-24 at 17:55 -0500, Jeremy Fitzhardinge wrote:
> >>>>>>
> >>>>>>
> >>>>>>              
> >>>>>>> When rebooting the machine, I got this crash from blktap. The rip
> >>>>>>> maps to line 262:
> >>>>>>> 0xffffffff812548a1 is in blktap_request_pool_free
> >>>>>>> (/home/jeremy/git/linux/drivers/xen/blktap/request.c:262).
> >>>>>>>
> >>>>>>>
> >>>>>>>                
> >>>>>> Uhm, where did that RIP come from?
> >>>>>>
> >>>>>> pool_free is on the module exit path. The stack trace below looks
> >>>>>> like a crash from the broadcasted SIGTERM before reboot.
> >>>>>>
> >>>>>>
> >>>>>>              
> >>>>> Ignore it; I generated it from a different kernel from the one that
> >>>>> crashed.  But the other oops I posted should be all consistent and
> >>>>> meaningful.
> >>>>>
> >>>>>            
> >>>> Ignore only the debuginfo quote, right?
> >>>> Cos this looks like a different issue to me.
> >>>>
> >>>>          
> >>> Perhaps.  I got all the others on normal domain shutdown, but this one
> >>> was on machine reboot.  I'll try to repro (as I boot the test kernel
> >>> with your patch in it).
> >>>        
> >> (gdb) list *(blktap_device_restart+0x7a)
> >> 0x2a73 is in blktap_device_restart
> >> (/local/exp/dns/scratch/xenbits/xen-unstable.hg/linux-2.6-pvops.git/drivers/xen/blktap/device.c:920).
> >> 915         /* Re-enable calldowns. */
> >> 916         if (blk_queue_stopped(dev->gd->queue))
> >> 917                 blk_start_queue(dev->gd->queue);
> >> 918
> >> 919         /* Kick things off immediately. */
> >> 920         blktap_device_do_request(dev->gd->queue);
> >> 921
> >> 922         spin_unlock_irq(&dev->lock);
> >> 923 }
> >> 924
> >>
> >> Assuming we've been dereferencing a NULL gendisk, i.e. device_destroy
> >> racing against device_restart.
> >>
> >> That would take
> >>
> >>   * Tapdisk killed on the other thread, which goes through into
> >>     a device_restart(). Which is what your stacktrace shows.
> >>
> >>   * Device removal pending, blocking until
> >>     device->users drops to 0, then doing the device_destroy().
> >>     That might have happened during bdev .release.
> >>
> >> Both running at the same time sounds like what happens if you kill them
> >> all at once.
> >>
> >> That clearly takes another patch then.
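
The race described above can be modelled in user space. The following is only a sketch under stated assumptions, not the attached patch: the `fake_*` types and functions are hypothetical stand-ins, a pthread mutex plays the role of `dev->lock`, and clearing `gd` under that lock is one plausible way to keep the restart path from dereferencing a NULL gendisk.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the request queue and the gendisk. */
struct fake_queue { int kicks; };
struct fake_disk  { struct fake_queue *queue; };

struct fake_tap {
    pthread_mutex_t lock;   /* plays the role of dev->lock */
    struct fake_disk *gd;   /* NULL once the disk has been destroyed */
};

/*
 * Restart path: kick the queue only if the disk still exists.
 * Returns 0 if the queue was kicked, -1 if the disk was already gone.
 */
int fake_device_restart(struct fake_tap *tap)
{
    int ret = -1;

    assert(tap != NULL);
    pthread_mutex_lock(&tap->lock);
    if (tap->gd != NULL) {
        tap->gd->queue->kicks++;
        ret = 0;
    }
    pthread_mutex_unlock(&tap->lock);
    return ret;
}

/*
 * Destroy path: clear the pointer under the same lock, so a concurrent
 * restart sees either a valid disk or NULL, never a stale pointer.
 */
void fake_device_destroy(struct fake_tap *tap)
{
    assert(tap != NULL);
    pthread_mutex_lock(&tap->lock);
    tap->gd = NULL;
    pthread_mutex_unlock(&tap->lock);
}
```

With this shape, a restart that loses the race against destroy returns -1 instead of touching `gd->queue`. Whether the real fix checks the pointer, defers the destroy, or does something else entirely is not visible from this thread.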
> >>      
> > Jeremy,
> >
> > can you try out the attached patch for me?
> >
> > This should close the above shutdown race as well.
> >
> > Should be nowhere near as frequent as the timer_sync crash fixed earlier.
> >    
> 
> Hm, the two patches changed things but I'm still seeing problems on 
> domain shutdown.  Still looks like use-after-free.

All these new-fashioned debug switches. Only causing trouble.

This is yet another, separate issue. The sysfs code was causing a double
unref on the ring device.
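
To illustrate why a double unref shows up as a use-after-free, here is a minimal reference-count model. It is a sketch only: `fake_ref` and its helpers are hypothetical and are not the blktap sysfs code, which is not shown in this thread.

```c
#include <assert.h>

/* Hypothetical reference counter; not the kernel's own implementation. */
struct fake_ref {
    int count;     /* outstanding references */
    int released;  /* how many times the release step ran */
};

/* The creator starts out holding one reference. */
void fake_ref_init(struct fake_ref *r)
{
    r->count = 1;
    r->released = 0;
}

/* Take an additional reference on a live object. */
void fake_ref_get(struct fake_ref *r)
{
    assert(r->count > 0);
    r->count++;
}

/*
 * Drop one reference. Returns 1 if this call dropped the last one,
 * which is the point where the object would be freed.
 */
int fake_ref_put(struct fake_ref *r)
{
    assert(r->count > 0);
    if (--r->count == 0) {
        r->released++;
        return 1;
    }
    return 0;
}
```

With two holders, each is expected to call `fake_ref_put()` exactly once. If one path puts twice, the count reaches zero while the other holder still believes its reference is valid, and its next access lands on released memory.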

Daniel


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
