Xen project Mailing List

Re: [Xen-devel] [PATCH] Re: Crash on blktap shutdown

To: Jeremy Fitzhardinge <jeremy@xxxxxxxx>

From: Daniel Stodden <daniel.stodden@xxxxxxxxxx>

Date: Fri, 26 Feb 2010 07:38:20 -0800

Cc: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, Jake Wires <Jake.Wires@xxxxxxxxxx>

Delivery-date: Fri, 26 Feb 2010 07:41:45 -0800

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

On Thu, 2010-02-25 at 18:18 -0500, Jeremy Fitzhardinge wrote:
> On 02/24/2010 07:03 PM, Daniel Stodden wrote:
> > On Wed, 2010-02-24 at 20:47 -0500, Daniel Stodden wrote:
> >
> >> On Wed, 2010-02-24 at 19:37 -0500, Jeremy Fitzhardinge wrote:
> >>
> >>> On 02/24/2010 04:29 PM, Daniel Stodden wrote:
> >>>
> >>>> On Wed, 2010-02-24 at 18:52 -0500, Jeremy Fitzhardinge wrote:
> >>>>
> >>>>> On 02/24/2010 03:49 PM, Daniel Stodden wrote:
> >>>>>
> >>>>>> On Wed, 2010-02-24 at 17:55 -0500, Jeremy Fitzhardinge wrote:
> >>>>>>
> >>>>>>> When rebooting the machine, I got this crash from blktap. The rip
> >>>>>>> maps to line 262 in
> >>>>>>> 0xffffffff812548a1 is in blktap_request_pool_free
> >>>>>>> (/home/jeremy/git/linux/drivers/xen/blktap/request.c:262).
> >>>>>>>
> >>>>>> Uhm, where did that RIP come from?
> >>>>>>
> >>>>>> pool_free is on the module exit path. The stack trace below looks like
> >>>>>> a crash from the broadcasted SIGTERM before reboot.
> >>>>>>
> >>>>> Ignore it; I generated it from a different kernel from the one that
> >>>>> crashed. But the other oops I posted should be all consistent and
> >>>>> meaningful.
> >>>>>
> >>>> Ignore only the debuginfo quote, right?
> >>>> Cos this looks like a different issue to me.
> >>>>
> >>> Perhaps. I got all the others on normal domain shutdown, but this one
> >>> was on machine reboot. I'll try to repro (as I boot the test kernel
> >>> with your patch in it).
> >>>
> >> (gdb) list *(blktap_device_restart+0x7a)
> >> 0x2a73 is in blktap_device_restart
> >> (/local/exp/dns/scratch/xenbits/xen-unstable.hg/linux-2.6-pvops.git/drivers/xen/blktap/device.c:920).
> >> 915         /* Re-enable calldowns. */
> >> 916         if (blk_queue_stopped(dev->gd->queue))
> >> 917                 blk_start_queue(dev->gd->queue);
> >> 918
> >> 919         /* Kick things off immediately. */
> >> 920         blktap_device_do_request(dev->gd->queue);
> >> 921
> >> 922         spin_unlock_irq(&dev->lock);
> >> 923 }
> >> 924
> >>
> >> Assuming we've been dereferencing a NULL gendisk, i.e. device_destroy
> >> racing against device_restart.
> >>
> >> Would take
> >>
> >>  * Tapdisk killed on the other thread, which goes through into
> >>    a device_restart(). Which is what your stacktrace shows.
> >>
> >>  * Device removal pending, blocking until
> >>    device->users drops to 0, then doing the device_destroy().
> >>    That might have happened during bdev .release.
> >>
> >> Both running at the same time sounds like what happens if you kill them
> >> all at once.
> >>
> >> That clearly takes another patch then.
> >>
> > Jeremy,
> >
> > can you try out the attached patch for me?
> >
> > This should close the above shutdown race as well.
> >
> > Should be nowhere as frequent as the timer_sync crash fixed earlier.
> >
> Hm, the two patches changed things but I'm still seeing problems on
> domain shutdown. Still looks like use-after-free.

All these new-fashioned debug switches. Only causing trouble.

This is yet a different piece. The sysfs code was causing a double
unref on the ring device.

Daniel

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
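
The race described in the quoted analysis (device_destroy removing the gendisk while device_restart dereferences it) can be illustrated with a small userspace sketch. This is not the attached patch, whose contents are not shown in this message, and none of the names come from blktap: `fake_tap`, `fake_disk` and the three functions are hypothetical stand-ins, and a pthread mutex takes the place of the driver's spinlock. The sketch only assumes that both paths take the same lock and that the restart path checks for a missing disk before touching it.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; these are not the blktap structures. */
struct fake_disk {
    int queue_kicks;            /* counts how often the queue was "kicked" */
};

struct fake_tap {
    pthread_mutex_t lock;
    struct fake_disk *gd;       /* NULL once the device has been destroyed */
};

/* Returns 0 on success, -1 on allocation or mutex setup failure. */
int fake_tap_init(struct fake_tap *tap)
{
    tap->gd = calloc(1, sizeof(*tap->gd));
    if (!tap->gd)
        return -1;
    if (pthread_mutex_init(&tap->lock, NULL) != 0) {
        free(tap->gd);
        tap->gd = NULL;
        return -1;
    }
    return 0;
}

/* Destroy path: detach the disk under the lock, free it afterwards. */
void fake_tap_destroy(struct fake_tap *tap)
{
    struct fake_disk *gd;

    pthread_mutex_lock(&tap->lock);
    gd = tap->gd;
    tap->gd = NULL;
    pthread_mutex_unlock(&tap->lock);

    free(gd);                   /* free(NULL) is a no-op on a repeated call */
}

/* Restart path: returns 0 if the queue was kicked, -1 if the disk is gone. */
int fake_tap_restart(struct fake_tap *tap)
{
    int ret = -1;

    pthread_mutex_lock(&tap->lock);
    if (tap->gd) {              /* never dereference a destroyed disk */
        tap->gd->queue_kicks++;
        ret = 0;
    }
    pthread_mutex_unlock(&tap->lock);

    return ret;
}
```

With this shape, a restart that loses the race against destroy returns -1 instead of dereferencing a NULL pointer. Whether the real driver can hold its lock across the equivalent of both paths is a separate question that this sketch does not answer.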
