
Re: [Xen-devel] [PATCH] Re: Crash on blktap shutdown



On 02/24/2010 07:03 PM, Daniel Stodden wrote:
On Wed, 2010-02-24 at 20:47 -0500, Daniel Stodden wrote:
On Wed, 2010-02-24 at 19:37 -0500, Jeremy Fitzhardinge wrote:
On 02/24/2010 04:29 PM, Daniel Stodden wrote:
On Wed, 2010-02-24 at 18:52 -0500, Jeremy Fitzhardinge wrote:

On 02/24/2010 03:49 PM, Daniel Stodden wrote:

On Wed, 2010-02-24 at 17:55 -0500, Jeremy Fitzhardinge wrote:


When rebooting the machine, I got this crash from blktap. The RIP maps to
line 262 in request.c:
0xffffffff812548a1 is in blktap_request_pool_free
(/home/jeremy/git/linux/drivers/xen/blktap/request.c:262).


Uhm, where did that RIP come from?

pool_free is on the module exit path. The stack trace below looks like a
crash from the broadcasted SIGTERM before reboot.


Ignore it; I generated it from a different kernel from the one that
crashed.  But the other oops I posted should be all consistent and
meaningful.

Ignore only the debuginfo quote, right?
Cos this looks like a different issue to me.

Perhaps.  I got all the others on normal domain shutdown, but this one
was on machine reboot.  I'll try to repro (as I boot the test kernel
with your patch in it).
(gdb) list *(blktap_device_restart+0x7a)
0x2a73 is in blktap_device_restart
(/local/exp/dns/scratch/xenbits/xen-unstable.hg/linux-2.6-pvops.git/drivers/xen/blktap/device.c:920).
915             /* Re-enable calldowns. */
916             if (blk_queue_stopped(dev->gd->queue))
917                     blk_start_queue(dev->gd->queue);
918
919             /* Kick things off immediately. */
920             blktap_device_do_request(dev->gd->queue);
921
922             spin_unlock_irq(&dev->lock);
923     }
924

I assume we've been dereferencing a NULL gendisk, i.e. device_destroy
racing against device_restart.

That would take:

  * Tapdisk being killed on the other thread, which ends up in
    a device_restart(). That is what your stack trace shows.

  * A device removal pending, blocking until
    device->users drops to 0, then doing the device_destroy().
    That might have happened during bdev .release.

Both running at the same time sounds like what happens if you kill them
all at once.
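
As an illustration of the race described above, here is a minimal userspace sketch of one common way to close it: the destroy path clears the disk pointer under the same lock that the restart path holds while it checks the pointer. This is not the attached patch and not the blktap code. The names fake_tap, fake_disk, fake_tap_destroy and fake_tap_restart are hypothetical, and a pthread mutex stands in for the driver's spinlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's structures, not the real blktap types. */
struct fake_disk {
    int queue_runs;             /* counts how often the queue was kicked */
};

struct fake_tap {
    pthread_mutex_t lock;       /* plays the role of dev->lock */
    struct fake_disk *gd;       /* NULL once the device has been destroyed */
};

/* Destroy path: detach the disk under the lock and return it so the
 * caller can free it only after no restart can still see it. */
static struct fake_disk *fake_tap_destroy(struct fake_tap *tap)
{
    struct fake_disk *gd;

    assert(tap != NULL);
    pthread_mutex_lock(&tap->lock);
    gd = tap->gd;
    tap->gd = NULL;
    pthread_mutex_unlock(&tap->lock);
    return gd;
}

/* Restart path: kick the queue only if the disk still exists.
 * Returns 0 if the queue was kicked, -1 if the device is already gone. */
static int fake_tap_restart(struct fake_tap *tap)
{
    int ret = -1;

    assert(tap != NULL);
    pthread_mutex_lock(&tap->lock);
    if (tap->gd != NULL) {
        tap->gd->queue_runs++;
        ret = 0;
    }
    pthread_mutex_unlock(&tap->lock);
    return ret;
}
```

In this sketch a restart that loses the race simply returns -1 instead of touching a pointer that is no longer valid. Whether the real driver can take that shortcut depends on details not shown in this thread.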

That clearly takes another patch then.
Jeremy,

can you try out the attached patch for me?

This should close the above shutdown race as well.

It should be nowhere near as frequent as the timer_sync crash fixed earlier.

Hm, the two patches changed things but I'm still seeing problems on domain shutdown. Still looks like use-after-free.

blktap_device_destroy: destroy device 0 users 0
blktap_ring_vm_close: unmapping ring 0
blktap_ring_release: freeing device 0
blktap_sysfs_destroy
=============================================================================
BUG kmalloc-512: Poison overwritten
-----------------------------------------------------------------------------

INFO: 0xffff88002e9e2048-0xffff88002e9e2048. First byte 0x6a instead of 0x6b
INFO: Allocated in device_create_vargs+0x47/0xd7 age=7705 cpu=0 pid=3072
INFO: Freed in device_create_release+0x9/0xb age=14 cpu=0 pid=3320
INFO: Slab 0xffff880003cca5b0 objects=14 used=2 fp=0xffff88002e9e2000 flags=0xa3
INFO: Object 0xffff88002e9e2000 @offset=0 fp=0xffff88002e9e2248

  Object 0xffff88002e9e2000:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kk
  Object 0xffff88002e9e2010:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kk
  Object 0xffff88002e9e2020:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kk
  Object 0xffff88002e9e2030:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kk
  Object 0xffff88002e9e2040:  6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b kk
  Object 0xffff88002e9e2050:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kk
  Object 0xffff88002e9e2060:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kk
  Object 0xffff88002e9e2070:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kk
  Object 0xffff88002e9e2080:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kk
  Object 0xffff88002e9e2090:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kk
  Object 0xffff88002e9e20a0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kk
  Object 0xffff88002e9e20b0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kk
  Object 0xffff88002e9e20c0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kk
  Object 0xffff88002e9e20d0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kk
  Object 0xffff88002e9e20e0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kk
  Object 0xffff88002e9e20f0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kk
  Object 0xffff88002e9e2100:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kk
  Object 0xffff88002e9e2110:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kk
  Object 0xffff88002e9e2120:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kk
  Object 0xffff88002e9e2130:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kk
  Object 0xffff88002e9e2140:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kk
  Object 0xffff88002e9e2150:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kk
  Object 0xffff88002e9e2160:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kk
  Object 0xffff88002e9e2170:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kk
  Object 0xffff88002e9e2180:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kk
  Object 0xffff88002e9e2190:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kk
  Object 0xffff88002e9e21a0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kk
  Object 0xffff88002e9e21b0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kk
  Object 0xffff88002e9e21c0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kk
  Object 0xffff88002e9e21d0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kk
  Object 0xffff88002e9e21e0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kk
  Object 0xffff88002e9e21f0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 k¥
 Redzone 0xffff88002e9e2200:  bb bb bb bb bb bb bb bb                         »
 Padding 0xffff88002e9e2240:  5a 5a 5a 5a 5a 5a 5a 5a                         Z
Pid: 3327, comm: ifdown Not tainted 2.6.32 #358
Call Trace:
 [<ffffffff810a83f9>] print_trailer+0x16a/0x173
 [<ffffffff810a89a0>] check_bytes_and_report+0xb5/0xe6
 [<ffffffff810a8a96>] check_object+0xc5/0x237
 [<ffffffff810aa588>] __slab_alloc+0x493/0x591
 [<ffffffff810e8fea>] ? load_elf_binary+0xe2/0x17d8
 [<ffffffff810e8fea>] ? load_elf_binary+0xe2/0x17d8
 [<ffffffff810ab06f>] __kmalloc+0xbe/0x12f
 [<ffffffff810e8fea>] load_elf_binary+0xe2/0x17d8
 [<ffffffff8100e921>] ? xen_force_evtchn_callback+0xd/0xf
 [<ffffffff8100e921>] ? xen_force_evtchn_callback+0xd/0xf
 [<ffffffff8100eff2>] ? check_events+0x12/0x20
 [<ffffffff810b3ee9>] ? search_binary_handler+0x18f/0x278
 [<ffffffff810e0208>] ? flock_to_posix_lock+0x4/0xe1
 [<ffffffff810b3e2c>] ? search_binary_handler+0xd2/0x278
 [<ffffffff8100efdf>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff81064f38>] ? lock_release+0x15a/0x166
 [<ffffffff810e0208>] ? flock_to_posix_lock+0x4/0xe1
 [<ffffffff810b3e39>] search_binary_handler+0xdf/0x278
 [<ffffffff810e8f08>] ? load_elf_binary+0x0/0x17d8
 [<ffffffff810b5453>] do_execve+0x185/0x27a
 [<ffffffff81010673>] sys_execve+0x3e/0x5c
 [<ffffffff8101209a>] stub_execve+0x6a/0xc0
FIX kmalloc-512: Restoring 0xffff88002e9e2048-0xffff88002e9e2048=0x6b
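
For readers unfamiliar with this kind of report: a freed SLUB object is filled with 0x6b and its last byte is set to 0xa5, and the allocator checks that pattern before reusing the object. The sketch below is a simplified userspace version of that check, not the kernel's check_object(). The function name first_poison_mismatch is hypothetical. Applied to the dump above, it would flag offset 0x48 (0xffff88002e9e2048 minus the object start 0xffff88002e9e2000). The value 0x6a is one less than the poison byte, which would be consistent with, but does not prove, something decrementing a field in the already-freed object.

```c
#include <assert.h>
#include <stddef.h>

#define POISON_FREE 0x6b    /* fill byte SLUB writes into freed objects */
#define POISON_END  0xa5    /* marker SLUB writes into the object's last byte */

/* Return the offset of the first byte that deviates from the free-poison
 * pattern, or -1 if the whole object is still intact. */
static long first_poison_mismatch(const unsigned char *obj, size_t size)
{
    size_t i;

    assert(obj != NULL && size > 0);
    for (i = 0; i < size; i++) {
        unsigned char expected = (i == size - 1) ? POISON_END : POISON_FREE;

        if (obj[i] != expected)
            return (long)i;
    }
    return -1;
}
```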


        J


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

