
Re: [Xen-devel] Basic blktap2 functionality issues.



On Mar 30, 12:23pm, Ian Campbell wrote:
} Subject: Re: [Xen-devel] Basic blktap2 functionality issues.

Hi, hope the weekend is going well for everyone.

Sorry for the delay in getting back to everyone on this; I had a
deadline on another project.

> On Fri, 2012-03-30 at 09:17 +0100, Ian Campbell wrote:
> > I think an approach worth trying would be to have
> > tapdisk_control_detach_vbd respond to TAPDISK_MESSAGE_DETACH before
> > doing the actual detach. i.e. it would respond with "Yes, I will do
> > that" rather than "Yes, I have done that". My speculation is that this
> > will allow libxl to continue and hopefully avoid the deadlock.

> This seems to be the case as the following fixes things for
> me. Thanks very much for your analysis which led me to this
> solution...
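
The ordering change proposed above can be illustrated with a minimal
sketch.  Everything in it is hypothetical: the event log and the
send_response(), do_detach() and handle_detach_* functions are
stand-ins invented for illustration and are not the tapdisk2 code or
the actual patch.  It only shows the difference between answering
"I will do that" before the work and "I have done that" after it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical event log used only to make the ordering observable. */
enum { MAX_EVENTS = 4 };

struct event_log {
    const char *events[MAX_EVENTS];
    int count;
};

static void log_event(struct event_log *log, const char *name)
{
    assert(log->count < MAX_EVENTS);
    log->events[log->count++] = name;
}

/* Stand-in for sending the reply to the detach request. */
static void send_response(struct event_log *log)
{
    log_event(log, "respond");
}

/* Stand-in for the real detach work, which may block. */
static void do_detach(struct event_log *log)
{
    log_event(log, "detach");
}

/* "Yes, I will do that": reply first so the caller can proceed. */
static void handle_detach_ack_first(struct event_log *log)
{
    send_response(log);
    do_detach(log);
}

/* "Yes, I have done that": the caller waits for the detach to finish. */
static void handle_detach_ack_last(struct event_log *log)
{
    do_detach(log);
    send_response(log);
}
```

With the ack-first handler the requester is unblocked before the
detach runs, which is the property the suggestion relies on.  The
cost is that the reply no longer guarantees the detach has completed.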

I ported your fix into 4.1.2 but I think we still have a problem, at
least in this codebase.

I no longer see the select timeout delay when xl shuts down, but upon
shutdown the minor number is not freed.  A 'tap-ctl list' shows a
steadily increasing set of orphaned minor numbers as VMs are started
up and shut down.

Are you seeing this in your development codebase?

The culprit is a failed ioctl call for BLKTAP2_IOCTL_FREE_TAP in
tap_ctl_free().  The underlying reason for the ioctl failure is the
check in [linuxsrc]:drivers/block/blktap/ring.c:blktap_ring_destroy()
for whether or not the task_struct pointer in the blktap_ring
structure has been NULLed.

That certainly makes sense, since there is a race between xl's call to
tap_ctl_free() and tapdisk2 reaching the point where it closes its
descriptor to the blktap instance.  Closing the descriptor invokes the
.release method, which translates into a call to blktap_ring_release(),
which NULLs the task_struct pointer.
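
One way to tolerate a race of this shape is to retry the free for a
bounded number of attempts while the other side is still holding the
resource.  The following is a minimal sketch under stated assumptions,
not a fix proposed in this thread: it assumes the failing call reports
EBUSY (not confirmed here), and retry_while_busy(), fake_free_tap()
and struct fake_ring are hypothetical names, with fake_free_tap()
standing in for the real ioctl.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical callback type: returns 0 on success, or -1 with errno
 * set, mirroring the ioctl() calling convention. */
typedef int (*busy_op_fn)(void *ctx);

/* Call op until it succeeds, fails with an error other than EBUSY,
 * or max_attempts is exhausted.  Sleeps delay_ms between attempts. */
static int retry_while_busy(busy_op_fn op, void *ctx,
                            int max_attempts, long delay_ms)
{
    struct timespec delay = { delay_ms / 1000,
                              (delay_ms % 1000) * 1000000L };
    int rc = -1;

    assert(max_attempts > 0);
    for (int i = 0; i < max_attempts; i++) {
        rc = op(ctx);
        if (rc == 0 || errno != EBUSY)
            return rc;
        if (i + 1 < max_attempts)
            nanosleep(&delay, NULL);
    }
    return rc;
}

/* Simulated device: stays busy until the pending releases drain. */
struct fake_ring {
    int releases_pending;
};

/* Stand-in for the free request; fails with EBUSY while busy. */
static int fake_free_tap(void *ctx)
{
    struct fake_ring *ring = ctx;

    if (ring->releases_pending > 0) {
        ring->releases_pending--;
        errno = EBUSY;
        return -1;
    }
    return 0;
}
```

A caller would pass the real free operation in place of
fake_free_tap().  A bounded retry only narrows the window; it does
not remove the underlying ordering problem between the two processes.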

If you are not seeing the orphaned minor numbers, there must be ordering
changes in the unstable version of xl which eliminate or alter the
race timing.

For the sake of completeness of information for this thread, I captured
the following stack trace of tapdisk2 when it is deadlocked against
xl:

---------------------------------------------------------------------------
Apr  1 07:03:45 hooter kernel: Call Trace:
Apr  1 07:03:45 hooter kernel:  [<c10aa791>] ? blkdev_get_blocks+0xb4/0xb4
Apr  1 07:03:45 hooter kernel:  [<c109ab11>] ? iput+0x28/0x143
Apr  1 07:03:45 hooter kernel:  [<c10e9fec>] ? blk_peek_request+0x155/0x165
Apr  1 07:03:45 hooter kernel:  [<c128279c>] schedule+0x4d/0x4f
Apr  1 07:03:45 hooter kernel:  [<f7a9053b>] blktap_device_destroy_sync+0x63/0x76 [blktap]
Apr  1 07:03:45 hooter kernel:  [<c104204a>] ? wake_up_bit+0x61/0x61
Apr  1 07:03:45 hooter kernel:  [<f7a8f57a>] blktap_ring_release+0xe/0x29 [blktap]
Apr  1 07:03:45 hooter kernel:  [<c108aba7>] fput+0xce/0x167
Apr  1 07:03:45 hooter kernel:  [<c107944a>] remove_vma+0x28/0x47
Apr  1 07:03:45 hooter kernel:  [<c107a19a>] do_munmap+0x1e8/0x204
Apr  1 07:03:45 hooter kernel:  [<c107a1de>] sys_munmap+0x28/0x37
Apr  1 07:03:45 hooter kernel:  [<c1284adc>] sysenter_do_call+0x12/0x2c
Apr  1 07:03:45 hooter kernel:  [<c1280000>] ? migration_call+0x1d9/0x1f2
Apr  1 07:03:45 hooter kernel:  c686dad0 00000286 c1045bcd e86581c0 e84f6380 c13d1c80 c13d1c80 d753476c
Apr  1 07:03:45 hooter kernel:  e9185b00 e9185c78 00000000 d75346cb 00002ad2 d75346cb e8515ae4 e8515ae4
Apr  1 07:03:45 hooter kernel:  c1005597 00000000 ed7d80c8 c686dae4 c686da98 c1005d10 ed7d02c0 00000000
---------------------------------------------------------------------------

Let me know if you are seeing the issues I'm seeing.  In the meantime I
will keep hunting to see if I can run down the ultimate cause of the
deadlock.  Given the above trace, it has to be an issue with xl
orchestrating the release of resources which reference the tapdev
block device.

> Ian.

Will look forward to your thoughts.

Have a good weekend.

}-- End of excerpt from Ian Campbell

As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.           Specializing in information infra-structure
Fargo, ND  58102            development.
PH: 701-281-1686
FAX: 701-281-3949           EMAIL: greg@xxxxxxxxxxxx
------------------------------------------------------------------------------
"According to the philosopher Ly Tin Wheedle, chaos is found in greatest
 abundance wherever order is being sought.  It always defeats order,
 because it is better organized."
                                -- Terry Pratchett
                                   Interesting Times

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

