WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
xen-devel

Re: [Xen-devel] hanging tapdisk2 processes and multipathing

On Fri, 2011-07-22 at 06:01 -0400, Sébastien RICCIO wrote:
> > The processes, really? Where do they hang? (check out the wait state --
> > ps -eopid,wchan:25,cmd or so).
> >
> > Or do you mean they're stuck waiting for I/Os?
> >
> > Daniel
> >
> >
> 
> They seem to work and do their job, but they are in a strange state.
> For example, a ps -aux on dom0 hangs when processing
> the line about the tapdisk process. The device also cannot be detached
> from the VM, and issuing a reboot of the host hangs too (the process
> can't be killed, so the host doesn't reboot).
> 
> I fought with this quite a lot on a Debian 6 + Xen 4.1.x box and found
> out that disabling multipath-tools and multipath-tools-boot
> corrected the problem (but I need them). I thought that maybe it was
> because multipathd tries to "multipath" the block device
> handled by blktap2 and somehow locks it. But that's speculation :)

The multipathing is in a dm node to which tapdisk issues I/O. There's no
special handling involved in there whatsoever. It's completely
transparent to blktap and tapdisk, as it should be.

I could imagine tapdisk wedging in dm code, during some I/O operations.
These should be fully asynchronous, but for some storage types under
special conditions that's sometimes wishful thinking. That applies if
you find a tap-ctl call (even just a list command) blocking.

The blktap module does not do anything unusual to the tapdisk task.

Anyway, it'd initially be a matter of figuring out where exactly it
blocks. If ps is borked, try to get another shell and do a
cat /proc/<pid>/wchan

That makes sense for both the ps and tapdisk2 tasks.
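
The wchan check above can be wrapped in a small helper when several PIDs are involved. This is only a sketch: the function name show_wchan and the PROC_ROOT override are made up for illustration (PROC_ROOT exists purely so the function can be tried against a fake directory tree), and it assumes the kernel exposes /proc/<pid>/wchan.

```shell
# Print "<pid><TAB><wait channel>" for each PID given as an argument.
# PROC_ROOT is a hypothetical override for experimenting; it defaults to /proc.
show_wchan() {
    proc_root="${PROC_ROOT:-/proc}"
    for pid in "$@"; do
        f="$proc_root/$pid/wchan"
        if [ -r "$f" ]; then
            # wchan carries no trailing newline, so printf supplies one.
            printf '%s\t%s\n' "$pid" "$(cat "$f")"
        else
            printf '%s\t%s\n' "$pid" "(no readable wchan)"
        fi
    done
}
```

Something like `show_wchan $(pgrep tapdisk2)` would then list the wait channel of each tapdisk2 task, assuming pgrep itself does not get stuck the way ps does.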

You say from the guest I/O perspective it still makes progress? If not,
that would explain why you're unable to detach: Blkback won't be able to
release the device before all pending I/O is flushed.

To check tapdev I/O state from the host side, do a
cat /sys/class/blktap2/tapdisk<n>/debug

That will dump some task stuff and a list of outstanding requests, if
there are any.
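
To look at every tap device in one go, the same cat can be looped over the sysfs directory. A minimal sketch, assuming the debug nodes live at the path given above; the function name dump_tapdev_debug and the BLKTAP_SYSFS override are hypothetical and only there to make the loop easy to try out.

```shell
# Dump the debug node of every tap device found under the blktap2 class dir.
# BLKTAP_SYSFS is a hypothetical override; it defaults to /sys/class/blktap2.
dump_tapdev_debug() {
    sysfs_dir="${BLKTAP_SYSFS:-/sys/class/blktap2}"
    for f in "$sysfs_dir"/tapdisk*/debug; do
        # If the glob matched nothing, the literal pattern is left; skip it.
        [ -e "$f" ] || continue
        printf '== %s ==\n' "$f"
        cat "$f"
    done
}
```

Running `dump_tapdev_debug` with no devices present prints nothing; otherwise each node's contents follow a header line naming the file.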

> I do not have hands on the box at the moment to give you more
> information and do not want to hijack this thread. It's just that it
> looked like the problem I encountered, but I will send you more
> information when I am on the box.

Thanks!

Daniel


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel