WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-devel

Re: [Xen-devel] can't create any more pv-on-hvm domains after~38under 3.

To: Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Subject: Re: [Xen-devel] can't create any more pv-on-hvm domains after~38under 3.3-testing
From: Steve Ofsthun <sofsthun@xxxxxxxxxxxxxxx>
Date: Wed, 03 Dec 2008 12:02:29 -0500
Cc: James Harper <james.harper@xxxxxxxxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx
Delivery-date: Wed, 03 Dec 2008 09:03:00 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <C55C1F6F.77A%keir.fraser@xxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <C55C1F6F.77A%keir.fraser@xxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Thunderbird 2.0.0.18 (X11/20081112)
Keir Fraser wrote:
> On 03/12/2008 11:27, "James Harper" <james.harper@xxxxxxxxxxxxxxxx> wrote:
> 
>> Alternatively it could be a combination of the gplpv drivers and netback
>> or blkback. I'm pretty sure that I had the problem before I started
>> testing pvscsi...
>>
>> The machine I am testing on will be busy for the rest of the night, but
>> tomorrow I'll do some testing and see what happens, unless you can
>> suggest a way I could discover what those pages belong to in the
>> meantime?
> 
> Unfortunately it's a bit of a pain in the butt since we don't have full page
> tracking in Xen -- we only know that *someone* *somewhere* has that page
> mapped for *some* purpose. Indeed even with more tracking Xen can only
> really tell you which domain holds the reference, and that's bound to be
> dom0 (unless this is a bogus refcounting bug in Xen itself).

We have been investigating a similar-sounding bug (hung pages with elevated
reference counts) that occurs when blkback requests are issued over an iSCSI
backend device.  The block requests appear to be running afoul of the lazy copy
optimization added for netback.  In this path, foreign pages (assumed to be
netback pages?) are manipulated specially by the DMA layer of the dom0 network
stack.  On return to netback, the page refs are cleaned up.

In our case, the foreign pages actually originate from blkback, are passed to
iSCSI for processing, and are abused by the ref manipulation in the dom0
network stack.  On return to blkback, the page refs are off.  What we haven't
been able to do yet is identify the exact circumstances that trigger the
issue.  We have a fairly elaborate reproducer that involves running a pool of
domains and continuously rebooting them.  Eventually, one domain will hang on
exit with a stuck page that has an elevated ref count.

In our case, the stuck page is always a blkback I/O page.

Running the same test on an FC SAN or local SCSI backend device doesn't hang.

- Steve

> I would suggest dumping addresses of interesting control pages in your
> backend drivers (some can log that already if built with debugging I think),
> then match up the address of the remaining page in the zombie domain.
> 
>  -- Keir
> 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
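
The matching step suggested above (log the addresses of backend pages, then
look up the page left behind in the zombie domain) could be sketched roughly
as follows. This is only an illustration: the log line format
`<driver> <purpose> mfn=<hex>`, the sample entries, and the helper names
`index_logged_pages` and `find_owner` are all assumptions made for the
example, not output or interfaces that Xen, blkback or netback actually
provide.

```python
import re
from collections import defaultdict

# Hypothetical log line format; real backend debug output will differ.
LOG_LINE = re.compile(
    r"^(?P<driver>\w+)\s+(?P<purpose>[\w-]+)\s+mfn=(?P<mfn>0x[0-9a-fA-F]+)$"
)


def index_logged_pages(lines):
    """Map each logged machine frame number to the (driver, purpose) pairs that logged it."""
    owners = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that are not in the assumed format
        mfn = int(match["mfn"], 16)
        owners[mfn].append((match["driver"], match["purpose"]))
    return owners


def find_owner(owners, stuck_mfn):
    """Return the (driver, purpose) entries logged for the stuck frame, or [] if none."""
    return owners.get(stuck_mfn, [])


# Made-up sample data, purely for illustration.
sample_log = [
    "blkback io-page mfn=0x1a2b3",
    "netback rx-ring mfn=0x1a2b4",
    "blkback ring mfn=0x1a2b5",
]
owners = index_logged_pages(sample_log)
print(find_owner(owners, 0x1A2B3))
```

An empty result would mean that none of the logged pages matches the stuck
frame, which would point at a page the backends did not log.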

