WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/

xen-devel

[Xen-devel] Re: blktap race against xenstore startup

To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-devel] Re: blktap race against xenstore startup
From: Anthony Liguori <aliguori@xxxxxxxxxx>
Date: Thu, 28 Sep 2006 17:45:33 -0500
Cc: Andrew Warfield <andrew.warfield@xxxxxxxxxxxx>, Julian Chesterfield <jac90@xxxxxxxxx>, Steven Rostedt <rostedt@xxxxxxxxxxx>
Delivery-date: Thu, 28 Sep 2006 15:49:32 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <1159481874.8884.30.camel@xxxxxxxxxxxxxxxxxxxxx>
References: <1159481874.8884.30.camel@xxxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Thunderbird 1.5.0.7 (X11/20060922)

Stephen C. Tweedie wrote:
> Hi all,
>
> With the various blktap fixes I've recently posted, blktap runs
> reliably... the *second* time we start xend.  First time, blktapctrl
> just dies on init.
>
> It turns out that get_dom_domid() is SEGVing.  It calls
>
>         e = xs_directory(h, xth, "/local/domain", &num);
>
> and then iterates over the results to find the domain with the right
> name (in this case, "Domain-0", which should be easy to find!)  Trouble
> is, it's racing with xenstore startup, and when it calls this the first
> time, it gets back an ENOENT (easily seen on an strace.)  That returns
> e=NULL, and everything falls apart.
>
> I have "fixed" it locally with the following terrible hack:
>
> +       for (i = 0; i < 10; i++) {
> +               e = xs_directory(h, xth, "/local/domain", &num);
> +               if (e)
> +                       break;
> +               sleep(1);
> +       }
>
> -       e = xs_directory(h, xth, "/local/domain", &num);
> -
> -       for (i = 0; (i < num) && (domid == NULL); i++) {
> +       for (i = 0; e && (i < num) && (domid == NULL); i++) {
>
> which just loops calling xs_directory() with a 1-second pause in between
> until it returns something sensible.
>
> Ugh.  There has got to be a better way to synchronise with the initial
> population of the dom0 information into xenstore, surely?  Has no other
> component of the Xen stack ever seen this before?
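
For illustration only, the retry-and-guard idea in the hack above can be written as a small helper. This is a sketch, not the blktap code: `list_fn`, `list_with_retry`, and `fake_list` are hypothetical names, and the callback merely stands in for a lookup like xs_directory() that returns NULL on failure, so the example compiles without libxenstore.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical callback type standing in for a lookup such as
 * xs_directory(): returns NULL on failure, sets *num on success. */
typedef char **(*list_fn)(void *ctx, unsigned int *num);

/* Call fn up to `attempts` times, sleeping `delay_s` seconds between
 * failed attempts. Returns the first non-NULL result, or NULL if every
 * attempt failed; *num is forced to 0 on failure so that a caller's
 * loop over the entries never runs on a NULL array. */
static char **list_with_retry(list_fn fn, void *ctx, unsigned int *num,
                              int attempts, unsigned int delay_s)
{
    assert(fn != NULL && num != NULL && attempts > 0);

    for (int i = 0; i < attempts; i++) {
        char **e = fn(ctx, num);
        if (e != NULL)
            return e;
        if (i + 1 < attempts)
            sleep(delay_s);   /* no sleep after the final failure */
    }
    *num = 0;
    return NULL;
}

/* Hypothetical stub for trying the helper out: it fails until the
 * counter pointed to by ctx reaches zero, then returns one entry. */
static char zero[] = "0";
static char *fake_entries[] = { zero };

static char **fake_list(void *ctx, unsigned int *num)
{
    int *failures_left = ctx;
    if (*failures_left > 0) {
        (*failures_left)--;
        return NULL;
    }
    *num = 1;
    return fake_entries;
}
```

A caller would pass a wrapper around the real lookup in place of `fake_list` and must still check the return value for NULL. Like the original hack, this only narrows the race; it does not synchronise with xenstore startup.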

I don't know how blktap is launched right now, but the same problem has occurred in the past for other daemons (like xenconsoled).

xenstored won't close standard output until it's ready to receive connections. xend start will wait to start the other daemons until xenstored is ready. How does blktap get spawned?
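
The close-stdout-when-ready handshake described above can be sketched in plain POSIX C. This is an assumption about the general pattern, not the actual xend or xenstored code: `spawn_and_wait_ready` and `demo_daemon` are hypothetical names. The parent gives the child a pipe as its standard output and blocks until it reads EOF, which happens once the child has closed its end.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>   /* for callers that reap the child with waitpid() */
#include <unistd.h>

/* Fork a child whose stdout is a pipe, and block until the child closes it.
 * `child_main` is a hypothetical stand-in for the daemon: it should finish
 * its setup, close(STDOUT_FILENO), then carry on with its work.
 * Returns the child's pid once EOF is seen on the pipe, or -1 on error. */
static pid_t spawn_and_wait_ready(void (*child_main)(void))
{
    int fds[2];
    assert(child_main != NULL);

    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: make stdout the write end of the pipe. */
        close(fds[0]);
        if (fds[1] != STDOUT_FILENO) {
            if (dup2(fds[1], STDOUT_FILENO) < 0)
                _exit(127);
            close(fds[1]);
        }
        child_main();
        _exit(0);
    }

    /* Parent: drop our write end so that EOF arrives as soon as the
     * child closes its copy. Any startup output is read and discarded. */
    close(fds[1]);

    char buf[256];
    ssize_t n;
    do {
        n = read(fds[0], buf, sizeof buf);
    } while (n > 0 || (n < 0 && errno == EINTR));

    close(fds[0]);
    return n == 0 ? pid : -1;
}

/* Hypothetical daemon body: print a message, signal readiness, return. */
static void demo_daemon(void)
{
    static const char msg[] = "starting\n";
    if (write(STDOUT_FILENO, msg, sizeof msg - 1) < 0)
        _exit(1);
    close(STDOUT_FILENO);   /* readiness signal to the parent */
}
```

EOF also arrives if the child dies before it is ready, so a real launcher would need a further check (for example the child's exit status) before starting dependent daemons.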

Regards,

Anthony Liguori


> --Stephen


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel