WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
[Xen-devel] XENBUS: Timeout connecting to device errors

To: <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: [Xen-devel] XENBUS: Timeout connecting to device errors
From: "Graham, Simon" <Simon.Graham@xxxxxxxxxxx>
Date: Mon, 4 Dec 2006 14:18:37 -0500
Delivery-date: Mon, 04 Dec 2006 11:18:58 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AccX2P+RAWqsSQbWQWeVn+J9Wy6xLw==
Thread-topic: XENBUS: Timeout connecting to device errors
We've been noticing a lot of these errors when booting VMs since we
moved to 3.0.3. I've traced this to the hotplug scripts in Dom0 taking
>10s to run to completion -- specifically, the vif-bridge script on
occasion takes >=9s to plug the vif into the s/w bridge -- and was
wondering if anyone has any insight into why it might take this long.

I added some instrumentation to the scripts to log entry/exit from
xen-backend.agent and also lock contention (logs included at the end of
this message), and have the following observations:
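For reference, a minimal sketch of the kind of entry/exit instrumentation described above; the helper name and message format are illustrative (they mirror the log lines below), and the real patch presumably used logger(1) to reach /var/log/messages, whereas plain echo keeps this sketch self-contained:

```shell
#!/bin/sh
# Hypothetical sketch of entry/exit instrumentation for a hotplug agent.
# "class" and "action" default to an example vif online event.
class="${1:-vif}"; action="${2:-online}"

# Emit a log line tagged with this script's PID, matching the
# "xen-backend[20240]: Start vif: online" style seen in the logs below.
log() {
    echo "xen-backend[$$]: $*"
}

log "Start $class: $action"
# ... the original xen-backend.agent dispatch would run here ...
log "End $class: $action"
```

Bracketing the dispatch like this is what makes the 2:21:53 -> 2:22:04 gap in the vif online handling directly visible in the log.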

1. Currently, the various script invocations are issued in parallel but
   are serialized by a single global lock -- is it really necessary, for
   example, to serialize vif and vbd hotplug processing in Dom0?

2. In most cases we've seen, this problem happens when the first VM is
   started after re-installing a box. In the example below, the 'vif
   online' processing started at 2:21:53 and did not finish until
   2:22:04.

3. Clearly a hard-coded timeout of 10s is less than perfect -- is there
   no better way of knowing when the hotplug processing is done?
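On point 1, a rough sketch of what per-class locking could look like; flock(1) and the lock directory here are stand-ins for the scripts' own claim_lock/release_lock helpers, purely to illustrate the idea:

```shell
#!/bin/sh
# Hypothetical sketch: take one lock per device class rather than the
# single global xenbus_hotplug_global lock, so slow vif processing would
# no longer stall vbd events (and vice versa).
lockdir="${TMPDIR:-/tmp}/xen-hotplug-demo"
mkdir -p "$lockdir"

# Run a command while holding an exclusive lock for its device class only.
run_locked() {
    class="$1"; shift
    (
        flock -x 9        # exclusive lock, scoped to this class only
        "$@"              # the hotplug work runs under the class lock
    ) 9>"$lockdir/lock-$class"
}

# Two different classes can now proceed without contending:
run_locked vif echo "vif online handled"
run_locked vbd echo "vbd add handled"
```

Within a class, events still serialize, which keeps ordering for a single device; only the cross-class contention (vif blocking vbd, as in the 20240/20252 case below) goes away.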

Thanks,
Simon

<dom0 /var/log/messages:>

Dec  4 02:21:53 gromit xen-hotplug: /etc/hotplug/xen-backend.agent:
xen-backend[20234]: Start vif: add
Dec  4 02:21:53 gromit xen-hotplug: /etc/hotplug/xen-backend.agent:
xen-backend[20234]: End vif: add
Dec  4 02:21:53 gromit xen-hotplug: /etc/hotplug/xen-backend.agent:
xen-backend[20240]: Start vif: online
Dec  4 02:21:53 gromit xen-hotplug: /etc/hotplug/xen-backend.agent:
xen-backend[20252]: Start vbd: add
Dec  4 02:21:53 gromit xen-hotplug: /etc/hotplug/xen-backend.agent: Lock
/var/run/xen-hotplug/xenbus_hotplug_global by 20252 - currently owned by
20240: /etc/hotplug/xen-backend.agent
Dec  4 02:21:54 gromit lvm[12123]: XenDom wallace1: state changed
stopped => paused
Dec  4 02:21:54 gromit sn2spine: start RESULT <?xml version="1.0" ?> <result status='ok' code='200'> <guest id="3f879d14-8c70-48af-ae02-88df3afad3cb"><name>wallace1</name><id>3f879d14-8c70-48af-ae02-88df3afad3cb</id><system>gromit.sn.stratus.com</system><state>starting</state><availability>failover</availability><mode>duplex</mode><memory>256</memory><cpus>1</cpus><storage><volume device="hda1" mountpoint="/" name="drbd0"/></storage></guest> </result>

Dec  4 02:21:54 gromit xen-hotplug: /etc/xen/scripts/vif-bridge: online
XENBUS_PATH=backend/vif/1/0
Dec  4 02:21:54 gromit kernel: device vif1.0 entered promiscuous mode
Dec  4 02:21:54 gromit xen-hotplug: /etc/xen/scripts/vif-bridge:
iptables -A FORWARD -m physdev --physdev-in vif1.0  -j ACCEPT failed. If
you are using iptables, this may affect networking for guest domains.
Dec  4 02:21:55 gromit kernel: xenbr0: port 3(vif1.0) entering learning
state
Dec  4 02:21:59 gromit kernel: xenbr0: topology change detected,
propagating
Dec  4 02:22:03 gromit kernel: xenbr0: port 3(vif1.0) entering
forwarding state
Dec  4 02:22:04 gromit lvm[12123]: XenDom wallace1: state changed paused
=> running
Dec  4 02:22:04 gromit xen-hotplug: /etc/hotplug/xen-backend.agent:
xen-backend[20240]: End vif: online
Dec  4 02:22:18 gromit kernel: ip_tables: (C) 2000-2006 Netfilter Core
Team
Dec  4 02:22:22 gromit xen-hotplug: /etc/xen/scripts/block: add
XENBUS_PATH=backend/vbd/1/769
Dec  4 02:22:26 gromit xen-hotplug: /etc/hotplug/xen-backend.agent:
xen-backend[20252]: End vbd: add
Dec  4 02:22:29 gromit lvm[12123]: XenDom wallace1: state changed
running => crashed


<guest console>:
Dec  4 02:21:54 Linux version 2.6.16.29-xenU
(sntriage@xxxxxxxxxxxxxxxxxxx) (gcc version 3.4.4 20050721 (Red Hat
3.4.4-2)) #1 SMP Mon Dec 4 01:33:25 EST 2006
...
Dec  4 02:22:04 XENBUS: Timeout connecting to device: device/vbd/769
(state 3)
Dec  4 02:22:04 Root-NFS: No NFS server available, giving up.
Dec  4 02:22:04 VFS: Unable to mount root fs via NFS, trying floppy.
Dec  4 02:22:04 VFS: Cannot open root device "hda1" or
unknown-block(2,0)
Dec  4 02:22:04 Please append a correct "root=" boot option
Dec  4 02:22:04 Kernel panic - not syncing: VFS: Unable to mount root fs
on unknown-block(2,0)

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
