
Re: [Xen-devel] Second regression due to libxl: Remove linux udev rules (2ba368d13893402b2f1fb3c283ddcc714659dd9b)

On 30/07/15 at 10:53, Roger Pau Monné wrote:
> On 28/07/15 at 21:47, Konrad Rzeszutek Wilk wrote:
>> Hey,
>> I launch a bunch of guests at the same time or in parallel and 
>> the scripts end up timing out with:
>> Parsing config from //g-vm8.cfg
>> WARNING: you seem to be using "kernel" directive to override HVM guest 
>> firmware. Ignore that. Use "firmware_override" instead if you really want a 
>> non-default firmware
>> Jul 28 19:20:53 tst036 logger: /etc/xen/scripts/block: add 
>> XENBUS_PATH=backend/vbd/13/5632
>> libxl: error: libxl_aoutils.c:539:async_exec_timeout: killing execution of 
>> /etc/xen/scripts/block add because of timeout
>> libxl: error: libxl_create.c:1157:domcreate_launch_dm: unable to add disk 
>> devices
>> libxl: error: libxl_dm.c:1955:kill_device_model: unable to find device model 
>> pid in /local/domain/13/image/device-model-pid
>> libxl: error: libxl.c:1606:libxl__destroy_domid: libxl__destroy_device_model 
>> failed for 13
>> Jul 28 19:21:03 tst036 logger: /etc/xen/scripts/block: remove 
>> XENBUS_PATH=backend/vbd/13/5632
>> Jul 28 19:21:04 tst036 logger: /etc/xen/scripts/block: Writing 
>> backend/vbd/13/5632/hotplug-error xenstore-read backend/vbd/13/5632/node 
>> failed. backend/vbd/13/5632/hotplug-status error to xenstore.
>> Jul 28 19:21:04 tst036 logger: /etc/xen/scripts/block: xenstore-read 
>> backend/vbd/13/5632/node failed.
>> Jul 28 19:21:05 tst036 logger: /etc/xen/scripts/block: Writing 
>> backend/vbd/13/5632/hotplug-error /etc/xen/scripts/block failed; error 
>> detected. backend/vbd/13/5632/hotplug-status error to xenstore.
>> Jul 28 19:21:05 tst036 logger: /etc/xen/scripts/block: 
>> /etc/xen/scripts/block failed; error detected.
>> libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: 
>> /etc/xen/scripts/block remove [10344] exited with error status 1
>> libxl: error: libxl_device.c:1085:device_hotplug_child_death_cb: script: 
>> /etc/xen/scripts/block failed; error detected.
>> libxl: error: libxl.c:1569:libxl__destroy_domid: non-existant domain 13
>> libxl: error: libxl.c:1527:domain_destroy_callback: unable to destroy guest 
>> with domid 13
>> libxl: error: libxl.c:1454:domain_destroy_cb: destruction of domain 13 failed
>> And I cannot start the guest.
>> While if I revert the mentioned commit everything works peachy.
>> What is interesting is that if I have the revert I can see that the
>> Jul 28 19:39:03 tst036 logger: /etc/xen/scripts/block: Writing 
>> backend/vbd/14/5632/physical-device 7:d to xenstore.
>> Jul 28 19:39:03 tst036 logger: /etc/xen/scripts/block: Writing 
>> backend/vbd/14/5632/hotplug-status connected to xenstore.
>> or often done much much later after xl create has started.
>> Attached is the bad log and the good log.
> Can you do the same test with xl -vvv and the following patch applied 
> (with and without 2ba368 reverted):


I've looked into this, and AFAICT you were probably using the udev 
rules (do you have run_hotplug_scripts=0 in xl.conf?) before 2ba368, and 
now you are forcibly switched to launching hotplug scripts from libxl.

The issue is that you have multiple guests all using the same image 
file, so the time to execute the block hotplug script is O(n), where n 
is the number of times the same image is used:

shared_list=$(losetup -a |
      sed -n -e "s@^\([^:]\+\)\(:[[:blank:]]\[0*${dev}\]:${inode}[[:blank:]](.*)\)@\1@p" )
# One check_file_sharing call per loop device already backed by this image.
for dev in $shared_list
do
  if [ -n "$dev" ]
  then
    check_file_sharing "$file" "$dev" "$mode"
  fi
done
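
To see how the list of shared loop devices grows with the number of guests, here is a minimal sketch that applies the same sed expression to canned input instead of a live `losetup -a`. The helper name `find_shared_loops`, the sample lines, and the device/inode numbers are made up for illustration; the exact `losetup -a` output format also varies between util-linux versions, and `\+` assumes GNU sed.

```shell
#!/bin/sh
# Hypothetical helper (not part of the block script): read losetup -a style
# lines on stdin and print the loop devices backed by the given device number
# and inode, using the same sed expression as the excerpt above.
find_shared_loops() {
    dev=$1
    inode=$2
    sed -n -e "s@^\([^:]\+\)\(:[[:blank:]]\[0*${dev}\]:${inode}[[:blank:]](.*)\)@\1@p"
}

# Made-up sample: two loop devices share one image, a third uses another file.
sample='/dev/loop0: [0803]:1234 (/images/guest.img)
/dev/loop1: [0803]:1234 (/images/guest.img)
/dev/loop2: [0803]:9999 (/images/other.img)'

shared=$(printf '%s\n' "$sample" | find_shared_loops 803 1234)
count=$(printf '%s\n' "$shared" | grep -c .)

# Each entry in $shared would cost one check_file_sharing call in the script.
echo "$count loop devices share the image"
```

Every additional guest started from the same image adds one more entry to this list, and therefore one more iteration of the loop in the hotplug script.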

This was not a problem when using udev, because udev imposes no timeout, 
but libxl has a hard timeout (10s) on hotplug script execution. The 
only ways I see to solve this are to remove the checks done in the block 
hotplug script, or to increase the timeout (but since the execution 
time is not bounded, this is doomed to fail if enough guests are using 
the same image).
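
The interaction between a fixed timeout and a script whose run time grows with n can be imitated with a small simulation. This is only a sketch under assumptions: `run_checks` is a hypothetical stand-in, the per-check cost of 0.1s and the 1s limit are invented numbers, and coreutils `timeout` merely plays the role of libxl's timeout rather than reproducing how libxl implements it.

```shell
#!/bin/sh
# Hypothetical stand-in for a hotplug script: perform n "checks" that each
# take `cost` seconds, and kill the whole run after `limit` seconds.
run_checks() {
    n=$1
    cost=$2
    limit=$3
    timeout "$limit" sh -c '
        i=0
        while [ "$i" -lt "$1" ]; do
            sleep "$2"          # stands in for one check_file_sharing call
            i=$((i + 1))
        done
    ' sh "$n" "$cost"
}

run_checks 2 0.1 1
few=$?      # 2 checks finish well inside the limit
run_checks 20 0.1 1
many=$?     # 20 checks need about 2s, so timeout kills the run

# coreutils timeout exits with status 124 when it had to kill the command.
echo "2 checks: exit $few, 20 checks: exit $many"
```

Raising the limit only moves the threshold: for any fixed limit there is some number of guests sharing the image beyond which the run is killed.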

