Re: [Xen-devel] xl: pci completion error
Hello, Gianni!

I was curious how your patch would behave with many PCI devices given to a domain. It seemed that each device removal would take 10 seconds to time out.

pci=[ '01:00.0','00:1d.0', '00:1d.1', '00:1d.2','00:1d.3', '00:1d.7' ]

The result was a bit different. After the first PCI removal, the others were somehow destroyed, so the second do_pci_remove ran into the failure that there were no PCI devices at all. I got the following log:

Waiting for domain workxp (domid 1) to die [pid 3900]
Domain 1 is dead
Action for shutdown reason code 0 is destroy
Domain 1 needs to be cleaned up: destroying the domain
do_pci_remove device 01:00.0
libxl: error: libxl_device.c:448:libxl__wait_for_device_model Device Model not ready
libxl: error: libxl_pci.c:861:do_pci_remove Device Model didn't respond in time
do_pci_remove device 00:1d.0
libxl: error: libxl_pci.c:839:do_pci_remove PCI device not attached to this domain
libxl: error: libxl.c:944:libxl_domain_destroy pci shutdown failed for domid 1
libxl: error: libxl.c:896:libxl_destroy_device_model Couldn't find device model's pid: No such file or directory
libxl: error: libxl.c:956:libxl_domain_destroy libxl_destroy_device_model failed for 1
libxl: error: libxl_device.c:307:libxl__devices_destroy /local/domain/1/device is empty

Sergey.

Sergey's analysis sounds very plausible to me actually. The co-ordination required between qemu and libxl for PCI passthrough is very complicated and one ought to have fairly low confidence in its correctness :P

Below is a less hacky version of the patch I just sent. Stefano, please consider this for inclusion.

---
xl: Implement PCI passthrough force removal

This fixes two errors with removing PCI devices from HVM domains.

The first error is that the handling of the "pci-rem" device-model command is erroneously implemented in qemu and difficult (impossible?) to get right.
For example, during domain shutdown there can be a race where the guest OS unloads its drivers and perhaps even shuts down the PCI subsystem before the pci-rem command has been received by qemu. This means that no OS is present to write to the port, which is what causes the dm command to be acknowledged. We fix this by implementing a 'force removal' option to libxl_device_pci_remove which is always set to 1 during guest shutdown. It can optionally be enabled on the xl command line for other occasions.

The second error is that if a guest OS doesn't respond to the SCI interrupt, and therefore to the pci-rem dm command, which can happen if the guest OS has no ACPI PCI hotplug support, then device removal bails with an error but only AFTER removing the device from xenstore. This means that xenstore gets into an inconsistent state where an assigned device also appears to be assignable. This is fixed by moving xenstore device removal to occur only after the device has really been removed.

Signed-off-by: Gianni Tedesco <gianni.tedesco@xxxxxxxxxx>

diff -r 02e199c96ece tools/libxl/libxl.h
--- a/tools/libxl/libxl.h Wed Oct 06 11:00:19 2010 +0100
+++ b/tools/libxl/libxl.h Wed Oct 06 15:29:12 2010 +0100
@@ -406,7 +406,7 @@ int libxl_device_vfb_clean_shutdown(libx
 int libxl_device_vfb_hard_shutdown(libxl_ctx *ctx, uint32_t domid);

 int libxl_device_pci_add(libxl_ctx *ctx, uint32_t domid, libxl_device_pci *pcidev);

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
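The ordering described in the commit message can be illustrated with a small toy model. This is only a sketch and is not the patch: every name here (struct toy_domain, toy_dm_pci_rem, toy_pci_remove) is hypothetical, the device model and xenstore are reduced to boolean flags, and the way force interacts with the device-model wait is an assumption, since the diff above is cut off before the relevant hunks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one passed-through device; not a libxl type. */
struct toy_domain {
    bool dm_responds;   /* would the device model acknowledge pci-rem? */
    bool in_xenstore;   /* device still recorded as assigned */
    bool attached;      /* device still attached to the guest */
};

/* Pretend to send pci-rem and wait for the ack: 0 on ack, -1 on timeout. */
static int toy_dm_pci_rem(const struct toy_domain *d)
{
    return d->dm_responds ? 0 : -1;
}

/* Returns 0 if the device was removed, -1 if it was left in place. */
int toy_pci_remove(struct toy_domain *d, bool force)
{
    assert(d != NULL);

    if (!d->attached)
        return -1;      /* nothing to remove */

    /* Assumption: with force set, a missing ack is ignored. */
    if (toy_dm_pci_rem(d) < 0 && !force) {
        /* Bail out BEFORE touching xenstore, so the device is still
         * recorded as assigned and cannot look assignable. */
        return -1;
    }

    d->attached = false;     /* device is really gone (or forced out) */
    d->in_xenstore = false;  /* only now drop the xenstore record */
    return 0;
}
```

With a guest that never acknowledges, an unforced removal fails and leaves both flags set, so the bookkeeping stays consistent. A forced removal, as during domain shutdown, clears both.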