
[Xen-devel] IOMMU: improve the FLR logic and move it from hypervisor to Control Panel?

  • To: "Keir Fraser" <keir.fraser@xxxxxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxx>
  • From: "Cui, Dexuan" <dexuan.cui@xxxxxxxxx>
  • Date: Thu, 19 Jun 2008 13:13:38 +0800
  • Delivery-date: Wed, 18 Jun 2008 22:14:09 -0700
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>
  • Thread-index: AcjRyzk1uEdlx8vPSFedd7rG+KIQSQ==
  • Thread-topic: IOMMU: improve the FLR logic and move it from hypervisor to Control Panel?

Currently, when creating or destroying an HVM guest with assigned
devices, we perform FLR for the devices in the hypervisor, in
xen/drivers/passthrough/vtd/utils.c: pdev_flr().
The logic is:
a) if the device is a PCIe endpoint and supports FLR, use that;
b) in all other cases, use a D3hot/D0 transition in place of FLR.

There are some issues:

1) It looks like few PCIe devices support FLR today. So currently,
almost all PCIe devices and all PCI devices use the D3hot/D0 method.
However, a D-state transition is not guaranteed to properly clear the
device state;

2) In case a), the current implementation is actually buggy:
Transaction_Pending_bit==0 doesn't mean that FLR has completed. Checking
the bit is only a way to ensure there is no pending transaction when
we're about to issue FLR (so we can be sure there is no data
corruption).
And according to the PCIe spec, after issuing FLR we should wait at
least 100ms, but "mdelay(100)" is not acceptable in Xen...
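
To make the intended ordering concrete, here is a minimal sketch of the
PCIe FLR sequence as described above: drain pending transactions first,
then set Initiate FLR, then wait 100ms. The register offsets and bits
are the standard PCI Express capability ones; struct cfg_ops, pcie_flr()
and the fake_* helpers are hypothetical names made up for illustration
(not existing Xen or Control Panel code), and the bound of 4 drain polls
is an assumption.

```c
#include <stdint.h>

/* Offsets within the PCI Express capability structure, and bit masks. */
#define EXP_DEVCAP        0x04        /* Device Capabilities, 32 bits */
#define EXP_DEVCAP_FLR    0x10000000u /* Function Level Reset Capability */
#define EXP_DEVCTL        0x08        /* Device Control, 16 bits */
#define EXP_DEVCTL_FLR    0x8000u     /* Initiate Function Level Reset */
#define EXP_DEVSTA        0x0a        /* Device Status, 16 bits */
#define EXP_DEVSTA_TRPND  0x0020u     /* Transactions Pending */

#define FLR_WAIT_MS       100         /* minimum wait after initiating FLR */
#define DRAIN_POLLS       4           /* assumption: bounded drain polling */

/* Hypothetical accessors supplied by the caller. */
struct cfg_ops {
    uint32_t (*read32)(void *dev, unsigned int off);
    uint16_t (*read16)(void *dev, unsigned int off);
    void (*write16)(void *dev, unsigned int off, uint16_t val);
    void (*sleep_ms)(void *dev, unsigned int ms);
};

/* cap is the config-space offset of the PCI Express capability.
 * Returns 0 once the reset has been issued and waited for, -1 if the
 * function does not advertise FLR. */
int pcie_flr(const struct cfg_ops *ops, void *dev, unsigned int cap)
{
    int i;

    if (!(ops->read32(dev, cap + EXP_DEVCAP) & EXP_DEVCAP_FLR))
        return -1;

    /* Before the reset: give outstanding transactions a chance to drain.
     * A clear Transactions Pending bit says nothing about FLR completion. */
    for (i = 0; i < DRAIN_POLLS; i++) {
        if (!(ops->read16(dev, cap + EXP_DEVSTA) & EXP_DEVSTA_TRPND))
            break;
        ops->sleep_ms(dev, FLR_WAIT_MS);
    }

    /* Initiate the reset, then wait before touching the function again. */
    ops->write16(dev, cap + EXP_DEVCTL,
                 (uint16_t)(ops->read16(dev, cap + EXP_DEVCTL) | EXP_DEVCTL_FLR));
    ops->sleep_ms(dev, FLR_WAIT_MS);
    return 0;
}

/* In-memory stand-in for a device, so the sequence can be exercised
 * without hardware; its capability sits at offset 0. */
struct fake_dev {
    uint32_t devcap;
    uint16_t devctl, devsta;
    unsigned int slept_ms;   /* total time "slept" so far */
    unsigned int flr_at_ms;  /* value of slept_ms when FLR was initiated */
    int flr_count;           /* number of times Initiate FLR was written */
};

static uint32_t fake_read32(void *d, unsigned int off)
{
    struct fake_dev *f = d;
    return off == EXP_DEVCAP ? f->devcap : 0;
}

static uint16_t fake_read16(void *d, unsigned int off)
{
    struct fake_dev *f = d;
    return off == EXP_DEVSTA ? f->devsta : off == EXP_DEVCTL ? f->devctl : 0;
}

static void fake_write16(void *d, unsigned int off, uint16_t val)
{
    struct fake_dev *f = d;
    if (off == EXP_DEVCTL && (val & EXP_DEVCTL_FLR)) {
        f->flr_count++;
        f->flr_at_ms = f->slept_ms;
    }
}

static void fake_sleep_ms(void *d, unsigned int ms)
{
    struct fake_dev *f = d;
    f->slept_ms += ms;
    f->devsta &= (uint16_t)~EXP_DEVSTA_TRPND; /* pretend traffic drained */
}

static const struct cfg_ops fake_ops = {
    fake_read32, fake_read16, fake_write16, fake_sleep_ms
};
```

The sleep is a callback on purpose: whoever runs this decides how to
wait, which is exactly the part that is awkward inside the hypervisor.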

To resolve the issues, I propose to change the FLR logic to:

1) If the device is a PCIe endpoint and supports PCIe FLR, use that;
2) Else, if the device is a PCIe endpoint and all functions on the
device are assigned to the same guest, use the immediate parent bus's
"Secondary Bus Reset" to reset all functions of the device (so here we
actually require that all functions of the device be assigned to the
same guest);
3) Else, if the device is a PCI endpoint on a host bus (e.g. an
integrated device) and supports PCI "Advanced Capabilities", use that
for FLR;
4) Else, if the device is a vendor-integrated PCI device with a "known"
vendor/device ID, use the vendor-defined method of issuing FLR. For
instance, for VendorID=0x8086, we can use the method defined in the
Intel ICH9 Datasheet to perform FLR;
5) Else, use "Secondary Bus Reset" (we ensure that all the PCI devices
behind a bridge are assigned to the same guest).
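
The five steps above can be read as a simple priority cascade. A rough
sketch of it follows; struct dev_info, its fields and choose_reset() are
hypothetical names for illustration only, and the RESET_NONE fallback is
an assumption (the proposal instead expects the co-assignment constraint
to be enforced when devices are assigned).

```c
#include <stdbool.h>

enum reset_method {
    RESET_PCIE_FLR,         /* step 1: PCIe Function Level Reset */
    RESET_PARENT_SBR,       /* step 2: Secondary Bus Reset on parent bus */
    RESET_PCI_ADV_CAP_FLR,  /* step 3: PCI "Advanced Capabilities" FLR */
    RESET_VENDOR_SPECIFIC,  /* step 4: vendor-defined method */
    RESET_BRIDGE_SBR,       /* step 5: Secondary Bus Reset on the bridge */
    RESET_NONE              /* assumption: nothing safe is available */
};

/* Hypothetical summary of what the caller has found out about a device. */
struct dev_info {
    bool is_pcie;                 /* PCIe endpoint (else conventional PCI) */
    bool has_pcie_flr;            /* advertises PCIe FLR */
    bool all_funcs_same_guest;    /* every function goes to the same guest */
    bool on_host_bus;             /* sits directly on a host bus */
    bool has_adv_cap_flr;         /* supports PCI "Advanced Capabilities" */
    bool known_vendor_method;     /* vendor/device ID is in a known table */
    bool all_behind_bridge_same_guest; /* step 5 co-assignment holds */
};

/* Walk the steps in order and return the first method that applies. */
enum reset_method choose_reset(const struct dev_info *d)
{
    if (d->is_pcie && d->has_pcie_flr)
        return RESET_PCIE_FLR;
    if (d->is_pcie && d->all_funcs_same_guest)
        return RESET_PARENT_SBR;
    if (!d->is_pcie && d->on_host_bus && d->has_adv_cap_flr)
        return RESET_PCI_ADV_CAP_FLR;
    if (d->known_vendor_method)
        return RESET_VENDOR_SPECIFIC;
    if (d->all_behind_bridge_same_guest)
        return RESET_BRIDGE_SBR;
    return RESET_NONE;
}
```

For example, a PCIe endpoint without FLR whose functions all go to one
guest comes out as RESET_PARENT_SBR, and an integrated PCI device that
only matches the known-ID table comes out as RESET_VENDOR_SPECIFIC.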

I also propose moving the FLR logic to the Control Panel.
The benefits are:
1) It's natural, and keeps the hypervisor thin;
2) The 100ms delay can be implemented easily in the Control Panel, but
not easily in the hypervisor;
3) Some logic, like mapping a device's BDF to its parent's BDF, can be
done more easily in the Control Panel.
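
As an illustration of point 3: assuming the Control Panel runs on a
Linux dom0, the resolved sysfs path of a PCI device already encodes its
topology, so the parent's BDF is just the path component before the
device's own. A minimal sketch follows; parent_bdf() and is_bdf() are
hypothetical helpers, the path in the usage note is made up, and a
4-digit PCI domain is assumed.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if the first len characters of s look like "dddd:bb:dd.f". */
static int is_bdf(const char *s, size_t len)
{
    unsigned int dom, bus, dev, fn;
    int used = 0;

    if (sscanf(s, "%4x:%2x:%2x.%1x%n", &dom, &bus, &dev, &fn, &used) != 4)
        return 0;
    return (size_t)used == len;
}

/* path is the resolved (symlink-free) sysfs path of a PCI device.
 * On success, copy the parent's BDF into out and return 0.
 * Return -1 if the component before the device is not a BDF (for
 * example, the device sits directly on a root bus) or out is too small. */
int parent_bdf(const char *path, char *out, size_t outlen)
{
    const char *leaf = strrchr(path, '/');
    const char *p;
    size_t len;

    if (!leaf || leaf == path)
        return -1;

    /* Walk back to the start of the component just before the leaf. */
    for (p = leaf; p > path && p[-1] != '/'; p--)
        ;
    len = (size_t)(leaf - p);

    if (!is_bdf(p, len) || len + 1 > outlen)
        return -1;
    memcpy(out, p, len);
    out[len] = '\0';
    return 0;
}
```

Given "/sys/devices/pci0000:00/0000:00:1c.0/0000:03:00.0", this yields
"0000:00:1c.0", the bridge whose Secondary Bus Reset would be used. For
a device directly under "pci0000:00" it returns -1, since there is no
parent bridge to reset.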

Comments are appreciated.

-- Dexuan
