
[Xen-devel] Notes for upcoming PCI emulation call



I’ll try to summarize the current issues/difficulties in extending PCIe
passthrough support, along with the possible ways to resolve these
problems which have been discussed on the mailing list so far.

Possible options to extend PCI passthrough/emulation capabilities
-----------------------------------------------------------------

There is a growing need to support PCIe-specific features for PCI
passthrough. A lot of devices have PCIe Extended Capabilities above the
100h offset. Even if we don’t want to support these capabilities in Xen
right away, a proprietary driver for a passed through device might want
to use them anyway; the Vendor-Specific Extended Capability is a classic
example, though a device driver may try to read any other Extended
Capability from its device’s config space.

Apart from supporting PCIe Extended Capabilities, another possible (and
big) direction is supporting PCIe-specific features in general, like
native PCIe hotplug, new PM facilities, or forwarding AER events to the
guest OS. This will require some cooperation between passed through and
emulated devices in a PCIe hierarchy, which in turn needs major changes
in the emulated PCI bus architecture. At the moment, all PCIe devices
are passed through in legacy PCI mode in Xen, so there is currently no
support for PCIe-specific features like the extended PCI config space
via ECAM.

Even providing support for PCIe Extended Capabilities alone requires
some changes; we need to
1. Emulate ECAM (MMIO accesses to the MMCONFIG area) to allow
   reading/writing the PCIe extended configuration space
2. Present a PCIe-capable system to the guest OS.
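As a reference point for the ECAM emulation, the MMIO address of a
function's config register inside an MMCONFIG window is fixed by the
PCIe specification (bus in bits 27:20, device in 19:15, function in
14:12, register in 11:0). A minimal sketch, where the helper name and
mmcfg_base are illustrative:

```c
#include <stdint.h>

/*
 * ECAM maps each function's 4 KiB extended config space into MMIO.
 * The field layout below (bus[27:20], device[19:15], function[14:12],
 * register[11:0]) comes from the PCIe specification; mmcfg_base stands
 * in for the guest's MMCONFIG window base address.
 */
static uint64_t ecam_addr(uint64_t mmcfg_base, uint8_t bus,
                          uint8_t dev, uint8_t fn, uint16_t reg)
{
    return mmcfg_base |
           ((uint64_t)bus << 20) |
           ((uint64_t)(dev & 0x1f) << 15) |
           ((uint64_t)(fn & 0x7) << 12) |
           (reg & 0xfff);
}
```

For example, register 100h of device 00:03.0 lands at mmcfg_base +
18100h, which is exactly why a 4 KiB-granular trap on the window is
enough to recover the target BDF.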

This can be achieved by adding QEMU Q35 emulation support to Xen (an RFC
patch series for this feature was sent). For ECAM, in the simplest
case, QEMU’s existing MMCONFIG emulation can be reused. However, there
are at least two incompatibility problems which need a solution. These
are:

- The multiple PCI device emulators feature, used by VGPU in XenServer

- Emulating a (simplest possible) upstream PCIe hierarchy for passed
through PCIe devices. The issue was described in detail here:
http://lists.gnu.org/archive/html/qemu-devel/2018-03/msg03593.html

The latter problem must be resolved properly by introducing emulated
PCIe Root Ports for passed through devices. Basically, this means we
need to emulate PCI-PCI bridges whose secondary bus hosts the real
passed through devices, ideally using function grouping for related
devices like a GPU and its HDAudio function.

There are different approaches to _who_ should emulate these PCI-PCI
bridges. QEMU has support for emulated RPs and PCIe switches, but we
might want to remove that privilege from QEMU, as emulating RPs/switches
above _real_ passed through PCIe devices is more of a system-level
concern. We also need to consider future PCIe passthrough extensions,
like handling PM events from passed through PCIe devices, as such
features assume some additional support in the upstream PCIe hierarchy.

So, we need to decide who will control the emulated Root Ports for
passed through devices: either Xen or QEMU. For a number of reasons it
would be beneficial to do it on the Xen side; on the other hand,
sticking with QEMU allows reusing existing functionality.

Now, regarding the multiple PCI device emulators. With multiple PCI
device emulators, a specific passed through device may be assigned to a
separate (non-QEMU) device model. At the low level this will appear as
more than one IOREQ server being present: most PCI devices will still
be handled by QEMU, with some assigned to another (device-specific)
device model, a distinct binary, via the same
xc_hvm_map_pcidev_to_ioreq_server() call. Later,
hvm_select_ioreq_server() will select the proper device model
destination based on the BDF location of the device, and ioreqs will be
sent to the chosen target.
This works well for legacy CF8h/CFCh PCI config accesses, but MMCONFIG
support introduces some problems.
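The BDF-based dispatch can be pictured with a small sketch; the names
and data layout below are purely illustrative, not the actual
hvm_select_ioreq_server() implementation:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical sketch of BDF-based IOREQ routing. Devices claimed via
 * a map_pcidev_to_ioreq_server-style call get an entry; everything
 * else falls through to the default server (QEMU). */

#define DEFAULT_SERVER 0u   /* QEMU handles unclaimed devices */

struct pcidev_map {
    uint16_t bdf;        /* bus << 8 | dev << 3 | fn */
    unsigned server_id;  /* IOREQ server that claimed this device */
};

static unsigned select_server(const struct pcidev_map *map, size_t n,
                              uint16_t bdf)
{
    for (size_t i = 0; i < n; i++)
        if (map[i].bdf == bdf)
            return map[i].server_id;
    return DEFAULT_SERVER;
}
```

The key point is that the lookup key is only the BDF, which the
hypervisor can always extract from a CF8h/CFCh access pair; MMCONFIG
accesses need extra knowledge before the same key can be derived.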

First of all, MMCONFIG itself is a chipset-specific thing. Both the
registers which control it and the number of MMCONFIG ranges
(ECAM-capable PCIe segments) may differ between emulated machines. This
means that some designated device model should control it according to
the user-selected emulated machine; a device-specific device model
doesn’t know anything about the emulated machine.

Secondly, in order to have all the necessary information to forward
ioreqs to the correct device model, Xen needs to know
1. The MMCONFIG base address and size (ideally extendable to support
   multiple MMCONFIGs)
2. The MMCONFIG layout, corresponding to the current map of the PCI
   bus. This layout may change at any time due to a PCI-PCI bridge
   re-initialization or a device being hotplugged.

There are different options for passing this information to Xen; in
some solutions Xen may even control it itself.

The MMCONFIG layout can be obtained passively, by simply observing
map_pcidev_to_ioreq_server calls to determine and store all emulated
PCI device BDF locations.
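For illustration, once Xen knows the MMCONFIG base, a trapped MMIO
access can be translated back into a (BDF, register) pair and then
routed exactly like a legacy config access. This is a hypothetical
sketch, not actual Xen code:

```c
#include <stdint.h>

/* Sketch: translate a trapped MMIO access inside the MMCONFIG window
 * back into a (bdf, reg) pair so it can be dispatched like a legacy
 * CF8h/CFCh access. The field layout follows ECAM; names are
 * illustrative. */

struct cfg_access {
    uint16_t bdf;  /* bus << 8 | dev << 3 | fn */
    uint16_t reg;  /* 0 .. 0xfff */
};

static struct cfg_access decode_mmcfg(uint64_t addr, uint64_t mmcfg_base)
{
    uint64_t off = addr - mmcfg_base;
    struct cfg_access a = {
        .bdf = (uint16_t)(off >> 12),   /* bus(8) + dev(5) + fn(3) bits */
        .reg = (uint16_t)(off & 0xfff), /* register within the function */
    };
    return a;
}
```

This is why both pieces of information listed above are needed: without
the base address the offset can't be computed, and without the current
bus layout the recovered BDF can't be matched to an IOREQ server.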

Another thing to consider here is the MMIO hole layout and its impact.
For example, adding PCI-PCI bridges creates some complication, as they
provide windows in IO/MMIO space which must be sized according to the
secondary PCI bus content. In some cases, like hotplugging a PCIe
device (which should belong to some RP or switch DP), existing bridge
windows might be too small to provide space for the newly added device,
triggering PCI-PCI bridge and BAR re-initialization (aka PCI resource
rebalancing in Windows terms) in the guest. This action may change the
PCI bus layout, which needs to be addressed somehow. Also, by utilizing
the ACPI _DSM method (luckily not our case, as we don’t provide it),
Windows may invoke a complete PCI BAR/PCI-PCI bridge re-initialization
unconditionally on system boot.


Possible directions to make multiple PCI device emulators compatible
with PCIe/MMCONFIG
--------------------------------------------------------------------

I. “Notification” approach. In this case QEMU will continue to emulate
PCIEXBAR and handle MMCONFIG accesses, but upon encountering any change
in the PCIEXBAR value, QEMU will report it to Xen via any suitable
channel: either a dedicated dmop, a XenStore param or anything else.
Xen will store this information and use it to select the proper IOREQ
server destination for trapped MMCONFIG accesses.
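For instance, assuming a Q35-style PCIEXBAR (enable in bit 0, window
length in bits 2:1, with 0 = 256 MiB, 1 = 128 MiB, 2 = 64 MiB), the
reported value could be decoded roughly like this; the structure and
function names are illustrative:

```c
#include <stdint.h>
#include <stdbool.h>

/* Sketch of decoding a Q35-style PCIEXBAR value reported by QEMU.
 * Bit 0 is the enable bit; bits 2:1 encode the window length
 * (0 = 256 MiB, 1 = 128 MiB, 2 = 64 MiB). The base address is
 * naturally aligned to the window size, so masking off the low bits
 * (which also covers the control bits) recovers it. */

struct mmcfg_window {
    bool     enabled;
    uint64_t base;
    uint64_t size;
};

static struct mmcfg_window decode_pciexbar(uint64_t val)
{
    struct mmcfg_window w;
    unsigned length_code = (val >> 1) & 3;

    w.enabled = val & 1;
    w.size    = (uint64_t)0x10000000 >> length_code; /* 256M/128M/64M */
    w.base    = val & ~(w.size - 1);
    return w;
}
```

Xen would update its trap region whenever such a decoded window changes,
which is exactly the state the notification channel has to carry.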

II. “Own chipset device model”. In this case Xen will emulate some
chipset-specific devices itself. Of particular interest are the MCH and
ICH9. Both the emulated Root Complex and Root Ports will belong to Xen,
allowing PCIe-specific features like AER reporting to be implemented in
any convenient way. Ideally, only a set of distinct PCIDevices will
remain on the QEMU side: storage, networking, etc. A dummy pci-host
will forward IOREQ_TYPE_PCI_CONFIG accesses for the remaining
PCIDevices. The PCI bus layout seen by QEMU can then differ from the
real layout seen by the guest. The final result will look like a new,
very reduced QEMU machine with a dummy PCIBus/ISABus, perhaps even based
on top of the QEMU null machine.

While this approach is beneficial in many ways, it will affect
compatibility with QEMU very, very badly. For example, the NVDIMM
support patches from Intel rely on QEMU ACPI facilities which can become
completely inoperable once the emulated NB+SB and their corresponding
subtypes and properties are removed. Multiple similar issues and
breakages may arise in the future, though the QEMU PM/ACPI facilities
are the main problem. Note that Xen already emulates some of the PMBASE
registers, and the PMBASE value itself is hardcoded (at B000h, IIRC).
Emulating the PMBASE BAR ourselves would allow removing this limitation.

III. “Transparent emulation”. In this case Xen will intercept only some
known registers of the chipset-specific devices emulated by QEMU:
PCIEXBAR, PMBASE, possibly MMIO-hole-controlling registers and some
others. A handler for this kind of register can be selectively called
before or after the corresponding DM emulation (at different stages of
IOREQ processing) and should be free to specify whether the DM may see
the read/write (otherwise it is handled internally). This will allow us
to provide our own PCIEXBAR/MMCONFIG emulation while keeping
compatibility with QEMU. Zero changes will be needed on the QEMU side.
Xen will detect the emulated chipset either passively or by sending
IOREQ_TYPE_PCI_CONFIG to read the VID/DID from the device model
directly. The NB/SB VID/DID values will be used to distinguish between
different emulated machines and to set up the correct handlers for
chipset-specific registers.
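A sketch of such VID/DID-based detection, using the well-known IDs of
the host bridges QEMU emulates at 00:00.0 (the surrounding plumbing,
enum and function names are hypothetical):

```c
#include <stdint.h>

/* Sketch: identify the emulated machine from the host bridge's
 * VID/DID at 00:00.0. QEMU's i440FX machine presents the Intel
 * 82441FX host bridge (8086:1237) and its Q35 machine presents the
 * Q35 MCH (8086:29c0); how the IDs are fetched (e.g. via an
 * IOREQ_TYPE_PCI_CONFIG read) is left out of this sketch. */

enum machine { MACHINE_UNKNOWN, MACHINE_I440FX, MACHINE_Q35 };

static enum machine detect_machine(uint16_t vid, uint16_t did)
{
    if (vid != 0x8086)          /* Intel vendor ID */
        return MACHINE_UNKNOWN;

    switch (did) {
    case 0x1237:                /* 82441FX host bridge (i440FX) */
        return MACHINE_I440FX;
    case 0x29c0:                /* Q35 MCH host bridge */
        return MACHINE_Q35;
    default:
        return MACHINE_UNKNOWN;
    }
}
```

The detected machine would then select which chipset-specific register
handlers (PCIEXBAR, PMBASE, etc.) Xen installs.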

Due to the requirement for a PCIe device to cooperate with the upstream
PCIe hierarchy (at least to belong to some RP/switch), some changes for
multiple PCI emulator support must be made regardless of the chosen
solution.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
