
Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring



On Wed, Mar 21, 2018 at 10:58:40AM +1000, Alexey G wrote:
> On Tue, 20 Mar 2018 08:50:48 +0000
> Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:
> 
> >On Tue, Mar 20, 2018 at 05:49:22AM +1000, Alexey G wrote:
> >> On Mon, 19 Mar 2018 15:58:02 +0000
> >> Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:
> >>   
> >> >On Tue, Mar 13, 2018 at 04:33:52AM +1000, Alexey Gerasimenko
> >> >wrote:  
> >> >> Much like normal PCI BARs or other chipset-specific memory-mapped
> >> >> resources, the MMCONFIG area needs space in the MMIO hole, so we
> >> >> must allocate it manually.
> >> >> 
> >> >> The actual MMCONFIG size depends on the number of PCI buses
> >> >> which should be covered by ECAM. Possible options are 64MB,
> >> >> 128MB and 256MB. As we are currently limited to bus 0, the
> >> >> lowest possible setting (64MB) is used, #defined via
> >> >> PCI_MAX_MCFG_BUSES in hvmloader/config.h. When support for
> >> >> multiple PCI buses is implemented in Xen, PCI_MAX_MCFG_BUSES
> >> >> may be changed to calculate the number of buses from the
> >> >> results of PCI device enumeration.
> >> >> 
> >> >> The way the MMCONFIG range is allocated in the MMIO hole is
> >> >> similar to how other PCI BARs are allocated. The patch extends
> >> >> the 'bars' structure to make it universal for any arbitrary BAR
> >> >> type -- either IO, MMIO, ROM or a chipset-specific resource.
> >> >
> >> >I'm not sure this is fully correct. The IOREQ interface can
> >> >differentiate PCI devices and forward config space accesses to
> >> >different emulators (see IOREQ_TYPE_PCI_CONFIG). With this change
> >> >you will forward all MCFG accesses to QEMU, which will likely be
> >> >wrong if there are multiple PCI-device emulators for the same
> >> >domain.
> >> >
> >> >Ie: AFAICT Xen needs to know about the MCFG emulation and detect
> >> >accesses to it in order to forward them to the right emulators.
> >> >
> >> >Adding Paul who knows more about all this.  
> >> 
> >> In which use cases are multiple PCI-device emulators used for a
> >> single HVM domain? Is it a proprietary setup?
> >
> >Likely. I think XenGT might be using it. It's a feature of the IOREQ
> >implementation in Xen.
> 
> According to public slides for the feature, both PCI conf and MMIO
> accesses can be routed to the designated device model. It looks like
> for this particular setup it doesn't really matter which ioreq type
> is used for MMCONFIG accesses -- either IOREQ_TYPE_PCI_CONFIG or
> IOREQ_TYPE_COPY (MMIO accesses) should be acceptable.

Isn't that going to be quite messy? How is the IOREQ server supposed
to decode an MCFG access received as IOREQ_TYPE_COPY?

I don't think the IOREQ server should need to know the start of the
MCFG region, and without that knowledge it won't be able to detect
and decode an access forwarded as IOREQ_TYPE_COPY.

MCFG accesses need to be sent to the IOREQ server as
IOREQ_TYPE_PCI_CONFIG, or else you are forcing each IOREQ server to
know the position of the MCFG area in order to do the decoding. In
your case this would work because QEMU controls the position of the
MCFG region, but there's no need for other IOREQ servers to know the
position of the MCFG area.
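
To illustrate the decoding burden: ECAM gives every PCI function a 4KB
window (hence 1MB per bus), and a raw IOREQ_TYPE_COPY handler can only
recover the SBDF if it already knows the MCFG base. A minimal sketch of
what each IOREQ server would have to duplicate (mcfg_base and the
routing step are assumptions for illustration):

    /* Decode an ECAM/MMCONFIG address, assuming mcfg_base is somehow
     * known to the IOREQ server -- which is exactly the problem. */
    static void decode_mcfg_access(uint64_t addr, uint64_t mcfg_base)
    {
        uint64_t off = addr - mcfg_base;
        uint8_t  bus = (off >> 20) & 0xff; /* 1MB per bus */
        uint8_t  dev = (off >> 15) & 0x1f; /* 32 devices per bus */
        uint8_t  fn  = (off >> 12) & 0x07; /* 8 functions per device */
        uint16_t reg = off & 0xfff;        /* extended config offset */

        /* ... route to the matching emulated device ... */
    }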

> The only thing which matters is ioreq routing itself --
> making decisions to which device model the PCI conf/MMIO ioreq should
> be sent.

Hm, see above, but I'm fairly sure you need to forward those MCFG
accesses as IOREQ_TYPE_PCI_CONFIG to the IOREQ server.

> >Traditional PCI config space accesses are not IO port space accesses.
> 
> (assuming 'not' mistyped here)

Not really, this should instead be:

"Traditional PCI config space accesses are not forwarded to the IOREQ
server as IO port space accesses (IOREQ_TYPE_PIO) but rather as PCI
config space accesses (IOREQ_TYPE_PCI_CONFIG)."
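
For reference, the request types currently defined in
xen/include/public/hvm/ioreq.h look roughly like this (quoting from
memory, so check the header):

    #define IOREQ_TYPE_PIO          0 /* pio */
    #define IOREQ_TYPE_COPY         1 /* mmio ops */
    #define IOREQ_TYPE_PCI_CONFIG   2
    #define IOREQ_TYPE_TIMEOFFSET   7
    #define IOREQ_TYPE_INVALIDATE   8 /* mapcache */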

Sorry for the confusion.

> >The IOREQ code in Xen detects accesses to ports 0xcf8/0xcfc and IOREQ
> >servers can register devices they would like to receive configuration
> >space accesses for. QEMU is already making use of this, see for
> 
> That's one of the reasons why the current IOREQ_TYPE_PCI_CONFIG
> implementation is a bit inconvenient for MMCONFIG MMIO accesses -- it's
> too CF8h/CFCh-centric, and it might be painful to change code which
> was intended for CF8h/CFCh handling (and not for MMIO processing).

I'm not sure I follow. Do you mean that changes should be made to the
ioreq struct in order to forward MCFG accesses using
IOREQ_TYPE_PCI_CONFIG as its type?

> >example xen_map_pcidev in the QEMU code.
> >
> >By treating MCFG accesses as MMIO you are bypassing the IOREQ PCI
> >layer, and thus an IOREQ server could register a PCI device and only
> >receive PCI configuration accesses from the IO port space, while MCFG
> >accesses would be forwarded somewhere else.
> 
> It will be handled by IOREQ too, just using a different IOREQ type
> (the MMIO one). The basic question is why we have to stick to PCI conf
> space ioreqs for emulating MMIO accesses to MMCONFIG.

Because other IOREQ servers don't need to know about the position/size
of the MCFG area, and cannot register MMIO ranges that cover their
device's PCI configuration space in the MCFG region.

Not to mention that it would be a terrible design flaw to force
IOREQ servers to register PCI devices and MCFG areas belonging to
those devices separately as MMIO in order to trap all possible PCI
configuration space accesses.

> >I think you need to make the IOREQ code aware of the MCFG area and
> >XEN_DMOP_IO_RANGE_PCI needs to forward both IO space and MCFG accesses
> >to the right IOREQ server.
> 
> Right now there is no way to inform Xen where the emulated MMCONFIG
> area is located, so it cannot make this decision based on the address
> within the MMCONFIG range. A new dmop/hypercall is needed (with args
> similar to pci_mmcfg_reserved) along with its usage in QEMU.
> 
> I'll try to summarize two different approaches to MMCONFIG
> handling. For both approaches the final PCI config host interface for a
> passed-through device in QEMU will remain the same as at the moment --
> the xen_host_pci_* functions in /hw/xen.
> 
> 
> Approach #1. Informing Xen about MMCONFIG area changes and letting Xen
> translate MMIO accesses into _PCI_CONFIG ioreqs:
> 
> 1. QEMU will trap accesses to PCIEXBAR, calling Xen via dmop/hypercall
> to let the latter know of any MMCONFIG area address/size/status changes
> 
> 2. Xen will trap MMIO accesses to the current MMCONFIG location and
> convert memory accesses into one or several _PCI_CONFIG ioreqs and send
> them to a chosen device model
> 
> 3. QEMU will receive _PCI_CONFIG ioreqs carrying the SBDF and a
> 12-bit offset into the config space, which it needs to somehow pass
> to pci_host_config_{read,write}_common() for emulation. It might
> require a few hacks to make the gears turn (due to QEMU's PCI conf
> read/write model).
> At the moment the emulated CF8h/CFCh ports play a special role
> in all this -- xen-hvm.c writes an AMD-style value to the
> emulated CF8h port "so that the config space access will target the
> correct device model" (quote). Not sure about this and why it is
> needed if Xen actually makes the decision about which DM the PCI conf
> ioreq should be sent to.
> 
> One minor note: these new 'set_mmconfig_' dmops/hypercalls have to be
> triggered inside the chipset-specific emulation code in QEMU (PCIEXBAR
> handling in the Q35 case). If another machine appears which needs to
> emulate MMCONFIG control differently, we will have no choice but to
> insert these dmops/hypercalls into its chipset-specific emulation
> code as well, e.g. inside the HECBASE emulation code.

Maybe you could detect offsets >= 256 and replay them in QEMU like
MMIO accesses, using the address_space_write or
pcie_mmcfg_data_read/write functions?
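
Something along these lines in the Xen glue code, perhaps (a rough
sketch; the helper name, mcfg_base and the way xen-hvm.c would learn
it are all made up for illustration):

    /* Replay an extended config space write received as a _PCI_CONFIG
     * ioreq through the guest address space, so that the chipset's own
     * MCFG MemoryRegion performs the emulation. */
    static void xen_replay_mcfg_write(uint16_t sbdf, uint32_t reg,
                                      uint64_t val, unsigned size)
    {
        hwaddr addr = mcfg_base + ((hwaddr)sbdf << 12) + reg;

        address_space_write(&address_space_memory, addr,
                            MEMTXATTRS_UNSPECIFIED,
                            (uint8_t *)&val, size);
    }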

I have to admit my knowledge of QEMU is quite limited, so I'm not sure
of the best way to handle this.

Ideally we should find a way that doesn't involve having to modify
each chipset to handle MCFG accesses from Xen. It would be nice to
have some kind of interface inside of QEMU so all chipsets can
register MCFG areas or modify them, but this is out of the scope of
this work.

Regardless of how this ends up being implemented inside of QEMU I
think the above approach is the right one from an architectural PoV.

AFAICT there are still some reserved bits in the ioreq struct that you
could use to signal 'this is an MCFG PCI access' if required.
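
To be clearer about what I have in mind, the new interface could look
roughly like this (an illustrative sketch modelled on
pci_mmcfg_reserved; none of these names exist today):

    /* Hypothetical dmop payload letting the device model tell Xen
     * where the emulated MMCONFIG area currently lives. */
    struct xen_dm_op_set_mmcfg_area {
        uint64_t gpa;       /* base of the MCFG window, 0 to disable */
        uint16_t segment;   /* PCI segment covered by the window */
        uint8_t  start_bus; /* first bus number decoded */
        uint8_t  end_bus;   /* last bus number decoded */
    };

With something like this in place Xen could translate trapped MCFG
accesses into IOREQ_TYPE_PCI_CONFIG and route them by SBDF, exactly as
it does for CF8h/CFCh accesses today.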

> Approach #2. Handling the MMCONFIG area inside QEMU using the usual
> MMIO emulation:
> 
> 1. QEMU will trap accesses to PCIEXBAR (or whatever else may be
> supported in the future, like HECBASE), eventually asking Xen to map
> the MMCONFIG MMIO range for ioreq servicing just like it does for any
> other emulated MMIO range, via map_io_range_to_ioreq_server(). Any
> change in MMCONFIG placement/status will lead to remapping/unmapping
> the MMIO range.
> 
> 2. Xen will trap MMIO accesses to this area and forward them to QEMU as
> MMIO (IOREQ_TYPE_COPY) ioreqs
> 
> 3. QEMU will receive these accesses and pass them to the existing
> MMCONFIG emulation -- the pcie_mmcfg_data_read/write handlers, finally
> resulting in the same xen_host_pci_* function calls as before.
> 
> This approach works "right out of the box", no changes needed for either
> Xen or QEMU. As both _PCI_CONFIG and MMIO type ioreqs are processed,
> either method can be used to access PCI/extended config space --
> CF8/CFC port I/O or MMIO accesses to MMCONFIG.
> 
> IOREQ routing for multiple device emulators can be supported too. In
> fact, the same mmconfig dmops/hypercalls can be added to let Xen know
> where the MMCONFIG area resides; Xen can then use this information to
> forward MMCONFIG MMIO ioreqs according to the BDF of the accessed
> address. The difference from approach #1 is that these interfaces are
> completely optional when we use MMIO ioreqs for MMCONFIG on vanilla
> Xen/QEMU.

As said above, if you forward MCFG accesses as IOREQ_TYPE_COPY you are
forcing each IOREQ server to know the position of the MCFG area in
order to do the decoding; this is not acceptable IMO.
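
Just to spell out the duplication: under approach #2 every IOREQ
server interested in extended config space would first have to claim
the window as plain MMIO, something like the sketch below (using the
stable libxendevicemodel call; mcfg_base/mcfg_size would have to come
from some out-of-band channel, which is the part I object to):

    /* Each IOREQ server re-registers the whole MCFG window as MMIO
     * and then has to decode ECAM addresses by itself. */
    rc = xendevicemodel_map_io_range_to_ioreq_server(
             dmod, domid, ioservid, 1 /* MMIO */,
             mcfg_base, mcfg_base + mcfg_size - 1);

And even that wouldn't work in practice, since the MMIO ranges of the
different IOREQ servers would all overlap.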

> The question is why the IOREQ_TYPE_COPY -> IOREQ_TYPE_PCI_CONFIG
> translation is a must-have thing at all? It won't make handling simpler.
> For the current QEMU implementation IOREQ_TYPE_COPY (MMIO accesses for
> MMCONFIG) would be preferable as it allows reusing the existing code.

Granted it's likely easier to implement, but it's also incorrect. You
seem to have in mind the picture of a single IOREQ server (QEMU)
handling all the devices.

Although this is the most common scenario, it's not the only one
supported by Xen. Your proposed solution breaks the usage of multiple
IOREQ servers as PCI device emulators.

> I think it will be safe to use MMCONFIG emulation at the MMIO level for
> now and later extend it with a 'set_mmconfig_' dmop/hypercall so that
> the 'multiple device emulators' IOREQ_TYPE_COPY routing works the same
> as for PCI conf, so it can be used by XenGT etc. on Q35 as well.

I'm afraid this kind of issue would have been far easier to
identify if a design document for this feature had been sent to the
list prior to its implementation.

Regarding whether to accept something like this: I'm not really in
favor, but IMO it depends on how much new code is added to handle this
incorrect usage that would then go away (or would have to be changed)
in order to handle the proper implementation.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

