
Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring



On Tue, Mar 20, 2018 at 05:49:22AM +1000, Alexey G wrote:
> On Mon, 19 Mar 2018 15:58:02 +0000
> Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:
> 
> >On Tue, Mar 13, 2018 at 04:33:52AM +1000, Alexey Gerasimenko wrote:
> >> Much like normal PCI BARs or other chipset-specific memory-mapped
> >> resources, the MMCONFIG area needs space in the MMIO hole, so we must
> >> allocate it manually.
> >> 
> >> The actual MMCONFIG size depends on the number of PCI buses that
> >> should be covered by ECAM. Possible options are 64MB, 128MB and
> >> 256MB. As we are currently limited to bus 0, the lowest possible
> >> setting (64MB) is used, #defined via PCI_MAX_MCFG_BUSES in
> >> hvmloader/config.h. When multiple PCI buses support for Xen will be
> >> implemented, PCI_MAX_MCFG_BUSES may be changed to calculation of the
> >> number of buses according to results of the PCI devices enumeration.
> >> 
> >> The way the MMCONFIG range is allocated in the MMIO hole is similar to
> >> how other PCI BARs are allocated. The patch extends the 'bars' structure
> >> to make it universal for any arbitrary BAR type -- either IO, MMIO, ROM
> >> or a chipset-specific resource.  
> >
> >I'm not sure this is fully correct. The IOREQ interface can
> >differentiate PCI devices and forward config space accesses to
> >different emulators (see IOREQ_TYPE_PCI_CONFIG). With this change you
> >will forward all MCFG accesses to QEMU, which will likely be wrong if
> >there are multiple PCI-device emulators for the same domain.
> >
> >Ie: AFAICT Xen needs to know about the MCFG emulation and detect
> >accesses to it in order to forward them to the right emulators.
> >
> >Adding Paul who knows more about all this.
> 
> In which use cases are multiple PCI-device emulators used for a single
> HVM domain? Is it a proprietary setup?

Likely. I think XenGT might be using it. It's a feature of the IOREQ
implementation in Xen.

Traditional PCI config space accesses are not forwarded to IOREQ servers
as plain IO port accesses. The IOREQ code in Xen detects accesses to
ports 0xcf8/0xcfc, and IOREQ servers can register the devices they would
like to receive configuration space accesses for. QEMU already makes use
of this, see for example xen_map_pcidev in the QEMU code.
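
To give a rough idea (this is from memory, so please double-check the
exact interface in libxendevicemodel), the per-device registration that
QEMU ends up doing boils down to something like:

    #include <xendevicemodel.h>

    /*
     * Sketch of what xen_map_pcidev() amounts to: the IOREQ server claims
     * config space accesses for a single BDF, and Xen then routes
     * 0xcf8/0xcfc accesses that target this BDF to that server as
     * IOREQ_TYPE_PCI_CONFIG requests.  Error handling omitted.
     */
    static void claim_pci_config_space(xendevicemodel_handle *dmod,
                                       domid_t domid, ioservid_t ioservid,
                                       uint8_t bus, uint8_t dev, uint8_t fn)
    {
        /* Segment is always 0 for current HVM guests. */
        xendevicemodel_map_pcidev_to_ioreq_server(dmod, domid, ioservid,
                                                  0, bus, dev, fn);
    }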

By treating MCFG accesses as MMIO you are bypassing the IOREQ PCI layer,
and thus an IOREQ server could register a PCI device and receive only the
PCI configuration accesses made through the IO port space, while MCFG
accesses to the same device would be forwarded somewhere else.

I think you need to make the IOREQ code aware of the MCFG area, so that
the XEN_DMOP_IO_RANGE_PCI registrations cover both IO port and MCFG
accesses and both get forwarded to the right IOREQ server.
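
Just to illustrate what I mean (this is not actual Xen code, only a
sketch): an access that falls inside the emulated MCFG window already
encodes the BDF in its offset, so the IOREQ code could decode it and
reuse the existing XEN_DMOP_IO_RANGE_PCI matching instead of treating it
as plain MMIO:

    /*
     * Illustrative only: decode an MCFG (ECAM) access into BDF + register
     * so it can be matched against XEN_DMOP_IO_RANGE_PCI registrations.
     */
    static bool mcfg_decode(uint64_t addr, uint64_t mcfg_base,
                            uint64_t mcfg_size, uint8_t *bus, uint8_t *dev,
                            uint8_t *fn, uint16_t *reg)
    {
        uint64_t off;

        if ( addr < mcfg_base || addr - mcfg_base >= mcfg_size )
            return false;              /* not an MCFG access */

        off  = addr - mcfg_base;
        *bus = (off >> 20) & 0xff;     /* ECAM: 1MB per bus */
        *dev = (off >> 15) & 0x1f;     /* 32KB per device */
        *fn  = (off >> 12) & 0x7;      /* 4KB per function */
        *reg = off & 0xfff;            /* register offset within the function */
        return true;
    }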

> I assume it is somehow related to this code in xen-hvm.c:
>                 /* Fake a write to port 0xCF8 so that
>                  * the config space access will target the
>                  * correct device model.
>                  */
>                 val = (1u << 31) | ((req->addr & 0x0f00) <...>
>                 do_outp(0xcf8, 4, val);
> if yes, a similar thing can be done for IOREQ_TYPE_COPY accesses to
> the emulated MMCONFIG if needed.

I have to admit I don't know that much about QEMU, and I have no idea
what the chunk above is supposed to accomplish.

> 
> In the HVM+QEMU case we are not limited to merely passed-through devices;
> most of the observable PCI config space devices belong to one particular
> QEMU instance. This dictates the overall emulated MMCONFIG layout
> for a domain, which should be in sync with what QEMU emulates via CF8h/CFCh
> accesses... and between multiple device model instances (if there are
> any; still not sure what the multiple PCI-device emulators you mentioned
> really are).

In newer versions of Xen (>4.5 IIRC, Paul knows more), QEMU doesn't
directly trap accesses to the 0xcf8/0xcfc IO ports; instead it is Xen
that detects and decodes such accesses, and then forwards them to the
IOREQ server that has been registered to handle them.
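
For reference, what Xen decodes from the 0xcf8 address is the standard
CONF1 layout, something along these lines (illustrative only, the real
code lives in the hvm I/O handling):

    /* Illustrative CONF1 (0xcf8) address decode; bit 31 is the enable bit,
     * checked by the caller. */
    static void cf8_decode(uint32_t cf8, uint8_t *bus, uint8_t *dev,
                           uint8_t *fn, uint16_t *reg)
    {
        *bus = (cf8 >> 16) & 0xff;
        *dev = (cf8 >> 11) & 0x1f;
        *fn  = (cf8 >> 8)  & 0x7;
        *reg = cf8 & 0xfc;             /* dword-aligned register offset */
    }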

You cannot simply forward all MCFG accesses to QEMU as MMIO accesses,
Xen needs to decode them and they need to be handled as
IOREQ_TYPE_PCI_CONFIG requests, not IOREQ_TYPE_COPY IMO.

> 
> Basically, we have an emulated MMCONFIG area of 64/128/256MB size in
> the MMIO hole of the guest HVM domain. (BTW, this area itself can be
> considered a feature of the chipset the device model emulates.)
> It can be relocated to some other place in the MMIO hole; this means that
> QEMU will trap accesses to the chipset-specific PCIEXBAR register
> and will issue the same MMIO unmap/map calls as for
> any normal emulated MMIO range.
> 
> On the other hand, it won't be easy to provide emulated MMCONFIG
> translation into IOREQ_TYPE_PCI_CONFIG from the Xen side. Xen would need
> to know the current emulated MMCONFIG area position and size in order to
> translate (or not) accesses to it into the corresponding BDF/reg pair
> (plus whether that area is enabled for decoding or not). This will likely
> require introducing new hypercall(s).

Yes, you will have to introduce new hypercalls to tell Xen the
position/size of the MCFG area. Likely you want to tell it the start
address, the PCI segment, the start bus and the end bus. I know the
segment and start bus are always going to be 0 ATM, but it would be nice
to have a complete interface.
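
Something along these lines maybe (names and layout completely made up,
just to illustrate the kind of interface I have in mind); you would also
need an unmap variant (or a size of 0) so the area can be relocated when
the guest reprograms PCIEXBAR:

    /* Hypothetical DMOP payload to tell Xen where the emulated MCFG lives. */
    struct xen_dm_op_map_mcfg {
        uint64_t gpa;          /* guest physical start of the MCFG window */
        uint16_t segment;      /* always 0 ATM */
        uint8_t  start_bus;    /* always 0 ATM */
        uint8_t  end_bus;      /* window covers (end_bus - start_bus + 1) MB */
    };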

By your comment above I think you want an interface that allows you to
remove/add those MCFG areas at runtime.

> The question is whether there will be any difference or benefit at all.

IMO it's not about benefits or differences, it's about correctness.
Xen currently detects accesses to the PCI configuration space from IO
ports and for consistency it should also detect accesses to this space
by any other means.

> It's basically the same emulated MMIO range after all, but in one case
> we trap accesses to it in Xen and translate them into
> IOREQ_TYPE_PCI_CONFIG requests.
> We have to provide some infrastructure to let Xen know where the device
> model/guest expects to use the MMCONFIG area (and its size). The
> device model will need to use this infrastructure, informing Xen of
> any changes. Also, due to the nature of MMCONFIG there might be some
> pitfalls, like the necessity to send multiple IOREQ_TYPE_PCI_CONFIG
> ioreqs for a single memory read/write operation.

This seems all fine. Why do you expect MCFG access to create multiple
IOREQ_TYPE_PCI_CONFIG but not multiple IOREQ_TYPE_COPY?

> In the other case, we still have an emulated MMIO range, but Xen will send
> plain IOREQ_TYPE_COPY requests to QEMU, which handles them itself.
> In that case, all the code to handle MMCONFIG accesses is available for
> reuse right away (the mmcfg -> pci_* translation in QEMU); no new
> functionality is required in either Xen or QEMU.

As I tried to argue above, I think this is not correct, but I would
also like Paul to express his opinion as the IOREQ maintainer.

> >>  tools/firmware/hvmloader/config.h   |   4 ++
> >>  tools/firmware/hvmloader/pci.c      | 127 ++++++++++++++++++++++++++++--------
> >>  tools/firmware/hvmloader/pci_regs.h |   2 +
> >>  3 files changed, 106 insertions(+), 27 deletions(-)
> >> 
> >> diff --git a/tools/firmware/hvmloader/config.h b/tools/firmware/hvmloader/config.h
> >> index 6fde6b7b60..5443ecd804 100644
> >> --- a/tools/firmware/hvmloader/config.h
> >> +++ b/tools/firmware/hvmloader/config.h
> >> @@ -53,10 +53,14 @@ extern uint8_t ioapic_version;
> >>  #define PCI_ISA_DEVFN       0x08    /* dev 1, fn 0 */
> >>  #define PCI_ISA_IRQ_MASK    0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
> >>  #define PCI_ICH9_LPC_DEVFN  0xf8    /* dev 31, fn 0 */
> >> +#define PCI_MCH_DEVFN       0       /* bus 0, dev 0, func 0 */
> >>  
> >>  /* MMIO hole: Hardcoded defaults, which can be dynamically expanded. */
> >>  #define PCI_MEM_END         0xfc000000
> >>  
> >> +/* possible values are: 64, 128, 256 */
> >> +#define PCI_MAX_MCFG_BUSES  64  
> >
> >What's the reasoning for this value? Do we know which devices need ECAM
> >areas?
> 
> Yes, Xen is limited to bus 0 emulation currently, the description
> states "When multiple PCI buses support for Xen will be implemented,
> PCI_MAX_MCFG_BUSES may be changed to calculation of the number of buses
> according to results of the PCI devices enumeration".
> 
> I think it might be better to replace 'switch (PCI_MAX_MCFG_BUSES)'
> with the real code right away, i.e. change it to
> 
> 'switch (max_bus_num, aligned up to 64/128/256 boundary)',
> where max_bus_num should be set in PCI device enumeration code in
> pci_setup(). As we are limited to bus 0 currently, we'll just set it
> to 0 for now, before/after the PCI device enumeration loop (which should
> become multi-bus capable eventually).

I guess this is all pretty much hardcoded to bus 0 in several places,
so I'm not sure it's worth adding PCI_MAX_MCFG_BUSES. IMO if something
like this should be added it should be PCI_MAX_BUSES, and several
places should be changed to make use of it. Or ideally we should find
a way to detect this at runtime, without needing any hardcoded defines.

I think it would be good if you could add a comment describing the
different MCFG sizes supported by the Q35 chipset (64/128/256MB).
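
IOW, something like the below (the LENGTH encoding is from the Q35
PCIEXBAR register AFAIK, please double-check it against the spec before
copying anything):

    /*
     * Q35 PCIEXBAR (MCH config space offset 0x60) LENGTH field, bits [2:1]:
     *   0 -> 256MB window, buses 0-255
     *   1 -> 128MB window, buses 0-127
     *   2 ->  64MB window, buses 0-63
     *   3 -> reserved
     * i.e. the MMCONFIG window always covers 1MB per decoded bus.
     */
    #define MCFG_SIZE_64M   ( 64u << 20)
    #define MCFG_SIZE_128M  (128u << 20)
    #define MCFG_SIZE_256M  (256u << 20)

    static inline uint32_t mcfg_size_for_buses(unsigned int nr_buses)
    {
        return nr_buses <= 64  ? MCFG_SIZE_64M  :
               nr_buses <= 128 ? MCFG_SIZE_128M :
                                 MCFG_SIZE_256M;
    }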

Thanks, Roger.
