
Re: [Xen-devel] [DRAFT RFC] PVHv2 interaction with physical devices



On Thu, Nov 10, 2016 at 08:53:05AM -0500, Konrad Rzeszutek Wilk wrote:
> On Thu, Nov 10, 2016 at 11:39:08AM +0100, Roger Pau Monné wrote:
> > On Wed, Nov 09, 2016 at 01:45:17PM -0500, Konrad Rzeszutek Wilk wrote:
> > > On Wed, Nov 09, 2016 at 04:59:12PM +0100, Roger Pau Monné wrote:
> > > > In order to improve the mapping of device memory areas, Xen will have
> > > > to know of those devices in advance (before Dom0 tries to interact
> > > > with them) so that the memory BARs will be properly mapped into the
> > > > Dom0 memory map.
> > > 
> > > Oh, that is going to be a problem with SR-IOV. Those are created _after_
> > > dom0 has booted. In fact they are done by the drivers themselves.
> > > 
> > > See xen_add_device in drivers/xen/pci.c how this is handled.
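
For reference, the notification that xen_add_device issues for a VF boils
down to something like the sketch below (struct layout written from memory;
the authoritative definition lives in Xen's public/physdev.h, so treat this
as an illustration only):

/*
 * Sketch of the Linux-side notification for a newly created VF, modelled
 * on xen_add_device() in drivers/xen/pci.c: Dom0 tells Xen the new
 * seg/bus/devfn and which physical function it belongs to.
 */
#include <linux/types.h>
#include <xen/interface/physdev.h>
#include <asm/xen/hypercall.h>

static int notify_xen_of_vf(u16 seg, u8 bus, u8 devfn,
                            u8 pf_bus, u8 pf_devfn)
{
    struct physdev_pci_device_add add = {
        .seg   = seg,
        .bus   = bus,
        .devfn = devfn,
        .flags = XEN_PCI_DEV_VIRTFN,    /* this device is a VF */
    };

    /* Tell Xen which PF the VF hangs off. */
    add.physfn.bus   = pf_bus;
    add.physfn.devfn = pf_devfn;

    return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_add, &add);
}
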
> > 
> > Is the process of creating those VFs something standard? (In the sense
> > that it can be detected by Xen, and the proper mappings established)
> 
> Yes and no.
> 
> You can read from the PCI configuration that the device (Physical
> function) has SR-IOV. But that information may be in the extended
> configuration registers so you need MCFG. Anyhow the only thing the PF
> will tell you is the BAR regions they will occupy (since they
> are behind the bridge) but not the BDFs:

But just knowing the BARs' positions should be enough for Xen to install the 
identity mappings AFAICT?

Or are there more BARs that will only appear after the SR-IOV functionality 
has been enabled?

From the documentation that I've found, if you detect that the device has 
PCI_EXT_CAP_ID_SRIOV, you can then read the BARs and map them into Dom0, but 
maybe I'm missing something (and I have not been able to test this, although 
my previous PVHv2 Dom0 series already contained code in order to perform 
this):

http://xenbits.xen.org/gitweb/?p=people/royger/xen.git;a=commit;h=260cfd1e96e56ab4b58a414d544d92a77e210050
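
Roughly, what I had in mind is something along the lines of the sketch below
(this is not the code from the commit above; read_conf32() and
identity_map_mmio() are just placeholders for whatever config-space accessor
and mapping helper end up being used, and the register offsets are the
standard SR-IOV capability ones):

#include <stdint.h>

/* Standard offsets inside the SR-IOV extended capability. */
#define PCI_SRIOV_BAR        0x24    /* VF BAR0 */
#define PCI_SRIOV_NUM_BARS   6

/* Standard low flag bits of a BAR. */
#define PCI_BASE_ADDRESS_SPACE_IO        0x01
#define PCI_BASE_ADDRESS_MEM_TYPE_MASK   0x06
#define PCI_BASE_ADDRESS_MEM_TYPE_64     0x04

/* Placeholders for the real config-space accessor and mapping helper. */
extern uint32_t read_conf32(unsigned int sbdf, unsigned int reg);
extern void identity_map_mmio(uint64_t addr /*, uint64_t size */);

/*
 * Walk the VF BARs of a PF whose SR-IOV capability starts at sriov_pos and
 * identity map the memory ones into Dom0.  Sizing the BARs (write all-ones,
 * read back) is omitted; each region covers TotalVFs copies of the per-VF
 * BAR.
 */
void map_sriov_bars(unsigned int sbdf, unsigned int sriov_pos)
{
    unsigned int i;

    for ( i = 0; i < PCI_SRIOV_NUM_BARS; i++ )
    {
        uint64_t bar = read_conf32(sbdf, sriov_pos + PCI_SRIOV_BAR + i * 4);

        if ( (bar & PCI_BASE_ADDRESS_SPACE_IO) || !(bar & ~0xfULL) )
            continue;    /* I/O (not allowed for VFs) or unimplemented */

        if ( (bar & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
             PCI_BASE_ADDRESS_MEM_TYPE_64 )
        {
            /* 64-bit BAR: the high half lives in the next 32-bit slot. */
            i++;
            bar |= (uint64_t)read_conf32(sbdf,
                                sriov_pos + PCI_SRIOV_BAR + i * 4) << 32;
        }

        identity_map_mmio(bar & ~0xfULL);
    }
}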

>         Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
>                 IOVCap: Migration-, Interrupt Message Number: 000
>                 IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+
>                 IOVSta: Migration-
>                 Initial VFs: 8, Total VFs: 8, Number of VFs: 0, Function Dependency Link: 00
>                 VF offset: 128, stride: 2, Device ID: 10ca
>                 Supported Page Size: 00000553, System Page Size: 00000001
>                 Region 0: Memory at 00000000fbda0000 (64-bit, non-prefetchable)
>                 Region 3: Memory at 00000000fbd80000 (64-bit, non-prefetchable)
>                 VF Migration: offset: 00000000, BIR: 0
>         Kernel driver in use: igb
> 
> And if I enable SR-IOV on the PF I get:
> 
> 0a:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> 0a:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
> 0a:10.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
> 0a:10.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
> 0a:10.6 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
> 0a:11.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
> 0a:11.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
> 0a:11.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
> 
> -bash-4.1# lspci -s 0a:10.0 -v
> 0a:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
>         Subsystem: Super Micro Computer Inc Device 10c9
>         Flags: bus master, fast devsel, latency 0
>         [virtual] Memory at fbda0000 (64-bit, non-prefetchable) [size=16K]
>         [virtual] Memory at fbd80000 (64-bit, non-prefetchable) [size=16K]
>         Capabilities: [70] MSI-X: Enable+ Count=3 Masked-
>         Capabilities: [a0] Express Endpoint, MSI 00
>         Capabilities: [100] Advanced Error Reporting
>         Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
>         Kernel driver in use: igbvf
> 
> -bash-4.1# lspci -s 0a:11.4 -v
> 0a:11.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
>         Subsystem: Super Micro Computer Inc Device 10c9
>         Flags: bus master, fast devsel, latency 0
>         [virtual] Memory at fbdb8000 (64-bit, non-prefetchable) [size=16K]
>         [virtual] Memory at fbd98000 (64-bit, non-prefetchable) [size=16K]
>         Capabilities: [70] MSI-X: Enable+ Count=3 Masked-
>         Capabilities: [a0] Express Endpoint, MSI 00
>         Capabilities: [100] Advanced Error Reporting
>         Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
>         Kernel driver in use: igbvf

So it seems that the memory for the individual VFs is carved out of the BARs 
listed inside the SR-IOV extended capability (PCI_EXT_CAP_ID_SRIOV).
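
As a quick sanity check: each VF's BARn should simply be a fixed-size slice
of the PF's SR-IOV BARn, i.e. sriov_barn + vf_index * vf_barn_size. With the
numbers quoted above (16K per VF BAR, VF offset 128 / stride 2, so 0a:11.4 is
devfn 0x8c, index 6) that works out exactly:

#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    /* Values taken from the lspci output above. */
    uint64_t sriov_bar0 = 0xfbda0000, sriov_bar3 = 0xfbd80000;
    uint64_t vf_bar_size = 16 * 1024;   /* "size=16K" per VF BAR */
    unsigned int vf_idx = 6;            /* 0a:11.4: (devfn 0x8c - 0x80) / stride 2 */

    /* Prints 0xfbdb8000 and 0xfbd98000, matching 0a:11.4's [virtual] BARs. */
    printf("VF6 BAR0 = %#" PRIx64 "\n", sriov_bar0 + vf_idx * vf_bar_size);
    printf("VF6 BAR3 = %#" PRIx64 "\n", sriov_bar3 + vf_idx * vf_bar_size);
    return 0;
}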

> > > > PCI memory BARs
> > > > ---------------
> > > > 
> > > > PCI devices discovered by Xen will have their BARs scanned in order
> > > > to detect memory BARs, and those will be identity mapped to Dom0.
> > > > Since BARs can be freely moved by the Dom0 OS by writing to the
> > > > appropriate PCI config space register, Xen must trap those accesses
> > > > and unmap the previous region and map the new one as set by Dom0.
> > > 
> > > You can make that simpler - we have hypercalls to "notify" in Linux
> > > when a device is changing. Those can provide that information as well.
> > > (This is what PV dom0 does).
> > > 
> > > Also you are missing one important part - the MMCFG. That is required
> > > for Xen to be able to poke at the PCI configuration space above the
> > > first 256 bytes.
> > > And you can only get the MMCFG if the ACPI DSDT has been parsed.
> > 
> > Hm, I guess I'm missing something, but at least on my hardware Xen seems to 
> > be able to parse the MCFG ACPI table before Dom0 does anything with the 
> > DSDT:
> > 
> > (XEN) PCI: MCFG configuration 0: base f8000000 segment 0000 buses 00 - 3f
> > (XEN) PCI: MCFG area at f8000000 reserved in E820
> > (XEN) PCI: Using MCFG for segment 0000 bus 00-3f
> > 
> > > So if you do the PCI bus scanning _before_ booting PVH dom0, you may
> > > need to update your view of PCI devices after the MMCFG locations
> > > have been provided to you.
> > 
> > I'm not opposed to keeping PHYSDEVOP_pci_mmcfg_reserved, but I have yet
> > to see hardware where this is actually needed. Also, AFAICT, FreeBSD at
> > least is only able to detect MMCFG regions present in the MCFG ACPI
> > table:
> 
> There is some hardware out there (I think I saw this with an IBM HS-20,
> but I can't recall the details). The specification says that the MCFG
> _may_ be defined in the MADT, but is not guaranteed. Which means that it
> can bubble via the ACPI DSDT code.

Hm, MCFG is a top-level table in its own right, and AFAIK it is not tied to 
the MADT in any way. I'm not opposed to introducing 
PHYSDEVOP_pci_mmcfg_reserved if it's really needed, but I won't do so 
blindly. We first need to know whether there are systems out there that 
don't properly report their MMCFG areas in the MCFG ACPI table, and then 
whether those systems would actually be capable of running a PVH Dom0 (if 
they are as old as the IBM HS-20 they won't be able to run a PVH Dom0 
anyway, due to missing virtualization features).
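
FWIW, if we do end up needing it, the Dom0 side would presumably look much
like what Linux already does for PV Dom0 in drivers/xen/pci.c (a rough
sketch; field names from memory, the authoritative layout is in Xen's
public/physdev.h):

/*
 * Sketch: Dom0 reporting an MMCFG area it found via ACPI to Xen, as the
 * PV Dom0 path already does.
 */
#include <linux/types.h>
#include <xen/interface/physdev.h>
#include <asm/xen/hypercall.h>

static int report_mmcfg_area(u64 address, u16 segment,
                             u8 start_bus, u8 end_bus)
{
    struct physdev_pci_mmcfg_reserved r = {
        .address   = address,     /* e.g. 0xf8000000 on the box above */
        .segment   = segment,
        .start_bus = start_bus,
        .end_bus   = end_bus,
        .flags     = XEN_PCI_MMCFG_RESERVED,  /* area is marked reserved */
    };

    return HYPERVISOR_physdev_op(PHYSDEVOP_pci_mmcfg_reserved, &r);
}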

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

