
Re: [Xen-devel] [early RFC] ARM PCI Passthrough design document



Hi Roger,

On 25/01/17 11:42, Roger Pau Monné wrote:
On Tue, Jan 24, 2017 at 05:17:06PM +0000, Julien Grall wrote:
On 06/01/17 15:12, Roger Pau Monné wrote:
On Thu, Dec 29, 2016 at 02:04:15PM +0000, Julien Grall wrote:
    * Add a device
    * Remove a device
    * Assign a device to a guest
    * Deassign a device from a guest

XXX: Detail the interaction when assigning/deassigning device

Assigning a device will probably entail setting up some direct MMIO mappings
(BARs and ROMs) plus a bunch of traps in order to perform emulation of accesses
to the PCI config space (or those can be set up when a new bridge is registered
with Xen).

I am planning to detail the root complex emulation in a separate section. I
sent the design document before writing it.

In brief, I would expect the registration of a new bridge to set up the traps
to emulate accesses to the PCI configuration space. On ARM, the first
approach will rely on the OS to set up the BARs and ROMs, so they will be
mapped by the PCI configuration space emulation.

The reason for relying on the OS to set up the BARs/ROMs is to reduce the work
to do for a first version. Otherwise we would have to add code in the toolstack
to decide where to place the BARs/ROMs. I don't think it is a lot of work,
but it is not that important because it does not require a stable ABI (this
is an interaction between the hypervisor and the toolstack). Furthermore,
Linux (at least on ARM) assigns the BARs at setup time. From my
understanding, this is the expected behavior with both DT (the DT has a
property to skip the scan) and ACPI.

This approach might work for Dom0, but for DomU you certainly need to know
where the MMIO regions of a device are, and either the toolstack or Xen needs
to set this up in advance (or at least mark which MMIO regions are available to
the DomU). Allowing a DomU to map random MMIO regions is certainly a security
issue.

I agree here. I provided more feedback in an answer to Stefano; I would appreciate your input there too, if possible. See

<8ca91073-09e7-57ca-9063-b47e0aced39d@xxxxxxxxxx>

[...]



Based on what Linux is currently doing, there are two kinds of quirks (a sketch of the first kind follows the list):
    * Accesses to the configuration space of certain sizes are not allowed
    * A specific driver is necessary for driving the host bridge
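
As an illustration of the first kind (this is a sketch of what Linux does today for such bridges, not proposed Xen code): a host bridge that cannot handle 8/16-bit configuration accesses typically plugs the generic 32-bit accessors into its pci_ops, so every access is widened to 32 bits:

/*
 * Linux-side sketch: pci_ecam_map_bus() and the generic 32-bit
 * accessors are existing helpers; the ops structure itself is made up
 * for illustration.
 */
#include <linux/pci.h>
#include <linux/pci-ecam.h>

static struct pci_ops quirky_bridge_ops = {
    .map_bus = pci_ecam_map_bus,           /* standard ECAM offset calculation */
    .read    = pci_generic_config_read32,  /* widen sub-word reads to 32-bit */
    .write   = pci_generic_config_write32, /* read-modify-write for sub-word stores */
};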

Hm, so what are the issues that make these bridges need specific drivers?

This might be quite problematic if you also have to emulate this broken
behavior inside of Xen (because Dom0 is using a specific driver).

I am not expecting to emulate the configuration space accesses for DOM0. I
know you mentioned that it would be necessary to hide PCI devices used by Xen
(such as the UART) from DOM0, or to configure MSIs. But on ARM, the UART is
integrated in the SoC and MSIs will be configured through the interrupt controller.

Right, we certainly need to do it for x86, but I don't know the ARM
architecture well enough to tell whether that's needed or not. I'm also wondering
if having both Xen and Dom0 directly accessing the ECAM area is fine, even
if they use different cache mapping attributes?

I don't know much about x86, but on ARM we can specify caching attributes in the stage-2 page tables (akin to EPT on x86). The MMU will use the stricter memory attributes between the stage-2 and the guest (stage-1) page tables.

In the case of ECAM, we could disable caching in the stage-2 page tables, so accesses to the ECAM will always be uncached.
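
To make this concrete, here is a minimal sketch of the mapping I have in mind, assuming a 1:1 mapped DOM0 and using the existing helpers in xen/arch/arm/p2m.c (map_regions_p2mt() and the p2m_mmio_direct_dev type); treat the function itself as illustrative only:

/*
 * Sketch: map the host bridge ECAM window into DOM0's stage-2 with
 * Device memory attributes, so the combination with whatever DOM0 uses
 * in stage-1 can never end up cacheable. DOM0 is mapped 1:1, hence
 * gfn == mfn here.
 */
static int dom0_map_ecam(struct domain *d, paddr_t ecam_base, paddr_t ecam_size)
{
    return map_regions_p2mt(d,
                            _gfn(paddr_to_pfn(ecam_base)),
                            DIV_ROUND_UP(ecam_size, PAGE_SIZE),
                            _mfn(paddr_to_pfn(ecam_base)),
                            p2m_mmio_direct_dev);
}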


So Xen needs to rely on DOM0 to discover the host bridges and notify Xen
of all the relevant information. This will be done via a new hypercall
PHYSDEVOP_pci_host_bridge_add. The layout of the structure will be:

struct physdev_pci_host_bridge_add
{
    /* IN */
    uint16_t seg;
    /* Range of buses supported by the host bridge */
    uint8_t  bus_start;
    uint8_t  bus_nr;
    uint32_t res0;  /* Padding */
    /* Information about the configuration space region */
    uint64_t cfg_base;
    uint64_t cfg_size;
};
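
For illustration, this is roughly how I would expect DOM0 (Linux) to use it, assuming the hypercall number and the structure above land in the public physdev interface as proposed (sketch only):

/*
 * Sketch of the DOM0 (Linux) side. PHYSDEVOP_pci_host_bridge_add and
 * struct physdev_pci_host_bridge_add are the proposed additions above;
 * HYPERVISOR_physdev_op() is the existing hypercall wrapper.
 */
#include <xen/interface/physdev.h>
#include <asm/xen/hypercall.h>

static int xen_register_host_bridge(u16 seg, u8 bus_start, u8 bus_nr,
                                    u64 cfg_base, u64 cfg_size)
{
    struct physdev_pci_host_bridge_add add = {
        .seg       = seg,
        .bus_start = bus_start,
        .bus_nr    = bus_nr,
        .cfg_base  = cfg_base,  /* ECAM base from the DT "reg" property or ACPI MCFG */
        .cfg_size  = cfg_size,
    };

    return HYPERVISOR_physdev_op(PHYSDEVOP_pci_host_bridge_add, &add);
}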

Why do you need the cfg_size attribute? Isn't it always going to be 4096 bytes
in size?

The cfg_size is here to help us match the corresponding node in the
device tree. The cfg_size may differ depending on how the hardware has
implemented access to the configuration space.

But certainly cfg_base needs to be aligned to a PAGE_SIZE? And according to the
spec cfg_size cannot be bigger than 4KB (PAGE_SIZE), so in any case you will
end up mapping a whole 4KB page, because that's the minimum granularity of the
p2m?

cfg_size would be a multiple of 4KB, as each configuration space would have a unique region (with ECAM, for instance, each bus alone accounts for 1MB: 32 devices x 8 functions x 4KB). But as you mentioned later, we could re-use MMCFG_reserved.


But to be fair, I think we can do without this property. For ACPI, the
size will vary with the number of buses handled and can be deduced. For
DT, the base address and bus range should be enough to find the associated
node.


If that field is removed you could use the PHYSDEVOP_pci_mmcfg_reserved
hypercall.

DOM0 will issue the hypercall PHYSDEVOP_pci_host_bridge_add for each host
bridge available on the platform. When Xen receives the hypercall, the
driver associated with the host bridge will be instantiated.
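
To sketch what "the driver associated with the host bridge will be instantiated" could look like on the Xen side (every name below is made up for illustration; only struct dt_device_node exists today):

/*
 * Purely illustrative: find the DT node (or MCFG entry) matching
 * cfg_base, look up a driver by its "compatible" string and
 * instantiate it. Error handling kept minimal.
 */
struct pci_host_bridge_driver {
    const char *compatible;                        /* DT compatible string */
    int (*probe)(const struct dt_device_node *np,
                 const struct physdev_pci_host_bridge_add *add);
};

static int pci_host_bridge_add(const struct physdev_pci_host_bridge_add *add)
{
    const struct dt_device_node *node;
    const struct pci_host_bridge_driver *drv;

    node = pci_find_host_bridge_node(add->cfg_base, add->bus_start);
    if ( !node )
        return -ENODEV;

    drv = pci_find_host_bridge_driver(node);  /* match on "compatible" */
    if ( !drv )
        return -ENODEV;

    return drv->probe(node, add);
}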

XXX: Shall we limit DOM0's access to the configuration space from that
moment?

Most definitely yes, you should instantiate an emulated bridge over the real
one, in order to proxy Dom0 accesses to the PCI configuration space. You for
example don't want Dom0 moving the position of the BARs of PCI devices without
Xen being aware (and properly changing the second stage translation).
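
(For the record, a sketch of what "properly changing the second stage translation" would boil down to when trapping a BAR write; struct vpci_bar and the handler are invented, while (un)map_mmio_regions() are the existing Xen helpers. Error handling omitted.)

struct vpci_bar {
    paddr_t host_addr;    /* where the BAR really lives */
    paddr_t guest_addr;   /* where the guest currently sees it */
    uint64_t size;
};

static void vpci_bar_write(struct domain *d, struct vpci_bar *bar,
                           uint64_t new_guest_addr)
{
    unsigned long nr = DIV_ROUND_UP(bar->size, PAGE_SIZE);

    /* Tear down the old stage-2 mapping... */
    unmap_mmio_regions(d, _gfn(paddr_to_pfn(bar->guest_addr)), nr,
                       _mfn(paddr_to_pfn(bar->host_addr)));
    /* ...and re-establish it at the address the guest just wrote. */
    map_mmio_regions(d, _gfn(paddr_to_pfn(new_guest_addr)), nr,
                     _mfn(paddr_to_pfn(bar->host_addr)));
    bar->guest_addr = new_guest_addr;
}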

The problem is that on ARM we don't have a single way to access the configuration
space, so we would need different emulators in Xen, which I don't like unless
there is a strong reason to do it.

We could prevent DOM0 from modifying the position of the BARs after setup. I also
remember you mentioned MSI configuration; for ARM this is done via the
interrupt controller.


## Discovering and registering PCI devices

Similarly to x86, PCI devices will be discovered by DOM0 and registered
using the hypercalls PHYSDEVOP_pci_device_add or PHYSDEVOP_manage_pci_add_ext.
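
(For reference, a sketch of what the existing x86 DOM0 path boils down to today, roughly what xen_add_device() in drivers/xen/pci.c does; field values are illustrative:)

#include <linux/pci.h>
#include <xen/interface/physdev.h>
#include <asm/xen/hypercall.h>

static int xen_register_pci_device(struct pci_dev *pci_dev)
{
    struct physdev_pci_device_add add = {
        .seg   = pci_domain_nr(pci_dev->bus),
        .bus   = pci_dev->bus->number,
        .devfn = pci_dev->devfn,
    };

    return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_add, &add);
}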

Why do you need this? If you have access to the bridges you can scan them from
Xen and discover the devices AFAICT.

I am a bit confused. Are you saying that you plan to ditch them for PVH? If
so, why are they called by Linux today?

I think we can get away with PHYSDEVOP_pci_mmcfg_reserved only, but maybe I'm
missing something. AFAICT Xen should be able to gather all the other data by
itself from the PCI config space once it knows the details about the host
bridge.

From my understanding, some host bridges need to be configured before they can be used (TBC). Bringing this initialization into Xen may be complex. For instance the xgene host bridge (see linux/drivers/pci/host/pci-xgene.c) requires enabling its clock.

I would leave the initialization of the host bridge to Linux, so if we are doing the scanning in Xen we will need a hypercall to let Xen know that the host bridge has been initialized.

I gave a bit more background in my answer to Stefano, so I would recommend continuing the conversation there.




By default all the PCI devices will be assigned to DOM0. So Xen would have
to configure the SMMU and Interrupt Controller to allow DOM0 to use the PCI
devices. As mentioned earlier, those subsystems will require the StreamID
and DeviceID. Both can be deduced from the RID.
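
For clarity, deriving the StreamID/DeviceID from the RID is just a table lookup plus an offset, along the lines of the DT "msi-map"/"iommu-map" bindings (the ACPI IORT ID mappings work the same way). A sketch with made-up type names:

#include <stdint.h>

/* One (rid-base, output-base, length) entry, as in the DT bindings. */
struct rid_map_entry {
    uint32_t rid_base;   /* first RID covered by this entry */
    uint32_t out_base;   /* corresponding first StreamID/DeviceID */
    uint32_t length;     /* number of RIDs covered */
};

static int rid_to_id(const struct rid_map_entry *map, unsigned int nr,
                     uint32_t rid, uint32_t *id)
{
    unsigned int i;

    for ( i = 0; i < nr; i++ )
    {
        if ( rid >= map[i].rid_base && rid < map[i].rid_base + map[i].length )
        {
            *id = map[i].out_base + (rid - map[i].rid_base);
            return 0;
        }
    }

    return -1;  /* no translation found for this RID */
}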

XXX: How to hide PCI devices from DOM0?

By adding the ACPI namespace of the device to the STAO and blocking Dom0
access to this device in the emulated bridge that Dom0 will have access to
(returning 0xFFFF when Dom0 tries to read the vendor ID from the PCI header).
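
(To illustrate: the "hiding" described here is just the emulated bridge returning all-ones for any access targeting a hidden device; every name in this sketch is invented:)

/*
 * Illustrative only: an emulated config space read that hides a device
 * from Dom0 by returning all-ones, so a 16-bit vendor ID read yields
 * 0xFFFF and Dom0 concludes nothing is there.
 */
static uint32_t vpci_config_read(struct domain *d, pci_sbdf_t sbdf,
                                 unsigned int reg, unsigned int size)
{
    if ( pci_device_hidden_from(d, sbdf) )
        return 0xffffffffU >> (32 - 8 * size);   /* all-ones for the access width */

    return pci_config_read_hw(sbdf, reg, size);  /* pass through to the real device */
}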

Sorry, I was not clear here. By hiding, I meant DOM0 not instantiating a
driver (similarly to xen-pciback.hide). We still want DOM0 to access the PCI
config space in order to reset the device, unless you plan to import all the
reset quirks into Xen?

I don't have a clear opinion here, and I don't know all the details of these
reset hacks.

Actually I looked at the Linux code (see __pci_dev_reset in drivers/pci/pci.c) and there are fewer quirks than I expected. The list of quirks can be found in pci_dev_reset_methods in drivers/pci/quirks.c.

There are a few ways to reset a device (see __pci_dev_reset); they all appear to be based on accesses to the configuration space. So I guess it should be fine to import that into Xen. Any opinions?
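
For instance, the generic PCIe Function Level Reset is nothing more than a couple of configuration space accesses plus a delay, which is roughly what Linux's pcie_flr() does. A sketch, where cfg_read16()/cfg_write16() stand in for whatever accessors Xen would use and the PCI_EXP_* constants are the standard PCIe capability offset/bit:

static void pci_dev_flr(pci_sbdf_t sbdf, unsigned int exp_cap_pos)
{
    uint16_t ctl = cfg_read16(sbdf, exp_cap_pos + PCI_EXP_DEVCTL);

    /* Initiate the Function Level Reset... */
    cfg_write16(sbdf, exp_cap_pos + PCI_EXP_DEVCTL, ctl | PCI_EXP_DEVCTL_BCR_FLR);
    /* ...and give the function the 100ms the spec allows it to complete. */
    mdelay(100);
}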

Cheers,

--
Julien Grall
