[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 06/10] vpci: Make every domain handle its own BARs





On 13/11/2020 11:26, Jan Beulich wrote:
On 13.11.2020 12:06, Julien Grall wrote:
Hi Jan,

On 13/11/2020 10:53, Jan Beulich wrote:
On 13.11.2020 11:36, Julien Grall wrote:
On 13/11/2020 10:25, Jan Beulich wrote:
On 13.11.2020 07:32, Oleksandr Andrushchenko wrote:
On 11/12/20 4:46 PM, Roger Pau Monné wrote:
On Thu, Nov 12, 2020 at 01:16:10PM +0000, Oleksandr Andrushchenko wrote:
On 11/12/20 11:40 AM, Roger Pau Monné wrote:
On Mon, Nov 09, 2020 at 02:50:27PM +0200, Oleksandr Andrushchenko wrote:
From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
+static uint32_t bar_read_dispatch(const struct pci_dev *pdev, unsigned int reg,
+                                  void *data)
+{
+    struct vpci_bar *vbar, *bar = data;
+
+    if ( is_hardware_domain(current->domain) )
+        return bar_read_hwdom(pdev, reg, data);
+
+    vbar = get_vpci_bar(current->domain, pdev, bar->index);
+    if ( !vbar )
+        return ~0;
+
+    return bar_read_guest(pdev, reg, vbar);
+}
+
+static void bar_write_dispatch(const struct pci_dev *pdev, unsigned int reg,
+                               uint32_t val, void *data)
+{
+    struct vpci_bar *bar = data;
+
+    if ( is_hardware_domain(current->domain) )
+        bar_write_hwdom(pdev, reg, val, data);
+    else
+    {
+        struct vpci_bar *vbar = get_vpci_bar(current->domain, pdev, 
bar->index);
+
+        if ( !vbar )
+            return;
+        bar_write_guest(pdev, reg, val, vbar);
+    }
+}
You should assign different handlers based on whether the domain that
has the device assigned is a domU or the hardware domain, rather than
doing the selection here.
Hm, handlers are assigned once in init_bars and this function is only called

for hwdom, so there is no way I can do that for the guests. Hence, the 
dispatcher
I think we might want to reset the vPCI handlers when a devices gets
assigned and deassigned.

In ARM case init_bars is called too early: PCI device assignment is currently

initiated by Domain-0' kernel and is done *before* PCI devices are given memory

ranges and BARs assigned:

[    0.429514] pci_bus 0000:00: root bus resource [bus 00-ff]
[    0.429532] pci_bus 0000:00: root bus resource [io 0x0000-0xfffff]
[    0.429555] pci_bus 0000:00: root bus resource [mem 0xfe200000-0xfe3fffff]
[    0.429575] pci_bus 0000:00: root bus resource [mem 0x30000000-0x37ffffff]
[    0.429604] pci_bus 0000:00: root bus resource [mem 0x38000000-0x3fffffff 
pref]
[    0.429670] pci 0000:00:00.0: enabling Extended Tags
[    0.453764] pci 0000:00:00.0: -------------------- BUS_NOTIFY_ADD_DEVICE

< init_bars >

[    0.453793] pci 0000:00:00.0: -- IRQ 0
[    0.458825] pci 0000:00:00.0: Failed to add - passthrough or MSI/MSI-X might 
fail!
[    0.471790] pci 0000:01:00.0: -------------------- BUS_NOTIFY_ADD_DEVICE

< init_bars >

[    0.471821] pci 0000:01:00.0: -- IRQ 255
[    0.476809] pci 0000:01:00.0: Failed to add - passthrough or MSI/MSI-X might 
fail!

< BAR assignments below >

[    0.488233] pci 0000:00:00.0: BAR 14: assigned [mem 0xfe200000-0xfe2fffff]
[    0.488265] pci 0000:00:00.0: BAR 15: assigned [mem 0x38000000-0x380fffff 
pref]

In case of x86 this is pretty much ok as BARs are already in place, but for ARM 
we

need to take care and re-setup vPCI BARs for hwdom.

Even on x86 there's no guarantee that all devices have their BARs set
up by firmware.

In a subsequent reply you've suggested to move init_bars from "add" to
"assign", but I'm having trouble seeing what this would change: It's
not Dom0 controlling assignment (to itself), but Xen assigns the device
towards the end of pci_add_device().

Things are getting even more

complicated if the host PCI bridge is not ECAM like, so you cannot set 
mmio_handlers

and trap hwdom's access to the config space to update BARs etc. This is why I 
have that

ugly hack for rcar_gen3 to read actual BARs for hwdom.

How to config space accesses work there? The latest for MSI/MSI-X it'll
be imperative that Xen be able to intercept config space writes.

I am not sure to understand your last sentence. Are you saying that we
always need to trap access to MSI/MSI-X message in order to sanitize it?

If one is using the GICv3 ITS (I haven't investigated other MSI
controller), then I don't believe you need to sanitize the MSI/MSI-X
message in most of the situation.

Well, if it's fine for the guest to write arbitrary values to message
address and message data,

The message address would be the doorbell of the ITS that is usually
going through the IOMMU page-tables. Although, I am aware of a couple of
platforms where the doorbell access (among other address ranges
including P2P transaction) bypass the IOMMU. In this situation, we would
need a lot more work than just trapping the access.

When you say "The message address would be the doorbell of the ITS" am
I right in understanding that's the designated address to be put there?
What if the guest puts some random different address there?

My point was that all the accesses from a PCI device should go through the IOMMU. Although, I know this may not be true for all the platforms.

In which case, sanitizing the MSI message address is not going to help because a PCI device can DMA into memory range that bypass the IOMMU.


Regarding the message data, for the ITS this is an event ID. The HW will
then tag each message with the device ID (this prevents spoofing). The
tupple (device ID, event ID) is used by the ITS to decide where to
inject the event.

Whether other MSI controllers (e.g. GICv2m) have similar isolation
feature will be on the case by case basis.

_and_ to arbitrarily enable/disable MSI / MSI-X,
then yes, no interception would be needed.
The device would be owned by the guest, so I am not sure to understand
the exact problem of letting it enabling/disabling MSI/MSI-X. Do you
mind expanding your thoughts?

Question is - is Xen involved in any way in the handling of interrupts
from such a device? If not, then I guess full control can indeed be
left with the guest.

Xen will only forward the interrupts to the guest. This is not very different to how other interrupts (e.g. SPIs) are dealt with.

So I don't see the problem of giving full control to the guest.


Furthermore, you can also control which event is enabled/disabled at the
ITS level.

And that's something Xen controls?

Yes. We only expose a virtual ITS to the guest.

Cheers,

--
Julien Grall



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.