[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion



On 3/30/22 1:15 PM, Anthony PERARD wrote:
Hi Chuck,

On Sun, Mar 13, 2022 at 11:41:37PM -0400, Chuck Zmudzinski wrote:
When gfx_passthru is enabled for the Intel IGD, hvmloader maps the IGD
opregion to the guest but libxl does not grant the guest permission to
I'm not reading the same thing when looking at code in hvmloader. It
seems that hvmloader allocate some memory for the IGD opregion rather
than mapping it.


tools/firmware/hvmloader/pci.c:184
     if ( vendor_id == 0x8086 )
     {
         igd_opregion_pgbase = mem_hole_alloc(IGD_OPREGION_PAGES);
         /*
          * Write the the OpRegion offset to give the opregion
          * address to the device model. The device model will trap
          * and map the OpRegion at the give address.
          */
         pci_writel(vga_devfn, PCI_INTEL_OPREGION,
                    igd_opregion_pgbase << PAGE_SHIFT);
     }

I think this write would go through QEMU, does it do something with it?
(I kind of replying to this question at the end of the mail.)

Is this code in hvmloader actually run in your case?

Currently, because of another bug in Qemu upstream, this crash can
only be reproduced using the traditional Qemu device model, and of
qemu-traditional is a bit old... What is the bug in QEMU? Do you have a
link to a patch/mail?

I finally found a patch for the other bug in Qemu upstream. The
patch is currently being used in QubesOS, and they first added
it to their version of Qemu way back in 2017:

https://github.com/QubesOS/qubes-vmm-xen-stubdom-linux/pull/3/commits/ab2b4c2ad02827a73c52ba561e9a921cc4bb227c

Although this patch is advertised as applying to the device model
running in a Linux stub domain, it is also needed (at least on my
system) with the device model running in Dom0.

Here is the story:

The patch is titled "qemu: fix wrong mask in pci capability registers handling"

There is scant information in the commit message about the nature of
the problem, but I discovered the following in my testing:

On my Intel Haswell system configured for PCI passthrough to the
Xen HVM guest, Qemu does indeed incorrectly set the emulated
register because it uses the wrong mask that disables instead of
enables the PCI_STATUS_CAP_LIST bit of the PCI_STATUS register.

This disabled the MSI-x capability of two of the three PCI devices
passed through to the Xen HVM guest. The problem only
manifests in a bad way in a Linux guest, not in a Windows guest.

One possible reason that only Linux guests are affected is that
I discovered in the Xen xl-dmesg verbose logs that Windows and
Linux use different callbacks for interrupts:

(XEN) Dom1 callback via changed to GSI 28
...
(XEN) Dom3 callback via changed to Direct Vector 0xf3

Dom1 is a Windows Xen HVM and Dom3 is a Linux HVM

Apparently the Direct Vector callback that Linux uses requires
MSI or MSI-x capability of the passed through devices, but the
wrong mask in Qemu disables that capability.

After applying the QubesOS patch to Qemu upstream, the
PCI_STATUS_CAP_LIST bit is set correctly for the guest and
PCI and Intel IGD passthrough works normally because the
Linux guest can make use of the MSI-x capability of the
PCI devices.

The problem was discovered almost five years ago. I don't
know why the fix has not been committed to Qemu
upstream yet.

After this, I was able to discover that the need for libxl to
explicitly grant permission for access to the Intel OpRegion
is only needed for the old traditional device model because
the Xen hypercall to gain permission is included in upstream
Qemu, but it is omitted from the old traditional device model.

So this patch is not needed for users of the upstream device
model who also add the QubesOS patch to fix the
PCI_STATUS_CAP_LIST bit in the PCI_STATUS register.

This all assumes the device model is running in Dom0. The
permission for access to the Intel OpRegion might still be
needed if the upstream device model is running in a stub
domain.

There are other problems in addition to this problem of access
to the Intel OpRegion that are discussed here:

https://www.qubes-os.org/news/2017/10/18/msi-support/

As old as that post is, the feature of allowing PCI and VGA
passthrough to HVM domains is still not well supported,
especially for the case when the device model runs in a
stub domain.

Since my proposed patch only applies to the very insecure
case of the old traditional device model running in Dom0,
I will not pursue it further.

I will look for this feature in future versions of Xen. Currently,
Xen 4.16 advertises support for Linux-based stub domains
as "Tech Preview." So future versions of Xen might handle
this problem in libxl or perhaps in some other way, and also
hopefully the patch to Qemu to fix the PCI capabilities mask
can be committed to Qemu upstream soon so this feature
of Intel IGD passthrough can at least work with Linux
guests and the upstream Qemu running in Dom0.

Regards,

Chuck


diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 4bbbfe9f16..a4fc473de9 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -2531,6 +2572,37 @@ int libxl__grant_vga_iomem_permission(libxl__gc *gc, 
const uint32_t domid,
                    domid, vga_iomem_start, (vga_iomem_start + 0x20 - 1));
              return ret;
          }
+
+        /* If this is an Intel IGD, allow access to the IGD opregion */
+        if (!libxl__is_igd_vga_passthru(gc, d_config)) return 0;
+
+        uint32_t igd_opregion = sysfs_dev_get_igd_opregion(gc, pci);
+        uint32_t error = 0xffffffff;
+        if (igd_opregion == error) break;
+
+        vga_iomem_start = ( (uint64_t) igd_opregion ) >> XC_PAGE_SHIFT;
+        ret = xc_domain_iomem_permission(CTX->xch, stubdom_domid,
+                                         vga_iomem_start,
+                                         IGD_OPREGION_PAGES, 1);
+        if (ret < 0) {
+            LOGED(ERROR, domid,
+                  "failed to give stubdom%d access to iomem range "
+                  "%"PRIx64"-%"PRIx64" for IGD passthru",
+                  stubdom_domid, vga_iomem_start, (vga_iomem_start +
+                                                IGD_OPREGION_PAGES - 1));
+            return ret;
+        }
+        ret = xc_domain_iomem_permission(CTX->xch, domid,
+                                         vga_iomem_start,
+                                         IGD_OPREGION_PAGES, 1);
Here, you seems to add permission to an address that is read from the
pci config space of the device, but as I've pointed above hvmloader
seems to overwrite this address. It this call to
xc_domain_iomem_permission() does actually anything useful?
Or is it by luck that the address returned by
sysfs_dev_get_igd_opregion() happened to be the address that hvmloader
is going to write?

Or maybe hvmloader doesn't actually do anything?


Some more though on that, looking at QEMU, it seems that there's already
a call to xc_domain_iomem_permission(), in igd_write_opregion(). So
adding one in libxl would seems redundant, or maybe it the one for the
device model's domain that's needed  (named 'stubdom_domid' here)?

Thanks,





 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.