Xen project Mailing List

Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion

To: Anthony PERARD <anthony.perard@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>

From: Chuck Zmudzinski <brchuckz@xxxxxxxxxxxx>

Date: Fri, 3 Jun 2022 21:10:25 -0400

Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, Wei Liu <wl@xxxxxxx>, Juergen Gross <jgross@xxxxxxxx>

Delivery-date: Sat, 04 Jun 2022 01:10:43 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 3/30/22 1:15 PM, Anthony PERARD wrote:

Hi Chuck,

On Sun, Mar 13, 2022 at 11:41:37PM -0400, Chuck Zmudzinski wrote:

When gfx_passthru is enabled for the Intel IGD, hvmloader maps the IGD
opregion to the guest but libxl does not grant the guest permission to

I'm not reading the same thing when looking at code in hvmloader. It
seems that hvmloader allocate some memory for the IGD opregion rather
than mapping it.


tools/firmware/hvmloader/pci.c:184
     if ( vendor_id == 0x8086 )
     {
         igd_opregion_pgbase = mem_hole_alloc(IGD_OPREGION_PAGES);
         /*
          * Write the the OpRegion offset to give the opregion
          * address to the device model. The device model will trap
          * and map the OpRegion at the give address.
          */
         pci_writel(vga_devfn, PCI_INTEL_OPREGION,
                    igd_opregion_pgbase << PAGE_SHIFT);
     }

I think this write would go through QEMU, does it do something with it?
(I kind of replying to this question at the end of the mail.)

Is this code in hvmloader actually run in your case?

Currently, because of another bug in Qemu upstream, this crash can
only be reproduced using the traditional Qemu device model, and of

qemu-traditional is a bit old... What is the bug in QEMU? Do you have a
link to a patch/mail?

I finally found a patch for the other bug in Qemu upstream. The patch is currently being used in QubesOS, and they first added it to their version of Qemu way back in 2017: https://github.com/QubesOS/qubes-vmm-xen-stubdom-linux/pull/3/commits/ab2b4c2ad02827a73c52ba561e9a921cc4bb227c Although this patch is advertised as applying to the device model running in a Linux stub domain, it is also needed (at least on my system) with the device model running in Dom0. Here is the story:

The patch is titled "qemu: fix wrong mask in pci capability registers

handling"

There is scant information in the commit message about the nature of the problem, but I discovered the following in my testing: On my Intel Haswell system configured for PCI passthrough to the Xen HVM guest, Qemu does indeed incorrectly set the emulated register because it uses the wrong mask that disables instead of enables the PCI_STATUS_CAP_LIST bit of the PCI_STATUS register. This disabled the MSI-x capability of two of the three PCI devices passed through to the Xen HVM guest. The problem only manifests in a bad way in a Linux guest, not in a Windows guest. One possible reason that only Linux guests are affected is that I discovered in the Xen xl-dmesg verbose logs that Windows and Linux use different callbacks for interrupts: (XEN) Dom1 callback via changed to GSI 28 ... (XEN) Dom3 callback via changed to Direct Vector 0xf3 Dom1 is a Windows Xen HVM and Dom3 is a Linux HVM Apparently the Direct Vector callback that Linux uses requires MSI or MSI-x capability of the passed through devices, but the wrong mask in Qemu disables that capability. After applying the QubesOS patch to Qemu upstream, the PCI_STATUS_CAP_LIST bit is set correctly for the guest and PCI and Intel IGD passthrough works normally because the Linux guest can make use of the MSI-x capability of the PCI devices. The problem was discovered almost five years ago. I don't know why the fix has not been committed to Qemu upstream yet. After this, I was able to discover that the need for libxl to explicitly grant permission for access to the Intel OpRegion is only needed for the old traditional device model because the Xen hypercall to gain permission is included in upstream Qemu, but it is omitted from the old traditional device model. So this patch is not needed for users of the upstream device model who also add the QubesOS patch to fix the PCI_STATUS_CAP_LIST bit in the PCI_STATUS register. This all assumes the device model is running in Dom0. The permission for access to the Intel OpRegion might still be needed if the upstream device model is running in a stub domain. There are other problems in addition to this problem of access to the Intel OpRegion that are discussed here: https://www.qubes-os.org/news/2017/10/18/msi-support/ As old as that post is, the feature of allowing PCI and VGA passthrough to HVM domains is still not well supported, especially for the case when the device model runs in a stub domain. Since my proposed patch only applies to the very insecure case of the old traditional device model running in Dom0, I will not pursue it further. I will look for this feature in future versions of Xen. Currently, Xen 4.16 advertises support for Linux-based stub domains as "Tech Preview." So future versions of Xen might handle this problem in libxl or perhaps in some other way, and also hopefully the patch to Qemu to fix the PCI capabilities mask can be committed to Qemu upstream soon so this feature of Intel IGD passthrough can at least work with Linux guests and the upstream Qemu running in Dom0. Regards, Chuck

diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 4bbbfe9f16..a4fc473de9 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -2531,6 +2572,37 @@ int libxl__grant_vga_iomem_permission(libxl__gc *gc, 
const uint32_t domid,
                    domid, vga_iomem_start, (vga_iomem_start + 0x20 - 1));
              return ret;
          }
+
+        /* If this is an Intel IGD, allow access to the IGD opregion */
+        if (!libxl__is_igd_vga_passthru(gc, d_config)) return 0;
+
+        uint32_t igd_opregion = sysfs_dev_get_igd_opregion(gc, pci);
+        uint32_t error = 0xffffffff;
+        if (igd_opregion == error) break;
+
+        vga_iomem_start = ( (uint64_t) igd_opregion ) >> XC_PAGE_SHIFT;
+        ret = xc_domain_iomem_permission(CTX->xch, stubdom_domid,
+                                         vga_iomem_start,
+                                         IGD_OPREGION_PAGES, 1);
+        if (ret < 0) {
+            LOGED(ERROR, domid,
+                  "failed to give stubdom%d access to iomem range "
+                  "%"PRIx64"-%"PRIx64" for IGD passthru",
+                  stubdom_domid, vga_iomem_start, (vga_iomem_start +
+                                                IGD_OPREGION_PAGES - 1));
+            return ret;
+        }
+        ret = xc_domain_iomem_permission(CTX->xch, domid,
+                                         vga_iomem_start,
+                                         IGD_OPREGION_PAGES, 1);

Here, you seems to add permission to an address that is read from the
pci config space of the device, but as I've pointed above hvmloader
seems to overwrite this address. It this call to
xc_domain_iomem_permission() does actually anything useful?
Or is it by luck that the address returned by
sysfs_dev_get_igd_opregion() happened to be the address that hvmloader
is going to write?

Or maybe hvmloader doesn't actually do anything?


Some more though on that, looking at QEMU, it seems that there's already
a call to xc_domain_iomem_permission(), in igd_write_opregion(). So
adding one in libxl would seems redundant, or maybe it the one for the
device model's domain that's needed  (named 'stubdom_domid' here)?

Thanks,

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.