[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 1/6] PCI: determine whether a device has extended config space


  • To: Jan Beulich <jbeulich@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Stewart Hildebrand <stewart.hildebrand@xxxxxxx>
  • Date: Mon, 12 Jan 2026 16:07:33 -0500
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=suse.com smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none (0)
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=v1jsb76Nq0Wf6mBgjoKGQ1mv+0FByU8MvTVOwIFyp+0=; b=LCx2pRDBXNJzgMC5TvFZitbatX/m/aemzQd6xg4KXhkWN7pkJP1ltOYLKdytWxFfqz3DoOXCxkuazMEfkIrNTLsGuuf+lZeU1hSOdpBljT/kRM6EOSRASgn+nOoO5rk2KnGTzbg8+4Y0Qt3yJZgUfxno168u4ZlSuiLij/GJxl3g/Nt+ENr4GPcvstGAcG0jNngQ3C+XR+MaXBrddn7o8AnQ2OJ5A60nHbU/pcPDGakUiMSB0fEMlduNaA2gpUvCf06eJr41tBIRiormYrHZYLqQNUsE6jFzH1TAuC6DNnkM34eZMd5dSNyBh1FmwoYX3i7Zkwt70Bf8HJVBp5/7aQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=cHY244aeXZsb/k+l+EWeOBpGfEG0M3IYg3kpp2zN6sG400KhfWHzD8yDuQ03SNzy4xHN9xc7caU6f++ZJZkeLoaC+q3F+nsfZhGd8QJnSfTa1Lxr+s2FoMuLMwJc7xcW2c4OBvgh2/j5Iou0RsKl2vJOiiTJBR9KXxZZO7cgyGpDmLKEMxE+zuF6qESP0JhNw7/TGutR7Kp6TKWdkfx771wHtfFDpX8k06xvP6xsLpozICj9IP8D/hQ0Suco5XwmwLGSNRA54NQRDOWrcdEILvdP/hmucRnqu81rGFQLIGFEHALlq1cST/9zUAseL5bR/F8fF6pzh0o1OTg7s+jGGg==
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Delivery-date: Mon, 12 Jan 2026 21:08:07 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 1/6/26 08:47, Jan Beulich wrote:
> Legacy PCI devices don't have any extended config space. Reading any part
> thereof may return all ones or other arbitrary data, e.g. in some cases
> base config space contents repeatedly.
> 
> Logic follows Linux 6.19-rc's pci_cfg_space_size(), albeit leveraging our
> determination of device type; in particular some comments are taken
> verbatim from there.
> 
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> ---
> Note that alloc_pdev()'s switch doesn't handle DEV_TYPE_PCI2PCIe_BRIDGE at
> all. Such bridges will therefore not have ->ext_cfg set (which is likely
> wrong).

I initially read "set" as in "set to true", but I think you mean that ext_cfg
isn't assigned at all. Though perhaps it should actually be set to true,
because ...

> Shouldn't we handle them like DEV_TYPE_LEGACY_PCI_BRIDGE (or
> DEV_TYPE_PCI?) anyway (just like VT-d's set_msi_source_id() does)?

... in pdev_type(), we will only reach DEV_TYPE_PCI2PCIe_BRIDGE if it has
PCI_CAP_ID_EXP, which would indicate the device has extended config. So maybe it
would be better to handle it similar to DEV_TYPE_PCIe2PCI_BRIDGE in
alloc_pdev().

> Linux also has CONFIG_PCI_QUIRKS, allowing to compile out the slightly
> risky code (as reads may in principle have side effects). Should we gain
> such, too?

I don't have a strong opinion here.

> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -310,6 +310,41 @@ static void apply_quirks(struct pci_dev
>               * from trying to size the BARs or add handlers to trap accesses.
>               */
>              pdev->ignore_bars = true;
> +
> +    if ( pdev->ext_cfg )
> +    {
> +        unsigned int pos;
> +
> +        /*
> +         * PCI Express to PCI/PCI-X Bridge Specification, rev 1.0, 4.1.4 says
> +         * that when forwarding a type1 configuration request the bridge must
> +         * check that the extended register address field is zero.  The 
> bridge
> +         * is not permitted to forward the transactions and must handle it as
> +         * an Unsupported Request.  Some bridges do not follow this rule and
> +         * simply drop the extended register bits, resulting in the standard
> +         * config space being aliased, every 256 bytes across the entire
> +         * configuration space.  Test for this condition by comparing the 
> first
> +         * dword of each potential alias to the vendor/device ID.
> +         * Known offenders:
> +         *   ASM1083/1085 PCIe-to-PCI Reversible Bridge (1b21:1080, rev 01 & 
> 03)
> +         *   AMD/ATI SBx00 PCI to PCI Bridge (1002:4384, rev 40)
> +         */
> +        for ( pos = PCI_CFG_SPACE_SIZE;
> +              pos < PCI_CFG_SPACE_EXP_SIZE; pos += PCI_CFG_SPACE_SIZE )
> +        {
> +            if ( pci_conf_read16(pdev->sbdf, pos) != vendor ||
> +                 pci_conf_read16(pdev->sbdf, pos + 2) != device )
> +                break;
> +        }
> +
> +        if ( pos >= PCI_CFG_SPACE_EXP_SIZE )
> +        {
> +            printk(XENLOG_WARNING
> +                   "%pp: extended config space aliases base one\n",
> +                   &pdev->sbdf);
> +            pdev->ext_cfg = false;
> +        }
> +    }
>  }
>  
>  static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn)
> @@ -351,6 +386,8 @@ static struct pci_dev *alloc_pdev(struct
>          unsigned long flags;
>  
>          case DEV_TYPE_PCIe2PCI_BRIDGE:
> +            pdev->ext_cfg = true;
> +            fallthrough;
>          case DEV_TYPE_LEGACY_PCI_BRIDGE:
>              sec_bus = pci_conf_read8(pdev->sbdf, PCI_SECONDARY_BUS);
>              sub_bus = pci_conf_read8(pdev->sbdf, PCI_SUBORDINATE_BUS);
> @@ -363,9 +400,19 @@ static struct pci_dev *alloc_pdev(struct
>                  pseg->bus2bridge[sec_bus].devfn = devfn;
>              }
>              spin_unlock_irqrestore(&pseg->bus2bridge_lock, flags);
> +
> +            fallthrough;
> +        case DEV_TYPE_PCI:
> +            pos = pci_find_cap_offset(pdev->sbdf, PCI_CAP_ID_PCIX);
> +            if ( pos &&
> +                 (pci_conf_read32(pdev->sbdf, pos + PCI_X_STATUS) &
> +                  (PCI_X_STATUS_266MHZ | PCI_X_STATUS_533MHZ)) )
> +                pdev->ext_cfg = true;
>              break;
>  
>          case DEV_TYPE_PCIe_ENDPOINT:
> +            pdev->ext_cfg = true;
> +
>              pos = pci_find_cap_offset(pdev->sbdf, PCI_CAP_ID_EXP);
>              BUG_ON(!pos);
>              cap = pci_conf_read16(pdev->sbdf, pos + PCI_EXP_DEVCAP);
> @@ -409,9 +456,9 @@ static struct pci_dev *alloc_pdev(struct
>              }
>              break;
>  
> -        case DEV_TYPE_PCI:
>          case DEV_TYPE_PCIe_BRIDGE:
>          case DEV_TYPE_PCI_HOST_BRIDGE:
> +            pdev->ext_cfg = true;
>              break;
>  
>          default:
> @@ -420,6 +467,19 @@ static struct pci_dev *alloc_pdev(struct
>              break;
>      }
>  
> +    if ( pdev->ext_cfg &&
> +         /*
> +          * Regular PCI devices have 256 bytes, but PCI-X 2 and PCI Express
> +          * devices have 4096 bytes.  Even if the device is capable, that
> +          * doesn't mean we can access it.  Maybe we don't have a way to
> +          * generate extended config space accesses, or the device is behind 
> a
> +          * reverse Express bridge.  So we try reading the dword at
> +          * PCI_CFG_SPACE_SIZE which must either be 0 or a valid extended
> +          * capability header.
> +          */
> +         pci_conf_read32(pdev->sbdf, PCI_CFG_SPACE_SIZE) == 0xffffffffU )
> +        pdev->ext_cfg = false;

I'm using a machine where
xen/arch/x86/x86_64/mmconfig-shared.c:is_mmconf_reserved() will initially return
false during Xen boot:

(XEN) PCI: MCFG configuration 0: base f0000000 segment 0000 buses 00 - 3f
(XEN) PCI: Not using MCFG for segment 0000 bus 00-3f

Then, during dom0 boot, dom0 will call PHYSDEVOP_pci_mmcfg_reserved, after which
MCFG becomes enabled in Xen:

(XEN) PCI: Using MCFG for segment 0000 bus 00-3f

On such machines where mmcfg/ECAM is initially disabled, this will effectively
set ->ext_cfg to false for all devices discovered at Xen boot.

I'm not really sure if I have any good suggestions, but perhaps we could add a
macro or small function that returns something like
   ( pdev->ext_cfg && pci_conf_read32(pdev->sbdf, PCI_CFG_SPACE_SIZE) != 
0xffffffffU )
to allow this checking to happen dynamically (but this still wouldn't handle the
aliasing quirk). Maybe re-run the ext_cfg detection logic at the end of
PHYSDEVOP_pci_mmcfg_reserved?

Regardless, I'd be happy to provide my R-b without this addressed, but I am
curious if others think this as an issue?

> +
>      apply_quirks(pdev);
>      check_pdev(pdev);
>  
> @@ -717,6 +777,11 @@ int pci_add_device(u16 seg, u8 bus, u8 d
>  
>                  list_add(&pdev->vf_list, &pf_pdev->vf_list);
>              }
> +
> +            if ( !pdev->ext_cfg )
> +                printk(XENLOG_WARNING
> +                       "%pp: VF without extended config space?\n",
> +                       &pdev->sbdf);
>          }
>      }
>  
> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -126,6 +126,9 @@ struct pci_dev {
>  
>      nodeid_t node; /* NUMA node */
>  
> +    /* Whether the device has extended config space. */

Nit: it would be nice to clearly state if this means the extended config is
accessible, or whether the device merely has it (but might not be accessible).

> +    bool ext_cfg;
> +
>      /* Device to be quarantined, don't automatically re-assign to dom0 */
>      bool quarantine;
>  
> 




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.