
[Xen-devel] [PATCH v5 11/11] vpci/msix: add MSI-X handlers



Add handlers for accesses to the MSI-X message control field in the
PCI configuration space, and traps for accesses to the memory region
that contains the MSI-X table and PBA. These traps detect attempts by
the guest to configure MSI-X interrupts and set them up properly.

Note that accesses to the Table Offset, Table BIR, PBA Offset and PBA
BIR are not trapped by Xen at the moment.

Finally, turn the panic in the Dom0 PVH builder into a warning.

Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
---
Cc: Jan Beulich <jbeulich@xxxxxxxx>
Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
---
Changes since v4:
 - Remove parentheses around offsetof.
 - Add "being" to MSI-X enabling comment.
 - Use INVALID_PIRQ.
 - Add a simple sanity check to vpci_msix_arch_enable in order to
   detect wrong MSI-X entries more quickly.
 - Constify vpci_msix_arch_print entry argument.
 - s/cpu/fixed/ in vpci_msix_arch_print.
 - Dump the MSI-X info together with the MSI info.
 - Fix vpci_msix_control_write to take into account changes to the
   address and data fields when switching the function mask bit.
 - Only disable/enable the entries if the address or data fields have
   been updated.
 - Use the BAR enable field to check if a BAR is mapped or not
   (instead of reading the command register for each device).
 - Fix error path in vpci_msix_read to set the return data to ~0.
 - Simplify mask usage in vpci_msix_write.
 - Cast data to uint64_t when shifting it 32 bits.
 - Fix writes to the table entry control register to take into account
   if the mask-all bit is set.
 - Add some comments to clarify the intended behavior of the code.
 - Align the PBA size to 64 bits (see the worked example after this
   list).
 - Remove the error label in vpci_init_msix.
 - Try to compact the layout of the vpci_msix structure.
 - Remove the local table_bar and pba_bar variables from
   vpci_init_msix, they are used only once.
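
As a worked example of the PBA sizing done in vpci_init_msix
(msix->pba.size = ROUNDUP(DIV_ROUND_UP(max_entries, 8), 8)): a device
with 33 entries needs DIV_ROUND_UP(33, 8) = 5 bytes of pending bits,
which ROUNDUP(5, 8) pads to 8 bytes, ie one fully allocated QWORD.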

Changes since v3:
 - Propagate changes from previous versions: remove xen_ prefix, use
   the new fields in vpci_val and remove the return value from
   handlers.
 - Remove the usage of GENMASK.
 - Move the arch-specific parts of the dump routine to the
   x86/hvm/vmsi.c dump handler.
 - Chain the MSI-X dump handler to the 'M' debug key.
 - Fix the header BAR mappings so that the MSI-X regions inside of
   BARs are unmapped from the domain p2m in order for the handlers to
   work properly.
 - Unconditionally trap and forward accesses to the PBA MSI-X area.
 - Simplify the conditionals in vpci_msix_control_write.
 - Fix vpci_msix_accept to use a bool type.
 - Allow all supported accesses as described in the spec to the MSI-X
   table.
 - Truncate the returned address when the access is a 32b read.
 - Always return X86EMUL_OKAY from the handlers, returning ~0 in the
   read case if the access is not supported, or ignoring writes.
 - Do not check that max_entries is != 0 in the init handler.
 - Use trylock in the dump handler.

Changes since v2:
 - Split out arch-specific code.
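
For review context, below is a sketch of the MSI-X table entry layout
that the MMIO handlers in this patch decode. The struct is purely
illustrative (the patch uses the PCI_MSIX_ENTRY_*_OFFSET constants
directly, not this hypothetical type):

    /* MSI-X table entry: 16 bytes each (PCI_MSIX_ENTRY_SIZE). */
    struct msix_entry_layout {
        uint32_t msg_addr_lo;  /* 0x0: PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET */
        uint32_t msg_addr_hi;  /* 0x4: PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET */
        uint32_t msg_data;     /* 0x8: PCI_MSIX_ENTRY_DATA_OFFSET */
        uint32_t vector_ctrl;  /* 0xc: PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET;
                                  bit 0 is PCI_MSIX_VECTOR_BITMASK. */
    };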

This patch has been tested with devices using both a single MSI-X
entry and multiple ones.
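
To illustrate the guest-side flow these traps intercept, a driver
typically programs an entry with something like the following
(hypothetical guest helper, not part of this series). Each 32-bit
store lands in vpci_msix_write, and clearing the mask bit while MSI-X
is enabled and not function-masked ends up in vpci_msix_arch_enable:

    static void guest_program_msix_entry(volatile uint32_t *table,
                                         unsigned int nr,
                                         uint64_t addr, uint32_t data)
    {
        /* 16 bytes, ie four 32-bit registers, per entry. */
        volatile uint32_t *entry = table + nr * 4;

        writel(addr, &entry[0]);        /* Message address (low). */
        writel(addr >> 32, &entry[1]);  /* Message address (high). */
        writel(data, &entry[2]);        /* Message data. */
        /* Clear the per-entry mask bit so the vector can fire. */
        writel(readl(&entry[3]) & ~PCI_MSIX_VECTOR_BITMASK, &entry[3]);
    }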
---
 xen/arch/x86/hvm/dom0_build.c    |   2 +-
 xen/arch/x86/hvm/hvm.c           |   1 +
 xen/arch/x86/hvm/vmsi.c          | 138 ++++++++++-
 xen/drivers/vpci/Makefile        |   2 +-
 xen/drivers/vpci/header.c        |  81 +++++++
 xen/drivers/vpci/msi.c           |  25 +-
 xen/drivers/vpci/msix.c          | 478 +++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/domain.h |   3 +
 xen/include/asm-x86/hvm/io.h     |  19 ++
 xen/include/xen/vpci.h           |  39 ++++
 10 files changed, 779 insertions(+), 9 deletions(-)
 create mode 100644 xen/drivers/vpci/msix.c

diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index c65eb8503f..3deae7cb4a 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -1086,7 +1086,7 @@ int __init dom0_construct_pvh(struct domain *d, const module_t *image,
 
     pvh_setup_mmcfg(d);
 
-    panic("Building a PVHv2 Dom0 is not yet supported.");
+    printk("WARNING: PVH is an experimental mode with limited 
functionality\n");
     return 0;
 }
 
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 3168973820..e27e86a514 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -584,6 +584,7 @@ int hvm_domain_initialise(struct domain *d, unsigned long domcr_flags,
     INIT_LIST_HEAD(&d->arch.hvm_domain.write_map.list);
     INIT_LIST_HEAD(&d->arch.hvm_domain.g2m_ioport_list);
     INIT_LIST_HEAD(&d->arch.hvm_domain.mmcfg_regions);
+    INIT_LIST_HEAD(&d->arch.hvm_domain.msix_tables);
 
     rc = create_perdomain_mapping(d, PERDOMAIN_VIRT_START, 0, NULL, NULL);
     if ( rc )
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index aea088e290..5c41eea0fe 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -643,17 +643,15 @@ static unsigned int msi_flags(uint16_t data, uint64_t addr)
            (trig_mode << GFLAGS_SHIFT_TRG_MODE);
 }
 
-void vpci_msi_arch_mask(struct vpci_arch_msi *arch, struct pci_dev *pdev,
-                        unsigned int entry, bool mask)
+static void vpci_mask_pirq(struct domain *d, int pirq, bool mask)
 {
-    struct domain *d = pdev->domain;
     const struct pirq *pinfo;
     struct irq_desc *desc;
     unsigned long flags;
     int irq;
 
-    ASSERT(arch->pirq >= 0);
-    pinfo = pirq_info(d, arch->pirq + entry);
+    ASSERT(pirq >= 0);
+    pinfo = pirq_info(d, pirq);
     if ( !pinfo )
         return;
 
@@ -670,6 +668,12 @@ void vpci_msi_arch_mask(struct vpci_arch_msi *arch, struct pci_dev *pdev,
     spin_unlock_irqrestore(&desc->lock, flags);
 }
 
+void vpci_msi_arch_mask(struct vpci_arch_msi *arch, struct pci_dev *pdev,
+                        unsigned int entry, bool mask)
+{
+    vpci_mask_pirq(pdev->domain, arch->pirq + entry, mask);
+}
+
 int vpci_msi_arch_enable(struct vpci_arch_msi *arch, struct pci_dev *pdev,
                          uint64_t address, uint32_t data, unsigned int vectors)
 {
@@ -778,3 +782,127 @@ void vpci_msi_arch_print(const struct vpci_arch_msi *arch, uint16_t data,
            MASK_EXTR(addr, MSI_ADDR_DEST_ID_MASK),
            arch->pirq);
 }
+
+void vpci_msix_arch_mask(struct vpci_arch_msix_entry *arch,
+                         struct pci_dev *pdev, bool mask)
+{
+    if ( arch->pirq == INVALID_PIRQ )
+        return;
+
+    vpci_mask_pirq(pdev->domain, arch->pirq, mask);
+}
+
+int vpci_msix_arch_enable(struct vpci_arch_msix_entry *arch,
+                          struct pci_dev *pdev, uint64_t address,
+                          uint32_t data, unsigned int entry_nr,
+                          paddr_t table_base)
+{
+    struct domain *d = pdev->domain;
+    struct msi_info msi_info = {
+        .seg = pdev->seg,
+        .bus = pdev->bus,
+        .devfn = pdev->devfn,
+        .table_base = table_base,
+        .entry_nr = entry_nr,
+    };
+    xen_domctl_bind_pt_irq_t bind = {
+        .irq_type = PT_IRQ_TYPE_MSI,
+        .u.msi.gvec = msi_vector(data),
+        .u.msi.gflags = msi_flags(data, address),
+    };
+    int rc;
+
+    ASSERT(arch->pirq == INVALID_PIRQ);
+
+    /*
+     * Simple sanity check before trying to set up the interrupt.
+     * According to the Intel SDM, bits [31:20] must contain the
+     * value 0xfee. This avoids needlessly setting up pirqs for entries
+     * the guest has not actually configured.
+     */
+    if ( (address & 0xfff00000) != MSI_ADDR_HEADER )
+        return -EINVAL;
+
+    rc = allocate_and_map_msi_pirq(d, -1, &arch->pirq,
+                                   MAP_PIRQ_TYPE_MSI, &msi_info);
+    if ( rc )
+    {
+        gdprintk(XENLOG_ERR,
+                 "%04x:%02x:%02x.%u: unable to map MSI-X PIRQ entry %u: %d\n",
+                 pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                 PCI_FUNC(pdev->devfn), entry_nr, rc);
+        return rc;
+    }
+
+    bind.machine_irq = arch->pirq;
+    pcidevs_lock();
+    rc = pt_irq_create_bind(d, &bind);
+    if ( rc )
+    {
+        gdprintk(XENLOG_ERR,
+                 "%04x:%02x:%02x.%u: unable to create MSI-X bind %u: %d\n",
+                 pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                 PCI_FUNC(pdev->devfn), entry_nr, rc);
+        spin_lock(&d->event_lock);
+        unmap_domain_pirq(d, arch->pirq);
+        spin_unlock(&d->event_lock);
+        pcidevs_unlock();
+        arch->pirq = INVALID_PIRQ;
+        return rc;
+    }
+    pcidevs_unlock();
+
+    return 0;
+}
+
+int vpci_msix_arch_disable(struct vpci_arch_msix_entry *arch,
+                           struct pci_dev *pdev)
+{
+    struct domain *d = pdev->domain;
+    xen_domctl_bind_pt_irq_t bind = {
+        .irq_type = PT_IRQ_TYPE_MSI,
+        .machine_irq = arch->pirq,
+    };
+    int rc;
+
+    if ( arch->pirq == INVALID_PIRQ )
+        return 0;
+
+    pcidevs_lock();
+    rc = pt_irq_destroy_bind(d, &bind);
+    if ( rc )
+    {
+        pcidevs_unlock();
+        return rc;
+    }
+
+    spin_lock(&d->event_lock);
+    unmap_domain_pirq(d, arch->pirq);
+    spin_unlock(&d->event_lock);
+    pcidevs_unlock();
+
+    arch->pirq = INVALID_PIRQ;
+
+    return 0;
+}
+
+int vpci_msix_arch_init(struct vpci_arch_msix_entry *arch)
+{
+    arch->pirq = INVALID_PIRQ;
+    return 0;
+}
+
+void vpci_msix_arch_print(const struct vpci_arch_msix_entry *entry,
+                          uint32_t data, uint64_t addr, bool masked,
+                          unsigned int pos)
+{
+    printk("%4u vec=%#02x%7s%6s%3sassert%5s%7s dest_id=%lu mask=%u pirq: %d\n",
+           pos, MASK_EXTR(data, MSI_DATA_VECTOR_MASK),
+           data & MSI_DATA_DELIVERY_LOWPRI ? "lowest" : "fixed",
+           data & MSI_DATA_TRIGGER_LEVEL ? "level" : "edge",
+           data & MSI_DATA_LEVEL_ASSERT ? "" : "de",
+           addr & MSI_ADDR_DESTMODE_LOGIC ? "log" : "phys",
+           addr & MSI_ADDR_REDIRECTION_LOWPRI ? "lowest" : "fixed",
+           MASK_EXTR(addr, MSI_ADDR_DEST_ID_MASK),
+           masked, entry->pirq);
+}
diff --git a/xen/drivers/vpci/Makefile b/xen/drivers/vpci/Makefile
index 62cec9e82b..55d1bdfda0 100644
--- a/xen/drivers/vpci/Makefile
+++ b/xen/drivers/vpci/Makefile
@@ -1 +1 @@
-obj-y += vpci.o header.o msi.o
+obj-y += vpci.o header.o msi.o msix.o
diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 1533c36470..effa830714 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -20,6 +20,7 @@
 #include <xen/sched.h>
 #include <xen/vpci.h>
 #include <xen/p2m-common.h>
+#include <asm/p2m.h>
 
 #define MAPPABLE_BAR(x)                                                 \
     ((x)->type == VPCI_BAR_MEM32 || (x)->type == VPCI_BAR_MEM64_LO ||   \
@@ -89,11 +90,45 @@ static int vpci_map_range(unsigned long s, unsigned long e, void *data)
     return modify_mmio(map->d, _gfn(s), _mfn(s), e - s + 1, map->map);
 }
 
+static int vpci_unmap_msix(struct domain *d, struct vpci_msix_mem *msix)
+{
+    unsigned long gfn;
+
+    for ( gfn = PFN_DOWN(msix->addr); gfn <= PFN_UP(msix->addr + msix->size);
+          gfn++ )
+    {
+        p2m_type_t t;
+        mfn_t mfn = get_gfn(d, gfn, &t);
+        int rc;
+
+        if ( mfn_eq(mfn, INVALID_MFN) )
+        {
+            /* Nothing to do, this is already a hole. */
+            put_gfn(d, gfn);
+            continue;
+        }
+
+        if ( !p2m_is_mmio(t) )
+        {
+            put_gfn(d, gfn);
+            return -EINVAL;
+        }
+
+        rc = modify_mmio(d, _gfn(gfn), mfn, 1, false);
+        put_gfn(d, gfn);
+        if ( rc )
+            return rc;
+    }
+
+    return 0;
+}
+
 static int vpci_modify_bar(struct domain *d, const struct vpci_bar *bar,
                            bool map)
 {
     struct rangeset *mem;
     struct map_data data = { .d = d, .map = map };
+    unsigned int i;
     int rc;
 
     ASSERT(MAPPABLE_BAR(bar));
@@ -102,6 +137,42 @@ static int vpci_modify_bar(struct domain *d, const struct vpci_bar *bar,
     if ( IS_ERR(mem) )
         return PTR_ERR(mem);
 
+    for ( i = 0; i < ARRAY_SIZE(bar->msix); i++ )
+    {
+        struct vpci_msix_mem *msix = bar->msix[i];
+
+        if ( !msix || msix->addr == INVALID_PADDR )
+            continue;
+
+        if ( map )
+        {
+            /*
+             * Make sure the MSI-X regions of the BAR are not mapped into the
+             * domain p2m, or else the MSI-X handlers are useless. Only do this
+             * when mapping, since that's when the memory decoding on the
+             * device is enabled.
+             *
+             * This is required because iommu_inclusive_mapping might have
+             * mapped MSI-X regions into the guest p2m.
+             */
+            rc = vpci_unmap_msix(d, msix);
+            if ( rc )
+            {
+                rangeset_destroy(mem);
+                return rc;
+            }
+        }
+
+        rc = rangeset_remove_range(mem, PFN_DOWN(msix->addr),
+                                   PFN_DOWN(msix->addr + msix->size));
+        if ( rc )
+        {
+            rangeset_destroy(mem);
+            return rc;
+        }
+
+    }
+
     rc = rangeset_report_ranges(mem, 0, ~0ul, vpci_map_range, &data);
     rangeset_destroy(mem);
     if ( rc )
@@ -212,6 +283,7 @@ static void vpci_bar_write(struct pci_dev *pdev, unsigned int reg,
     struct vpci_bar *bar = data;
     uint8_t seg = pdev->seg, bus = pdev->bus;
     uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
+    unsigned int i;
     bool hi = false;
 
     if ( pci_conf_read16(seg, bus, slot, func, PCI_COMMAND) &
@@ -255,6 +327,11 @@ static void vpci_bar_write(struct pci_dev *pdev, unsigned int reg,
     bar->addr &= ~(0xffffffffull << (hi ? 32 : 0));
     bar->addr |= (uint64_t)val << (hi ? 32 : 0);
 
+    /* Update any MSI-X areas contained in this BAR. */
+    for ( i = 0; i < ARRAY_SIZE(bar->msix); i++ )
+        if ( bar->msix[i] )
+            bar->msix[i]->addr = bar->addr + bar->msix[i]->offset;
+
     /* Make sure Xen writes back the same value for the BAR RO bits. */
     if ( !hi )
         val |= pci_conf_read32(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
@@ -345,6 +422,7 @@ static int vpci_init_bars(struct pci_dev *pdev)
     {
         uint8_t reg = PCI_BASE_ADDRESS_0 + i * 4;
         uint32_t val = pci_conf_read32(seg, bus, slot, func, reg);
+        unsigned int j;
 
         if ( i && bars[i - 1].type == VPCI_BAR_MEM64_LO )
         {
@@ -386,6 +464,9 @@ static int vpci_init_bars(struct pci_dev *pdev)
         }
 
         bars[i].addr = addr;
+        for ( j = 0; j < ARRAY_SIZE(bars[i].msix); j++ )
+            if ( bars[i].msix[j] )
+                bars[i].msix[j]->addr = addr + bars[i].msix[j]->offset;
         bars[i].size = size;
         bars[i].prefetchable = val & PCI_BASE_ADDRESS_MEM_PREFETCH;
 
diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
index 181599241a..66e1bab6e8 100644
--- a/xen/drivers/vpci/msi.c
+++ b/xen/drivers/vpci/msi.c
@@ -324,7 +324,7 @@ void vpci_dump_msi(void)
         if ( !has_vpci(d) )
             continue;
 
-        printk("vPCI MSI information for d%d\n", d->domain_id);
+        printk("vPCI MSI/MSI-X information for d%d\n", d->domain_id);
 
         if ( !vpci_tryrlock(d) )
         {
@@ -337,10 +337,14 @@ void vpci_dump_msi(void)
             uint8_t seg = pdev->seg, bus = pdev->bus;
             uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
             const struct vpci_msi *msi = pdev->vpci->msi;
+            const struct vpci_msix *msix = pdev->vpci->msix;
+
+            if ( msi || msix )
+                printk("Device %04x:%02x:%02x.%u\n", seg, bus, slot, func);
 
             if ( msi )
             {
-                printk("Device %04x:%02x:%02x.%u\n", seg, bus, slot, func);
+                printk(" MSI\n");
 
                 printk("  Enabled: %u Supports masking: %u 64-bit addresses: 
%u\n",
                        msi->enabled, msi->masking, msi->address64);
@@ -352,6 +356,23 @@ void vpci_dump_msi(void)
                 if ( msi->masking )
                     printk("  mask=%08x\n", msi->mask);
             }
+
+            if ( msix )
+            {
+                unsigned int i;
+
+                printk(" MSI-X\n");
+
+                printk("  Max entries: %u maskall: %u enabled: %u\n",
+                       msix->max_entries, msix->masked, msix->enabled);
+
+                printk("  Table entries:\n");
+                for ( i = 0; i < msix->max_entries; i++ )
+                    vpci_msix_arch_print(&msix->entries[i].arch,
+                                         msix->entries[i].data,
+                                         msix->entries[i].addr,
+                                         msix->entries[i].masked, i);
+            }
         }
         vpci_runlock(d);
     }
diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
new file mode 100644
index 0000000000..4035bea421
--- /dev/null
+++ b/xen/drivers/vpci/msix.c
@@ -0,0 +1,478 @@
+/*
+ * Handlers for accesses to the MSI-X capability structure and the memory
+ * region.
+ *
+ * Copyright (C) 2017 Citrix Systems R&D
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/sched.h>
+#include <xen/vpci.h>
+#include <asm/msi.h>
+#include <xen/p2m-common.h>
+#include <xen/keyhandler.h>
+
+#define MSIX_SIZE(num) offsetof(struct vpci_msix, entries[num])
+#define MSIX_ADDR_IN_RANGE(a, table)                                    \
+    ((table)->addr != INVALID_PADDR && (a) >= (table)->addr &&          \
+     (a) < (table)->addr + (table)->size)
+
+static uint32_t vpci_msix_control_read(struct pci_dev *pdev, unsigned int reg,
+                                       const void *data)
+{
+    const struct vpci_msix *msix = data;
+    uint16_t val;
+
+    val = (msix->max_entries - 1) & PCI_MSIX_FLAGS_QSIZE;
+    val |= msix->enabled ? PCI_MSIX_FLAGS_ENABLE : 0;
+    val |= msix->masked ? PCI_MSIX_FLAGS_MASKALL : 0;
+
+    return val;
+}
+
+static void vpci_msix_control_write(struct pci_dev *pdev, unsigned int reg,
+                                    uint32_t val, void *data)
+{
+    uint8_t seg = pdev->seg, bus = pdev->bus;
+    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
+    struct vpci_msix *msix = data;
+    bool new_masked, new_enabled;
+
+    new_masked = val & PCI_MSIX_FLAGS_MASKALL;
+    new_enabled = val & PCI_MSIX_FLAGS_ENABLE;
+
+    /*
+     * According to the PCI 3.0 specification, switching the enable bit
+     * to 1 or the function mask bit to 0 should cause all the cached
+     * addresses and data fields to be recalculated. Xen implements this
+     * as disabling and enabling the entries.
+     *
+     * Note that the disable/enable sequence is only performed when the
+     * guest has written to the entry (ie: updated field set).
+     */
+    if ( new_enabled && !new_masked && (!msix->enabled || msix->masked) )
+    {
+        paddr_t table_base = pdev->vpci->header.bars[msix->table.bir].addr;
+        unsigned int i;
+        int rc;
+
+        for ( i = 0; i < msix->max_entries; i++ )
+        {
+            if ( msix->entries[i].masked || !msix->entries[i].updated )
+                continue;
+
+            rc = vpci_msix_arch_disable(&msix->entries[i].arch, pdev);
+            if ( rc )
+            {
+                gdprintk(XENLOG_ERR,
+                         "%04x:%02x:%02x.%u: unable to disable entry %u: %d\n",
+                         seg, bus, slot, func, msix->entries[i].nr, rc);
+                return;
+            }
+
+            rc = vpci_msix_arch_enable(&msix->entries[i].arch, pdev,
+                                       msix->entries[i].addr,
+                                       msix->entries[i].data,
+                                       msix->entries[i].nr, table_base);
+            if ( rc )
+            {
+                gdprintk(XENLOG_ERR,
+                         "%04x:%02x:%02x.%u: unable to enable entry %u: %d\n",
+                         seg, bus, slot, func, msix->entries[i].nr, rc);
+                /* Entry is likely not configured, skip it. */
+                continue;
+            }
+
+            /*
+             * At this point the PIRQ is still masked. Unmask it, or else the
+             * guest won't receive interrupts. This is due to the
+             * disable/enable sequence performed above.
+             */
+            vpci_msix_arch_mask(&msix->entries[i].arch, pdev, false);
+
+            msix->entries[i].updated = false;
+        }
+    }
+
+    if ( (new_enabled != msix->enabled || new_masked != msix->masked) &&
+         pci_msi_conf_write_intercept(pdev, reg, 2, &val) >= 0 )
+        pci_conf_write16(seg, bus, slot, func, reg, val);
+
+    msix->masked = new_masked;
+    msix->enabled = new_enabled;
+}
+
+static struct vpci_msix *vpci_msix_find(struct domain *d, unsigned long addr)
+{
+    struct vpci_msix *msix;
+
+    list_for_each_entry ( msix, &d->arch.hvm_domain.msix_tables, next )
+    {
+        const struct vpci_bar *bars = msix->pdev->vpci->header.bars;
+
+        if ( (bars[msix->table.bir].enabled &&
+              MSIX_ADDR_IN_RANGE(addr, &msix->table)) ||
+             (bars[msix->pba.bir].enabled &&
+              MSIX_ADDR_IN_RANGE(addr, &msix->pba)) )
+            return msix;
+    }
+
+    return NULL;
+}
+
+static int vpci_msix_accept(struct vcpu *v, unsigned long addr)
+{
+    bool found;
+
+    vpci_rlock(v->domain);
+    found = vpci_msix_find(v->domain, addr);
+    vpci_runlock(v->domain);
+
+    return found;
+}
+
+static int vpci_msix_access_check(struct pci_dev *pdev, unsigned long addr,
+                                  unsigned int len)
+{
+    uint8_t seg = pdev->seg, bus = pdev->bus;
+    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
+
+    /* Only allow 32/64b accesses. */
+    if ( len != 4 && len != 8 )
+    {
+        gdprintk(XENLOG_ERR,
+                 "%04x:%02x:%02x.%u: invalid MSI-X table access size: %u\n",
+                 seg, bus, slot, func, len);
+        return -EINVAL;
+    }
+
+    /* Only allow aligned accesses. */
+    if ( (addr & (len - 1)) != 0 )
+    {
+        gdprintk(XENLOG_ERR,
+                 "%04x:%02x:%02x.%u: MSI-X only allows aligned accesses\n",
+                 seg, bus, slot, func);
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+static struct vpci_msix_entry *vpci_msix_get_entry(struct vpci_msix *msix,
+                                                   unsigned long addr)
+{
+    return &msix->entries[(addr - msix->table.addr) / PCI_MSIX_ENTRY_SIZE];
+}
+
+static int vpci_msix_read(struct vcpu *v, unsigned long addr,
+                          unsigned int len, unsigned long *data)
+{
+    struct domain *d = v->domain;
+    struct vpci_msix *msix;
+    const struct vpci_msix_entry *entry;
+    unsigned int offset;
+
+    vpci_rlock(d);
+    msix = vpci_msix_find(d, addr);
+    if ( !msix )
+    {
+        vpci_runlock(d);
+        *data = ~0ul;
+        return X86EMUL_OKAY;
+    }
+
+    if ( vpci_msix_access_check(msix->pdev, addr, len) )
+    {
+        vpci_runlock(d);
+        *data = ~0ul;
+        return X86EMUL_OKAY;
+    }
+
+    if ( MSIX_ADDR_IN_RANGE(addr, &msix->pba) )
+    {
+        /* Access to PBA. */
+        switch ( len )
+        {
+        case 4:
+            *data = readl(addr);
+            break;
+        case 8:
+            *data = readq(addr);
+            break;
+        default:
+            ASSERT_UNREACHABLE();
+            *data = ~0ul;
+            break;
+        }
+
+        vpci_runlock(d);
+        return X86EMUL_OKAY;
+    }
+
+    entry = vpci_msix_get_entry(msix, addr);
+    offset = addr & (PCI_MSIX_ENTRY_SIZE - 1);
+
+    switch ( offset )
+    {
+    case PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET:
+        /*
+         * NB: do explicit truncation to the size of the access. This shouldn't
+         * be required here, since the caller of the handler should already
+         * take the appropriate measures to truncate the value before returning
+         * to the guest, but better be safe than sorry.
+         */
+        *data = len == 8 ? entry->addr : (uint32_t)entry->addr;
+        break;
+    case PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET:
+        *data = entry->addr >> 32;
+        break;
+    case PCI_MSIX_ENTRY_DATA_OFFSET:
+        *data = entry->data;
+        if ( len == 8 )
+            *data |=
+                (uint64_t)(entry->masked ? PCI_MSIX_VECTOR_BITMASK : 0) << 32;
+        break;
+    case PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET:
+        *data = entry->masked ? PCI_MSIX_VECTOR_BITMASK : 0;
+        break;
+    default:
+        ASSERT_UNREACHABLE();
+        *data = ~0ul;
+        break;
+    }
+    vpci_runlock(d);
+
+    return X86EMUL_OKAY;
+}
+
+static int vpci_msix_write(struct vcpu *v, unsigned long addr,
+                           unsigned int len, unsigned long data)
+{
+    struct domain *d = v->domain;
+    struct vpci_msix *msix;
+    struct vpci_msix_entry *entry;
+    unsigned int offset;
+
+    vpci_wlock(d);
+    msix = vpci_msix_find(d, addr);
+    if ( !msix )
+    {
+        vpci_wunlock(d);
+        return X86EMUL_OKAY;
+    }
+
+    if ( MSIX_ADDR_IN_RANGE(addr, &msix->pba) )
+    {
+        /* Ignore writes to the PBA, its behavior is undefined. */
+        vpci_wunlock(d);
+        return X86EMUL_OKAY;
+    }
+
+    if ( vpci_msix_access_check(msix->pdev, addr, len) )
+    {
+        vpci_wunlock(d);
+        return X86EMUL_OKAY;
+    }
+
+    entry = vpci_msix_get_entry(msix, addr);
+    offset = addr & (PCI_MSIX_ENTRY_SIZE - 1);
+
+    /*
+     * NB: Xen allows writes to the data/address registers with the entry
+     * unmasked. The specification says this is undefined behavior, and Xen
+     * implements it as storing the written value, which will be made effective
+     * in the next mask/unmask cycle. This also mimics the implementation in
+     * QEMU.
+     */
+    switch ( offset )
+    {
+    case PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET:
+        entry->updated = true;
+        if ( len == 8 )
+        {
+            entry->addr = data;
+            break;
+        }
+        entry->addr &= ~0xffffffff;
+        entry->addr |= data;
+        break;
+    case PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET:
+        entry->updated = true;
+        entry->addr &= 0xffffffff;
+        entry->addr |= (uint64_t)data << 32;
+        break;
+    case PCI_MSIX_ENTRY_DATA_OFFSET:
+        entry->updated = true;
+        entry->data = data;
+
+        if ( len == 4 )
+            break;
+
+        data >>= 32;
+        /* fallthrough */
+    case PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET:
+    {
+        bool new_masked = data & PCI_MSIX_VECTOR_BITMASK;
+        struct pci_dev *pdev = msix->pdev;
+        paddr_t table_base = pdev->vpci->header.bars[msix->table.bir].addr;
+        int rc;
+
+        if ( entry->masked == new_masked )
+            /* No change in the mask bit, nothing to do. */
+            break;
+
+        if ( !new_masked && msix->enabled && !msix->masked && entry->updated )
+        {
+            /*
+             * If MSI-X is enabled, the function mask is not active, the entry
+             * is being unmasked and there have been changes to the address or
+             * data fields, Xen needs to disable and enable the entry in order
+             * to pick up the changes.
+             */
+            rc = vpci_msix_arch_disable(&entry->arch, pdev);
+            if ( rc )
+            {
+                gdprintk(XENLOG_ERR,
+                         "%04x:%02x:%02x.%u: unable to disable entry %u: %d\n",
+                         pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                         PCI_FUNC(pdev->devfn), entry->nr, rc);
+                break;
+            }
+
+            rc = vpci_msix_arch_enable(&entry->arch, pdev, entry->addr,
+                                       entry->data, entry->nr, table_base);
+            if ( rc )
+            {
+                gdprintk(XENLOG_ERR,
+                         "%04x:%02x:%02x.%u: unable to enable entry %u: %d\n",
+                         pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                         PCI_FUNC(pdev->devfn), entry->nr, rc);
+                break;
+            }
+            entry->updated = false;
+        }
+
+        vpci_msix_arch_mask(&entry->arch, pdev, new_masked);
+        entry->masked = new_masked;
+
+        break;
+    }
+    default:
+        ASSERT_UNREACHABLE();
+        break;
+    }
+    vpci_wunlock(d);
+
+    return X86EMUL_OKAY;
+}
+
+static const struct hvm_mmio_ops vpci_msix_table_ops = {
+    .check = vpci_msix_accept,
+    .read = vpci_msix_read,
+    .write = vpci_msix_write,
+};
+
+static int vpci_init_msix(struct pci_dev *pdev)
+{
+    struct domain *d = pdev->domain;
+    uint8_t seg = pdev->seg, bus = pdev->bus;
+    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
+    struct vpci_msix *msix;
+    unsigned int msix_offset, i, max_entries;
+    uint16_t control;
+    int rc;
+
+    msix_offset = pci_find_cap_offset(seg, bus, slot, func, PCI_CAP_ID_MSIX);
+    if ( !msix_offset )
+        return 0;
+
+    control = pci_conf_read16(seg, bus, slot, func,
+                              msix_control_reg(msix_offset));
+
+    max_entries = msix_table_size(control);
+
+    msix = xzalloc_bytes(MSIX_SIZE(max_entries));
+    if ( !msix )
+        return -ENOMEM;
+
+    msix->max_entries = max_entries;
+    msix->pdev = pdev;
+
+    /* Find the MSI-X table address. */
+    msix->table.offset = pci_conf_read32(seg, bus, slot, func,
+                                         msix_table_offset_reg(msix_offset));
+    msix->table.bir = msix->table.offset & PCI_MSIX_BIRMASK;
+    msix->table.offset &= ~PCI_MSIX_BIRMASK;
+    msix->table.size = msix->max_entries * PCI_MSIX_ENTRY_SIZE;
+    /*
+     * The PCI header initialization code will take care of setting the address
+     * of both the table and pba memory regions once the BARs have been
+     * sized.
+     */
+    msix->table.addr = INVALID_PADDR;
+
+    /* Find the MSI-X pba address. */
+    msix->pba.offset = pci_conf_read32(seg, bus, slot, func,
+                                       msix_pba_offset_reg(msix_offset));
+    msix->pba.bir = msix->pba.offset & PCI_MSIX_BIRMASK;
+    msix->pba.offset &= ~PCI_MSIX_BIRMASK;
+    /*
+     * The spec mentions regarding the PBA that "The last QWORD will not
+     * necessarily be fully populated", so it implies that the PBA size is
+     * 64-bit aligned.
+     */
+    msix->pba.size = ROUNDUP(DIV_ROUND_UP(msix->max_entries, 8), 8);
+    msix->pba.addr = INVALID_PADDR;
+
+    for ( i = 0; i < msix->max_entries; i++ )
+    {
+        msix->entries[i].masked = true;
+        msix->entries[i].nr = i;
+        vpci_msix_arch_init(&msix->entries[i].arch);
+    }
+
+    if ( list_empty(&d->arch.hvm_domain.msix_tables) )
+        register_mmio_handler(d, &vpci_msix_table_ops);
+
+    list_add(&msix->next, &d->arch.hvm_domain.msix_tables);
+
+    rc = vpci_add_register(pdev, vpci_msix_control_read,
+                           vpci_msix_control_write,
+                           msix_control_reg(msix_offset), 2, msix);
+    if ( rc )
+    {
+        xfree(msix);
+        return rc;
+    }
+
+    pdev->vpci->header.bars[msix->table.bir].msix[VPCI_BAR_MSIX_TABLE] =
+        &msix->table;
+    pdev->vpci->header.bars[msix->pba.bir].msix[VPCI_BAR_MSIX_PBA] =
+        &msix->pba;
+    pdev->vpci->msix = msix;
+
+    return 0;
+}
+
+REGISTER_VPCI_INIT(vpci_init_msix, VPCI_PRIORITY_HIGH);
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index e8dc01bc3e..3be62f3589 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -190,6 +190,9 @@ struct hvm_domain {
     /* List of MMCFG regions trapped by Xen. */
     struct list_head mmcfg_regions;
 
+    /* List of MSI-X tables. */
+    struct list_head msix_tables;
+
     /* List of permanently write-mapped pages. */
     struct {
         spinlock_t lock;
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index b6c5e30b6a..de2e451b4f 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -144,6 +144,25 @@ void vpci_msi_arch_init(struct vpci_arch_msi *arch);
 void vpci_msi_arch_print(const struct vpci_arch_msi *arch, uint16_t data,
                          uint64_t addr);
 
+/* Arch-specific MSI-X entry data for vPCI. */
+struct vpci_arch_msix_entry {
+    int pirq;
+};
+
+/* Arch-specific vPCI MSI-X helpers. */
+void vpci_msix_arch_mask(struct vpci_arch_msix_entry *arch,
+                         struct pci_dev *pdev, bool mask);
+int vpci_msix_arch_enable(struct vpci_arch_msix_entry *arch,
+                          struct pci_dev *pdev, uint64_t address,
+                          uint32_t data, unsigned int entry_nr,
+                          paddr_t table_base);
+int vpci_msix_arch_disable(struct vpci_arch_msix_entry *arch,
+                           struct pci_dev *pdev);
+int vpci_msix_arch_init(struct vpci_arch_msix_entry *arch);
+void vpci_msix_arch_print(const struct vpci_arch_msix_entry *entry,
+                          uint32_t data, uint64_t addr, bool masked,
+                          unsigned int pos);
+
 enum stdvga_cache_state {
     STDVGA_CACHE_UNINITIALIZED,
     STDVGA_CACHE_ENABLED,
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 66d8ae8b5f..5d053b12fd 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -78,6 +78,10 @@ struct vpci {
         struct vpci_bar {
             paddr_t addr;
             uint64_t size;
+#define VPCI_BAR_MSIX_TABLE     0
+#define VPCI_BAR_MSIX_PBA       1
+#define VPCI_BAR_MSIX_NUM       2
+            struct vpci_msix_mem *msix[VPCI_BAR_MSIX_NUM];
             enum {
                 VPCI_BAR_EMPTY,
                 VPCI_BAR_IO,
@@ -122,6 +126,41 @@ struct vpci {
         /* 64-bit address capable? */
         bool address64;
     } *msi;
+
+    /* MSI-X data. */
+    struct vpci_msix {
+        struct pci_dev *pdev;
+        /* List link. */
+        struct list_head next;
+        /* Table information. */
+        struct vpci_msix_mem {
+            /* MSI-X table offset. */
+            unsigned int offset;
+            /* MSI-X table BIR. */
+            unsigned int bir;
+            /* Table addr. */
+            paddr_t addr;
+            /* Table size. */
+            unsigned int size;
+        } table;
+        /* PBA */
+        struct vpci_msix_mem pba;
+        /* Maximum number of vectors supported by the device. */
+        unsigned int max_entries;
+        /* MSI-X enabled? */
+        bool enabled;
+        /* Masked? */
+        bool masked;
+        /* Entries. */
+        struct vpci_msix_entry {
+            uint64_t addr;
+            uint32_t data;
+            unsigned int nr;
+            struct vpci_arch_msix_entry arch;
+            bool masked;
+            bool updated;
+        } entries[];
+    } *msix;
 #endif
 };
 
-- 
2.11.0 (Apple Git-81)

