
[Xen-devel] [PATCH v10 01/11] vpci: introduce basic handlers to trap accesses to the PCI config space



This functionality is going to reside in vpci.c (and the corresponding
vpci.h header), and should be arch-agnostic. The handlers introduced
in this patch set up the basic functionality required to trap
accesses to the PCI config space, decode the address, and find the
corresponding handler for the access (although no handlers are
implemented).
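
As an illustration of the address decoding step, the following is a
minimal sketch of how a value latched in 0xcf8 plus an access to the
0xcfc..0xcff window maps to a bus/device/function and a register
offset. It follows the standard PCI configuration mechanism #1 layout;
the cfg_addr struct and the cf8_enabled/cf8_decode names are
hypothetical and are not the helpers used by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for a decoded config space address. */
struct cfg_addr {
    uint8_t bus, dev, func;
    unsigned int reg;
};

/* Bit 31 of the 0xcf8 value enables config space accesses. */
static bool cf8_enabled(uint32_t cf8)
{
    return cf8 & 0x80000000u;
}

/*
 * Decode the latched 0xcf8 value together with the data port that was
 * accessed (0xcfc..0xcff). The low two bits of the port select the
 * byte inside the dword-aligned register taken from cf8.
 */
static struct cfg_addr cf8_decode(uint32_t cf8, unsigned int port)
{
    struct cfg_addr a;

    assert((port & ~3u) == 0xcfc);

    a.bus  = (cf8 >> 16) & 0xff;   /* bits 23:16 */
    a.dev  = (cf8 >> 11) & 0x1f;   /* bits 15:11 */
    a.func = (cf8 >> 8) & 0x7;     /* bits 10:8  */
    a.reg  = (cf8 & 0xfc) | (port & 3);

    return a;
}
```

For example, with 0x8003fd40 latched in 0xcf8, an access to port 0xcfe
decodes to bus 3, device 0x1f, function 5, register 0x42.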

Note that the traps to the PCI IO port registers (0xcf8/0xcfc) are
set up inside an x86 HVM file, since that's not shared with other
arches.

A new XEN_X86_EMU_VPCI x86 domain flag is added in order to signal Xen
whether a domain should use the newly introduced vPCI handlers. This
is only enabled for PVH Dom0 at the moment.

A very simple user-space test is also provided, so that the basic
functionality of the vPCI traps can be asserted. This has proven
quite helpful during development, since the logic to handle partial
accesses or accesses that span multiple registers is not
trivial.
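
To give an idea of what that logic has to do, here is a minimal
sketch of merging per-register results into a single access value,
and of extracting part of a register for a partial read. The
merge_bytes and extract_bytes names are hypothetical; the patch has
its own merge_result helper, which this sketch does not claim to
reproduce.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Merge the low `size` bytes of `val` into `data`, starting at byte
 * `offset` of the 32-bit access. Bytes outside that window keep the
 * value they already had in `data`.
 */
static uint32_t merge_bytes(uint32_t data, uint32_t val, unsigned int size,
                            unsigned int offset)
{
    uint32_t mask;

    assert(size >= 1 && size <= 4 && offset + size <= 4);

    /* Build the mask without needing a 64-bit intermediate. */
    mask = 0xffffffffu >> (32 - 8 * size);

    return (data & ~(mask << (offset * 8))) | ((val & mask) << (offset * 8));
}

/* Return `size` bytes of `val`, starting at byte `offset`. */
static uint32_t extract_bytes(uint32_t val, unsigned int size,
                              unsigned int offset)
{
    assert(size >= 1 && size <= 4 && offset + size <= 4);

    return (val >> (offset * 8)) & (0xffffffffu >> (32 - 8 * size));
}
```

This matches two cases from the test harness: a 4-byte read at offset
4 starts from all 1's for the unhandled byte and merges three 1-byte
registers holding 0xba, 0xba and 0xbd to yield 0xbdbabaff, and a
1-byte read at offset 2 of a register holding 0x1a1b1c1d yields 0x1b.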

The handlers for the registers are added to a linked list that is kept
sorted at all times. Both the read and write handlers support accesses
that span multiple emulated registers and contain non-emulated
gaps.
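
A minimal sketch of the sorted insertion, assuming a singly linked
list ordered by register offset. The struct reg type, the reg_insert
name and the specific error codes are hypothetical, chosen for
illustration only; the patch uses Xen's list.h and its own
vpci_add_register.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical register descriptor, ordered by offset in the list. */
struct reg {
    unsigned int offset, size;
    struct reg *next;
};

/*
 * Insert `r` so the list stays sorted by offset. Registers must be 1,
 * 2 or 4 bytes wide, size aligned, and must not overlap an existing
 * entry.
 */
static int reg_insert(struct reg **head, struct reg *r)
{
    struct reg **pp;

    if ( (r->size != 1 && r->size != 2 && r->size != 4) ||
         (r->offset & (r->size - 1)) )
        return -EINVAL;

    for ( pp = head; *pp; pp = &(*pp)->next )
    {
        const struct reg *cur = *pp;

        /* r ends before cur starts: this is the insertion point. */
        if ( r->offset + r->size <= cur->offset )
            break;

        /* r starts before cur ends: the two ranges overlap. */
        if ( r->offset < cur->offset + cur->size )
            return -EEXIST;
    }

    r->next = *pp;
    *pp = r;

    return 0;
}
```

Keeping the list sorted means a dispatcher can walk it once per
access, calling each handler inside the accessed range in order and
treating any space between consecutive entries as a non-emulated gap.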

Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
[IO parts]
Reviewed-by: Paul Durrant <paul.durrant@xxxxxxxxxx>
---
Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
Cc: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
Cc: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
Cc: Jan Beulich <jbeulich@xxxxxxxx>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>
Cc: Tim Deegan <tim@xxxxxxx>
Cc: Wei Liu <wei.liu2@xxxxxxxxxx>
Cc: Julien Grall <julien.grall@xxxxxxx>
Cc: Paul Durrant <paul.durrant@xxxxxxxxxx>
---
Changes since v9:
 - Remove vpci/Kconfig and use drivers/Kconfig instead.
 - Remove depends on HAS_PCI.

Changes since v8:
 - Introduce HAS_VPCI Kconfig option.
 - Drop Jan and Wei's RB (keep Paul's since the HAS_VPCI addition
   doesn't change IO code).
 - Rebase on top of XSA-256.

Changes since v7:
 - Constify d in vpci_portio_read.
 - ASSERT the correctness of the address in the read/write handlers.
 - Add newlines between non-fallthrough case statements.

Changes since v6:
 - Align the vpci handlers in the linker script.
 - Switch add/remove register functions to take a vpci parameter
   instead of a pci_dev.
 - Expand comment of merge_result.
 - Return X86EMUL_UNHANDLEABLE if accessing cfc and cf8 is disabled.

Changes since v5:
 - Use a spinlock per pci device.
 - Use the recently introduced pci_sbdf_t type.
 - Fix test harness to use the right handler type and the newly
   introduced lock.
 - Move the position of the vpci sections in the linker scripts.
 - Constify domain and pci_dev in vpci_{read/write}.
 - Fix typos in comments.
 - Use _XEN_VPCI_H_ as header guard.

Changes since v4:
* User-space test harness:
 - Do not redirect the output of the test.
 - Add main.c and emul.h as dependencies of the Makefile target.
 - Use the same rule to modify the vpci and list headers.
 - Remove underscores from local macro variables.
 - Add _check suffix to the test harness multiread function.
 - Change the value written by every different size in the multiwrite
   test.
 - Use { } to initialize the r16 and r20 arrays (instead of { 0 }).
 - Perform some of the read checks with the local variable directly.
 - Expand some comments.
 - Implement a dummy rwlock.
* Hypervisor code:
 - Guard the linker script changes with CONFIG_HAS_PCI.
 - Rename vpci_access_check to vpci_access_allowed and make it return
   bool.
 - Make hvm_pci_decode_addr return the register as return value.
 - Use ~3 instead of 0xfffc to remove the register offset when
   checking accesses to IO ports.
 - s/head/prev in vpci_add_register.
 - Add parentheses around & in vpci_add_register.
 - Fix register removal.
 - Change the BUGs in vpci_{read/write}_hw helpers to
   ASSERT_UNREACHABLE.
 - Make merge_result static and change the computation of the mask to
   avoid using a uint64_t.
 - Modify vpci_read to only read from hardware the not-emulated gaps.
 - Remove the vpci_val union and use a uint32_t instead.
 - Change handler read type to return a uint32_t instead of modifying
   a variable passed by reference.
 - Constify the data opaque parameter of read handlers.
 - Change the size parameter of the vpci_{read/write} functions to
   unsigned int.
 - Place the array of initialization handlers in init.rodata or
   .rodata depending on whether late-hwdom is enabled.
 - Remove the pci_devs lock, assume the Dom0 is well behaved and won't
   remove the device while trying to access it.
 - Change the recursive spinlock into a rw lock for performance
   reasons.

Changes since v3:
* User-space test harness:
 - Fix spaces in container_of macro.
 - Implement dummy locking functions.
 - Remove the 'current' macro and make current a pointer to the
   statically allocated vcpu.
 - Remove unneeded parentheses in the pci_conf_readX macros.
 - Fix the name of the write test macro.
 - Remove the dummy EXPORT_SYMBOL macro (this was needed by the RB
   code only).
 - Import the max macro.
 - Test all possible read/write size combinations with all possible
   emulated register sizes.
 - Introduce a test for register removal.
* Hypervisor code:
 - Use a sorted list in order to store the config space handlers.
 - Remove some unneeded 'else' branches.
 - Make the IO port handlers always return X86EMUL_OKAY, and set the
   data to all 1's in case of read failure (writes are simply ignored).
 - In hvm_select_ioreq_server reuse local variables when calling
   XEN_DMOP_PCI_SBDF.
 - Store the pointers to the initialization functions in the .rodata
   section.
 - Do not ignore the return value of xen_vpci_add_handlers in
   setup_one_hwdom_device.
 - Remove the vpci_init macro.
 - Do not hide the pointers inside of the vpci_{read/write}_t
   typedefs.
 - Rename priv_data to private in vpci_register.
 - Simplify checking for register overlap in vpci_register_cmp.
 - Check that the offset and the length match before removing a
   register in xen_vpci_remove_register.
 - Make vpci_read_hw return a value rather than storing it in a
   pointer passed by parameter.
 - Handler dispatcher functions vpci_{read/write} no longer return an
   error code, errors on reads/writes should be treated like hardware
   (writes ignored, reads return all 1's or garbage).
 - Make sure pcidevs is locked before calling pci_get_pdev_by_domain.
 - Use a recursive spinlock for the vpci lock, so that spin_is_locked
   checks that the current CPU is holding the lock.
 - Make the code less error-chatty by removing some of the printk's.
 - Pass the slot and the function as separate parameters to the
   handler dispatchers (instead of passing devfn).
 - Allow handlers to be registered with either a read or write
   function only, the missing handler will be replaced by a dummy
   handler (writes ignored, reads return 1's).
 - Introduce PCI_CFG_SPACE_* defines from Linux.
 - Simplify the handler dispatchers by removing the recursion, now the
   dispatchers iterate over the list of sorted handlers and call them
   in order.
 - Remove the GENMASK_BYTES, SHIFT_RIGHT_BYTES and ADD_RESULT macros,
   and instead provide a merge_result function in order to merge a
   register output into a partial result.
 - Rename the fields of the vpci_val union to u8/u16/u32.
 - Remove the return values from the read/write handlers, errors
   should be handled internally and signaled as would be done on
   native hardware.
 - Remove the usage of the GENMASK macro.

Changes since v2:
 - Generalize the PCI address decoding and use it for IOREQ code also.

Changes since v1:
 - Allow access to cross a word-boundary.
 - Add locking.
 - Add cleanup to xen_vpci_add_handlers in case of failure.
---
 .gitignore                        |   3 +
 tools/libxl/libxl_x86.c           |   2 +-
 tools/tests/Makefile              |   1 +
 tools/tests/vpci/Makefile         |  37 +++
 tools/tests/vpci/emul.h           | 133 +++++++++++
 tools/tests/vpci/main.c           | 309 +++++++++++++++++++++++++
 xen/arch/arm/xen.lds.S            |  14 ++
 xen/arch/x86/Kconfig              |   1 +
 xen/arch/x86/domain.c             |   6 +-
 xen/arch/x86/hvm/hvm.c            |   2 +
 xen/arch/x86/hvm/io.c             | 105 +++++++++
 xen/arch/x86/setup.c              |   3 +-
 xen/arch/x86/xen.lds.S            |  14 ++
 xen/drivers/Kconfig               |   3 +
 xen/drivers/Makefile              |   1 +
 xen/drivers/passthrough/pci.c     |  10 +-
 xen/drivers/vpci/Makefile         |   1 +
 xen/drivers/vpci/vpci.c           | 459 ++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/domain.h      |   1 +
 xen/include/asm-x86/hvm/io.h      |   3 +
 xen/include/public/arch-x86/xen.h |   5 +-
 xen/include/xen/pci.h             |   3 +
 xen/include/xen/pci_regs.h        |   8 +
 xen/include/xen/vpci.h            |  53 +++++
 24 files changed, 1170 insertions(+), 7 deletions(-)
 create mode 100644 tools/tests/vpci/Makefile
 create mode 100644 tools/tests/vpci/emul.h
 create mode 100644 tools/tests/vpci/main.c
 create mode 100644 xen/drivers/vpci/Makefile
 create mode 100644 xen/drivers/vpci/vpci.c
 create mode 100644 xen/include/xen/vpci.h

diff --git a/.gitignore b/.gitignore
index 7820abb756..cd57530cba 100644
--- a/.gitignore
+++ b/.gitignore
@@ -254,6 +254,9 @@ tools/tests/regression/build/*
 tools/tests/regression/downloads/*
 tools/tests/mem-sharing/memshrtool
 tools/tests/mce-test/tools/xen-mceinj
+tools/tests/vpci/list.h
+tools/tests/vpci/vpci.[hc]
+tools/tests/vpci/test_vpci
 tools/xcutils/lsevtchn
 tools/xcutils/readnotes
 tools/xenbackendd/_paths.h
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index 4ea1249925..1e9f98961b 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -9,7 +9,7 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
 {
     switch(d_config->c_info.type) {
     case LIBXL_DOMAIN_TYPE_HVM:
-        xc_config->emulation_flags = XEN_X86_EMU_ALL;
+        xc_config->emulation_flags = (XEN_X86_EMU_ALL & ~XEN_X86_EMU_VPCI);
         break;
     case LIBXL_DOMAIN_TYPE_PVH:
         xc_config->emulation_flags = XEN_X86_EMU_LAPIC;
diff --git a/tools/tests/Makefile b/tools/tests/Makefile
index 7162945121..f6942a93fb 100644
--- a/tools/tests/Makefile
+++ b/tools/tests/Makefile
@@ -13,6 +13,7 @@ endif
 SUBDIRS-$(CONFIG_X86) += x86_emulator
 SUBDIRS-y += xen-access
 SUBDIRS-y += xenstore
+SUBDIRS-$(CONFIG_HAS_PCI) += vpci
 
 .PHONY: all clean install distclean uninstall
 all clean distclean: %: subdirs-%
diff --git a/tools/tests/vpci/Makefile b/tools/tests/vpci/Makefile
new file mode 100644
index 0000000000..e45fcb5cd9
--- /dev/null
+++ b/tools/tests/vpci/Makefile
@@ -0,0 +1,37 @@
+XEN_ROOT=$(CURDIR)/../../..
+include $(XEN_ROOT)/tools/Rules.mk
+
+TARGET := test_vpci
+
+.PHONY: all
+all: $(TARGET)
+
+.PHONY: run
+run: $(TARGET)
+       ./$(TARGET)
+
+$(TARGET): vpci.c vpci.h list.h main.c emul.h
+       $(HOSTCC) -g -o $@ vpci.c main.c
+
+.PHONY: clean
+clean:
+       rm -rf $(TARGET) *.o *~ vpci.h vpci.c list.h
+
+.PHONY: distclean
+distclean: clean
+
+.PHONY: install
+install:
+
+vpci.c: $(XEN_ROOT)/xen/drivers/vpci/vpci.c
+       # Trick the compiler so it doesn't complain about missing symbols
+       sed -e '/#include/d' \
+           -e '1s;^;#include "emul.h"\
+                    vpci_register_init_t *const __start_vpci_array[1]\;\
+                    vpci_register_init_t *const __end_vpci_array[1]\;\
+                    ;' <$< >$@
+
+list.h: $(XEN_ROOT)/xen/include/xen/list.h
+vpci.h: $(XEN_ROOT)/xen/include/xen/vpci.h
+list.h vpci.h:
+       sed -e '/#include/d' <$< >$@
diff --git a/tools/tests/vpci/emul.h b/tools/tests/vpci/emul.h
new file mode 100644
index 0000000000..fd0317995a
--- /dev/null
+++ b/tools/tests/vpci/emul.h
@@ -0,0 +1,133 @@
+/*
+ * Unit tests for the generic vPCI handler code.
+ *
+ * Copyright (C) 2017 Citrix Systems R&D
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef _TEST_VPCI_
+#define _TEST_VPCI_
+
+#include <assert.h>
+#include <errno.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+#define container_of(ptr, type, member) ({                      \
+        typeof(((type *)0)->member) *mptr = (ptr);              \
+                                                                \
+        (type *)((char *)mptr - offsetof(type, member));        \
+})
+
+#define smp_wmb()
+#define prefetch(x) __builtin_prefetch(x)
+#define ASSERT(x) assert(x)
+#define __must_check __attribute__((__warn_unused_result__))
+
+#include "list.h"
+
+struct domain {
+};
+
+struct pci_dev {
+    struct vpci *vpci;
+};
+
+struct vcpu
+{
+    const struct domain *domain;
+};
+
+extern const struct vcpu *current;
+extern const struct pci_dev test_pdev;
+
+typedef bool spinlock_t;
+#define spin_lock_init(l) (*(l) = false)
+#define spin_lock(l) (*(l) = true)
+#define spin_unlock(l) (*(l) = false)
+
+typedef union {
+    uint32_t sbdf;
+    struct {
+        union {
+            uint16_t bdf;
+            struct {
+                union {
+                    struct {
+                        uint8_t func : 3,
+                                dev  : 5;
+                    };
+                    uint8_t     extfunc;
+                };
+                uint8_t         bus;
+            };
+        };
+        uint16_t                seg;
+    };
+} pci_sbdf_t;
+
+#include "vpci.h"
+
+#define __hwdom_init
+
+#define has_vpci(d) true
+
+#define xzalloc(type) ((type *)calloc(1, sizeof(type)))
+#define xmalloc(type) ((type *)malloc(sizeof(type)))
+#define xfree(p) free(p)
+
+#define pci_get_pdev_by_domain(...) &test_pdev
+
+/* Dummy native helpers. Writes are ignored, reads return 1's. */
+#define pci_conf_read8(...)     0xff
+#define pci_conf_read16(...)    0xffff
+#define pci_conf_read32(...)    0xffffffff
+#define pci_conf_write8(...)
+#define pci_conf_write16(...)
+#define pci_conf_write32(...)
+
+#define PCI_CFG_SPACE_EXP_SIZE 4096
+
+#define BUG() assert(0)
+#define ASSERT_UNREACHABLE() assert(0)
+
+#define min(x, y) ({                    \
+        const typeof(x) tx = (x);       \
+        const typeof(y) ty = (y);       \
+                                        \
+        (void) (&tx == &ty);            \
+        tx < ty ? tx : ty;              \
+})
+
+#define max(x, y) ({                    \
+        const typeof(x) tx = (x);       \
+        const typeof(y) ty = (y);       \
+                                        \
+        (void) (&tx == &ty);            \
+        tx > ty ? tx : ty;              \
+})
+
+#endif
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/tests/vpci/main.c b/tools/tests/vpci/main.c
new file mode 100644
index 0000000000..b9a0a6006b
--- /dev/null
+++ b/tools/tests/vpci/main.c
@@ -0,0 +1,309 @@
+/*
+ * Unit tests for the generic vPCI handler code.
+ *
+ * Copyright (C) 2017 Citrix Systems R&D
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "emul.h"
+
+/* Single vcpu (current), and single domain with a single PCI device. */
+static struct vpci vpci;
+
+const static struct domain d;
+
+const struct pci_dev test_pdev = {
+    .vpci = &vpci,
+};
+
+const static struct vcpu v = {
+    .domain = &d
+};
+
+const struct vcpu *current = &v;
+
+/* Dummy hooks, write stores data, read fetches it. */
+static uint32_t vpci_read8(const struct pci_dev *pdev, unsigned int reg,
+                           void *data)
+{
+    return *(uint8_t *)data;
+}
+
+static void vpci_write8(const struct pci_dev *pdev, unsigned int reg,
+                        uint32_t val, void *data)
+{
+    *(uint8_t *)data = val;
+}
+
+static uint32_t vpci_read16(const struct pci_dev *pdev, unsigned int reg,
+                            void *data)
+{
+    return *(uint16_t *)data;
+}
+
+static void vpci_write16(const struct pci_dev *pdev, unsigned int reg,
+                         uint32_t val, void *data)
+{
+    *(uint16_t *)data = val;
+}
+
+static uint32_t vpci_read32(const struct pci_dev *pdev, unsigned int reg,
+                            void *data)
+{
+    return *(uint32_t *)data;
+}
+
+static void vpci_write32(const struct pci_dev *pdev, unsigned int reg,
+                         uint32_t val, void *data)
+{
+    *(uint32_t *)data = val;
+}
+
+#define VPCI_READ(reg, size, data) ({                           \
+    data = vpci_read((pci_sbdf_t){ .sbdf = 0 }, reg, size);     \
+})
+
+#define VPCI_READ_CHECK(reg, size, expected) ({                 \
+    uint32_t rd;                                                \
+                                                                \
+    VPCI_READ(reg, size, rd);                                   \
+    assert(rd == (expected));                                   \
+})
+
+#define VPCI_WRITE(reg, size, data) ({                          \
+    vpci_write((pci_sbdf_t){ .sbdf = 0 }, reg, size, data);     \
+})
+
+#define VPCI_WRITE_CHECK(reg, size, data) ({                    \
+    VPCI_WRITE(reg, size, data);                                \
+    VPCI_READ_CHECK(reg, size, data);                           \
+})
+
+#define VPCI_ADD_REG(fread, fwrite, off, size, store)                       \
+    assert(!vpci_add_register(test_pdev.vpci, fread, fwrite, off, size,     \
+                              &store))
+
+#define VPCI_ADD_INVALID_REG(fread, fwrite, off, size)                      \
+    assert(vpci_add_register(test_pdev.vpci, fread, fwrite, off, size, NULL))
+
+#define VPCI_REMOVE_REG(off, size)                                          \
+    assert(!vpci_remove_register(test_pdev.vpci, off, size))
+
+#define VPCI_REMOVE_INVALID_REG(off, size)                                  \
+    assert(vpci_remove_register(test_pdev.vpci, off, size))
+
+/* Read a 32b register using all possible sizes. */
+void multiread4_check(unsigned int reg, uint32_t val)
+{
+    unsigned int i;
+
+    /* Read using bytes. */
+    for ( i = 0; i < 4; i++ )
+        VPCI_READ_CHECK(reg + i, 1, (val >> (i * 8)) & UINT8_MAX);
+
+    /* Read using 2bytes. */
+    for ( i = 0; i < 2; i++ )
+        VPCI_READ_CHECK(reg + i * 2, 2, (val >> (i * 2 * 8)) & UINT16_MAX);
+
+    VPCI_READ_CHECK(reg, 4, val);
+}
+
+void multiwrite4_check(unsigned int reg)
+{
+    unsigned int i;
+    uint32_t val = 0xa2f51732;
+
+    /* Write using bytes. */
+    for ( i = 0; i < 4; i++ )
+        VPCI_WRITE_CHECK(reg + i, 1, (val >> (i * 8)) & UINT8_MAX);
+    multiread4_check(reg, val);
+
+    /* Change the value each time to be sure writes work fine. */
+    val = 0x2b836fda;
+    /* Write using 2bytes. */
+    for ( i = 0; i < 2; i++ )
+        VPCI_WRITE_CHECK(reg + i * 2, 2, (val >> (i * 2 * 8)) & UINT16_MAX);
+    multiread4_check(reg, val);
+
+    val = 0xc4693beb;
+    VPCI_WRITE_CHECK(reg, 4, val);
+    multiread4_check(reg, val);
+}
+
+int
+main(int argc, char **argv)
+{
+    /* Index storage by offset. */
+    uint32_t r0 = 0xdeadbeef;
+    uint8_t r5 = 0xef;
+    uint8_t r6 = 0xbe;
+    uint8_t r7 = 0xef;
+    uint16_t r12 = 0x8696;
+    uint8_t r16[4] = { };
+    uint16_t r20[2] = { };
+    uint32_t r24 = 0;
+    uint8_t r28, r30;
+    unsigned int i;
+    int rc;
+
+    INIT_LIST_HEAD(&vpci.handlers);
+    spin_lock_init(&vpci.lock);
+
+    VPCI_ADD_REG(vpci_read32, vpci_write32, 0, 4, r0);
+    VPCI_READ_CHECK(0, 4, r0);
+    VPCI_WRITE_CHECK(0, 4, 0xbcbcbcbc);
+
+    VPCI_ADD_REG(vpci_read8, vpci_write8, 5, 1, r5);
+    VPCI_READ_CHECK(5, 1, r5);
+    VPCI_WRITE_CHECK(5, 1, 0xba);
+
+    VPCI_ADD_REG(vpci_read8, vpci_write8, 6, 1, r6);
+    VPCI_READ_CHECK(6, 1, r6);
+    VPCI_WRITE_CHECK(6, 1, 0xba);
+
+    VPCI_ADD_REG(vpci_read8, vpci_write8, 7, 1, r7);
+    VPCI_READ_CHECK(7, 1, r7);
+    VPCI_WRITE_CHECK(7, 1, 0xbd);
+
+    VPCI_ADD_REG(vpci_read16, vpci_write16, 12, 2, r12);
+    VPCI_READ_CHECK(12, 2, r12);
+    VPCI_READ_CHECK(12, 4, 0xffff8696);
+
+    /*
+     * At this point we have the following layout:
+     *
+     * Note that this refers to the position of the variables,
+     * but the value has already changed from the one given at
+     * initialization time because write tests have been performed.
+     *
+     * 32    24    16     8     0
+     *  +-----+-----+-----+-----+
+     *  |          r0           | 0
+     *  +-----+-----+-----+-----+
+     *  | r7  |  r6 |  r5 |/////| 32
+     *  +-----+-----+-----+-----|
+     *  |///////////////////////| 64
+     *  +-----------+-----------+
+     *  |///////////|    r12    | 96
+     *  +-----------+-----------+
+     *             ...
+     *  / = unhandled.
+     */
+
+    /* Try to add an overlapping register handler. */
+    VPCI_ADD_INVALID_REG(vpci_read32, vpci_write32, 4, 4);
+
+    /* Try to add a non-aligned register. */
+    VPCI_ADD_INVALID_REG(vpci_read16, vpci_write16, 15, 2);
+
+    /* Try to add a register with wrong size. */
+    VPCI_ADD_INVALID_REG(vpci_read16, vpci_write16, 8, 3);
+
+    /* Try to add a register with missing handlers. */
+    VPCI_ADD_INVALID_REG(NULL, NULL, 8, 2);
+
+    /* Read/write of unset register. */
+    VPCI_READ_CHECK(8, 4, 0xffffffff);
+    VPCI_READ_CHECK(8, 2, 0xffff);
+    VPCI_READ_CHECK(8, 1, 0xff);
+    VPCI_WRITE(10, 2, 0xbeef);
+    VPCI_READ_CHECK(10, 2, 0xffff);
+
+    /* Read of multiple registers */
+    VPCI_WRITE_CHECK(7, 1, 0xbd);
+    VPCI_READ_CHECK(4, 4, 0xbdbabaff);
+
+    /* Partial read of a register. */
+    VPCI_WRITE_CHECK(0, 4, 0x1a1b1c1d);
+    VPCI_READ_CHECK(2, 1, 0x1b);
+    VPCI_READ_CHECK(6, 2, 0xbdba);
+
+    /* Write of multiple registers. */
+    VPCI_WRITE_CHECK(4, 4, 0xaabbccff);
+
+    /* Partial write of a register. */
+    VPCI_WRITE_CHECK(2, 1, 0xfe);
+    VPCI_WRITE_CHECK(6, 2, 0xfebc);
+
+    /*
+     * Test all possible read/write size combinations.
+     *
+     * Place 4 1B registers at 128bits (16B), 2 2B registers at 160bits
+     * (20B) and finally 1 4B register at 192bits (24B).
+     *
+     * Then perform all possible write and read sizes on each of them.
+     *
+     *               ...
+     * 32     24     16      8      0
+     *  +------+------+------+------+
+     *  |r16[3]|r16[2]|r16[1]|r16[0]| 16
+     *  +------+------+------+------+
+     *  |    r20[1]   |    r20[0]   | 20
+     *  +-------------+-------------|
+     *  |            r24            | 24
+     *  +-------------+-------------+
+     *
+     */
+    VPCI_ADD_REG(vpci_read8, vpci_write8, 16, 1, r16[0]);
+    VPCI_ADD_REG(vpci_read8, vpci_write8, 17, 1, r16[1]);
+    VPCI_ADD_REG(vpci_read8, vpci_write8, 18, 1, r16[2]);
+    VPCI_ADD_REG(vpci_read8, vpci_write8, 19, 1, r16[3]);
+
+    VPCI_ADD_REG(vpci_read16, vpci_write16, 20, 2, r20[0]);
+    VPCI_ADD_REG(vpci_read16, vpci_write16, 22, 2, r20[1]);
+
+    VPCI_ADD_REG(vpci_read32, vpci_write32, 24, 4, r24);
+
+    /* Check the initial value is 0. */
+    multiread4_check(16, 0);
+    multiread4_check(20, 0);
+    multiread4_check(24, 0);
+
+    multiwrite4_check(16);
+    multiwrite4_check(20);
+    multiwrite4_check(24);
+
+    /*
+     * Check multiple non-consecutive gaps on the same read/write:
+     *
+     * 32     24     16      8      0
+     *  +------+------+------+------+
+     *  |//////|  r30 |//////|  r28 | 28
+     *  +------+------+------+------+
+     *
+     */
+    VPCI_ADD_REG(vpci_read8, vpci_write8, 28, 1, r28);
+    VPCI_ADD_REG(vpci_read8, vpci_write8, 30, 1, r30);
+    VPCI_WRITE_CHECK(28, 4, 0xffacffdc);
+
+    /* Finally try to remove a couple of registers. */
+    VPCI_REMOVE_REG(28, 1);
+    VPCI_REMOVE_REG(24, 4);
+    VPCI_REMOVE_REG(12, 2);
+
+    VPCI_REMOVE_INVALID_REG(20, 1);
+    VPCI_REMOVE_INVALID_REG(16, 2);
+    VPCI_REMOVE_INVALID_REG(30, 2);
+
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
index b0390180b4..49cae2af71 100644
--- a/xen/arch/arm/xen.lds.S
+++ b/xen/arch/arm/xen.lds.S
@@ -65,6 +65,13 @@ SECTIONS
        __param_start = .;
        *(.data.param)
        __param_end = .;
+
+#if defined(CONFIG_HAS_VPCI) && defined(CONFIG_LATE_HWDOM)
+       . = ALIGN(POINTER_ALIGN);
+       __start_vpci_array = .;
+       *(.data.vpci)
+       __end_vpci_array = .;
+#endif
   } :text
 
 #if defined(BUILD_ID)
@@ -171,6 +178,13 @@ SECTIONS
        *(.init_array)
        *(SORT(.init_array.*))
        __ctors_end = .;
+
+#if defined(CONFIG_HAS_VPCI) && !defined(CONFIG_LATE_HWDOM)
+       . = ALIGN(POINTER_ALIGN);
+       __start_vpci_array = .;
+       *(.data.vpci)
+       __end_vpci_array = .;
+#endif
   } :text
   __init_end_efi = .;
   . = ALIGN(STACK_SIZE);
diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index f621e799ed..c405c4bf4f 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -23,6 +23,7 @@ config X86
        select HAS_PCI
        select HAS_PDX
        select HAS_UBSAN
+       select HAS_VPCI
        select NUMA
 
 config ARCH_DEFCONFIG
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index b4e062472e..cafbaf5e94 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -411,10 +411,12 @@ static bool emulation_flags_ok(const struct domain *d, uint32_t emflags)
     if ( is_hvm_domain(d) )
     {
         if ( is_hardware_domain(d) &&
-             emflags != (XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC) )
+             emflags != (XEN_X86_EMU_VPCI | XEN_X86_EMU_LAPIC |
+                         XEN_X86_EMU_IOAPIC) )
             return false;
         if ( !is_hardware_domain(d) &&
-             emflags != XEN_X86_EMU_ALL && emflags != XEN_X86_EMU_LAPIC )
+             emflags != (XEN_X86_EMU_ALL & ~XEN_X86_EMU_VPCI) &&
+             emflags != XEN_X86_EMU_LAPIC )
             return false;
     }
     else if ( emflags != 0 && emflags != XEN_X86_EMU_PIT )
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 461866420d..a840130c17 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -36,6 +36,7 @@
 #include <xen/rangeset.h>
 #include <xen/monitor.h>
 #include <xen/warning.h>
+#include <xen/vpci.h>
 #include <asm/shadow.h>
 #include <asm/hap.h>
 #include <asm/current.h>
@@ -633,6 +634,7 @@ int hvm_domain_initialise(struct domain *d, unsigned long domcr_flags,
         d->arch.hvm_domain.io_bitmap = hvm_io_bitmap;
 
     register_g2m_portio_handler(d);
+    register_vpci_portio_handler(d);
 
     hvm_ioreq_init(d);
 
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 77f4c2ad41..6914bd6834 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -25,6 +25,7 @@
 #include <xen/trace.h>
 #include <xen/event.h>
 #include <xen/hypercall.h>
+#include <xen/vpci.h>
 #include <asm/current.h>
 #include <asm/cpufeature.h>
 #include <asm/processor.h>
@@ -278,6 +279,110 @@ unsigned int hvm_pci_decode_addr(unsigned int cf8, unsigned int addr,
     return CF8_ADDR_LO(cf8) | (addr & 3);
 }
 
+/* Do some sanity checks. */
+static bool vpci_access_allowed(unsigned int reg, unsigned int len)
+{
+    /* Check access size. */
+    if ( len != 1 && len != 2 && len != 4 )
+        return false;
+
+    /* Check that access is size aligned. */
+    if ( (reg & (len - 1)) )
+        return false;
+
+    return true;
+}
+
+/* vPCI config space IO ports handlers (0xcf8/0xcfc). */
+static bool vpci_portio_accept(const struct hvm_io_handler *handler,
+                               const ioreq_t *p)
+{
+    return (p->addr == 0xcf8 && p->size == 4) || (p->addr & ~3) == 0xcfc;
+}
+
+static int vpci_portio_read(const struct hvm_io_handler *handler,
+                            uint64_t addr, uint32_t size, uint64_t *data)
+{
+    const struct domain *d = current->domain;
+    unsigned int reg;
+    pci_sbdf_t sbdf;
+    uint32_t cf8;
+
+    *data = ~(uint64_t)0;
+
+    if ( addr == 0xcf8 )
+    {
+        ASSERT(size == 4);
+        *data = d->arch.hvm_domain.pci_cf8;
+        return X86EMUL_OKAY;
+    }
+
+    ASSERT((addr & ~3) == 0xcfc);
+    cf8 = ACCESS_ONCE(d->arch.hvm_domain.pci_cf8);
+    if ( !CF8_ENABLED(cf8) )
+        return X86EMUL_UNHANDLEABLE;
+
+    reg = hvm_pci_decode_addr(cf8, addr, &sbdf);
+
+    if ( !vpci_access_allowed(reg, size) )
+        return X86EMUL_OKAY;
+
+    *data = vpci_read(sbdf, reg, size);
+
+    return X86EMUL_OKAY;
+}
+
+static int vpci_portio_write(const struct hvm_io_handler *handler,
+                             uint64_t addr, uint32_t size, uint64_t data)
+{
+    struct domain *d = current->domain;
+    unsigned int reg;
+    pci_sbdf_t sbdf;
+    uint32_t cf8;
+
+    if ( addr == 0xcf8 )
+    {
+        ASSERT(size == 4);
+        d->arch.hvm_domain.pci_cf8 = data;
+        return X86EMUL_OKAY;
+    }
+
+    ASSERT((addr & ~3) == 0xcfc);
+    cf8 = ACCESS_ONCE(d->arch.hvm_domain.pci_cf8);
+    if ( !CF8_ENABLED(cf8) )
+        return X86EMUL_UNHANDLEABLE;
+
+    reg = hvm_pci_decode_addr(cf8, addr, &sbdf);
+
+    if ( !vpci_access_allowed(reg, size) )
+        return X86EMUL_OKAY;
+
+    vpci_write(sbdf, reg, size, data);
+
+    return X86EMUL_OKAY;
+}
+
+static const struct hvm_io_ops vpci_portio_ops = {
+    .accept = vpci_portio_accept,
+    .read = vpci_portio_read,
+    .write = vpci_portio_write,
+};
+
+void register_vpci_portio_handler(struct domain *d)
+{
+    struct hvm_io_handler *handler;
+
+    if ( !has_vpci(d) )
+        return;
+
+    handler = hvm_next_io_handler(d);
+    if ( !handler )
+        return;
+
+    handler->type = IOREQ_TYPE_PIO;
+    handler->ops = &vpci_portio_ops;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index ac530ece2c..0d4438672f 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1635,7 +1635,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
         domcr_flags |= DOMCRF_hvm |
                        ((hvm_funcs.hap_supported && !opt_dom0_shadow) ?
                          DOMCRF_hap : 0);
-        config.emulation_flags = XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC;
+        config.emulation_flags = XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC|
+                                 XEN_X86_EMU_VPCI;
     }
 
     /* Create initial domain 0. */
diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index e9f2ecd9fb..7bd6fb51c3 100644
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -135,6 +135,13 @@ SECTIONS
        __param_start = .;
        *(.data.param)
        __param_end = .;
+
+#if defined(CONFIG_HAS_VPCI) && defined(CONFIG_LATE_HWDOM)
+       . = ALIGN(POINTER_ALIGN);
+       __start_vpci_array = .;
+       *(.data.vpci)
+       __end_vpci_array = .;
+#endif
   } :text
 
 #if defined(CONFIG_PVH_GUEST) && !defined(EFI)
@@ -235,6 +242,13 @@ SECTIONS
        *(.init_array)
        *(SORT(.init_array.*))
        __ctors_end = .;
+
+#if defined(CONFIG_HAS_VPCI) && !defined(CONFIG_LATE_HWDOM)
+       . = ALIGN(POINTER_ALIGN);
+       __start_vpci_array = .;
+       *(.data.vpci)
+       __end_vpci_array = .;
+#endif
   } :text
 
   . = ALIGN(SECTION_ALIGN);
diff --git a/xen/drivers/Kconfig b/xen/drivers/Kconfig
index bc3a54f0ea..db94393f47 100644
--- a/xen/drivers/Kconfig
+++ b/xen/drivers/Kconfig
@@ -12,4 +12,7 @@ source "drivers/pci/Kconfig"
 
 source "drivers/video/Kconfig"
 
+config HAS_VPCI
+       bool
+
 endmenu
diff --git a/xen/drivers/Makefile b/xen/drivers/Makefile
index 19391802a8..30bab3cfdb 100644
--- a/xen/drivers/Makefile
+++ b/xen/drivers/Makefile
@@ -1,6 +1,7 @@
 subdir-y += char
 subdir-$(CONFIG_HAS_CPUFREQ) += cpufreq
 subdir-$(CONFIG_HAS_PCI) += pci
+subdir-$(CONFIG_HAS_VPCI) += vpci
 subdir-$(CONFIG_HAS_PASSTHROUGH) += passthrough
 subdir-$(CONFIG_ACPI) += acpi
 subdir-$(CONFIG_VIDEO) += video
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 2b976ade62..e65c7faa6f 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -31,6 +31,7 @@
 #include <xen/radix-tree.h>
 #include <xen/softirq.h>
 #include <xen/tasklet.h>
+#include <xen/vpci.h>
 #include <xsm/xsm.h>
 #include <asm/msi.h>
 #include "ats.h"
@@ -1050,10 +1051,10 @@ static void __hwdom_init setup_one_hwdom_device(const struct setup_hwdom *ctxt,
                                                 struct pci_dev *pdev)
 {
     u8 devfn = pdev->devfn;
+    int err;
 
     do {
-        int err = ctxt->handler(devfn, pdev);
-
+        err = ctxt->handler(devfn, pdev);
         if ( err )
         {
             printk(XENLOG_ERR "setup %04x:%02x:%02x.%u for d%d failed (%d)\n",
@@ -1065,6 +1066,11 @@ static void __hwdom_init setup_one_hwdom_device(const struct setup_hwdom *ctxt,
         devfn += pdev->phantom_stride;
     } while ( devfn != pdev->devfn &&
               PCI_SLOT(devfn) == PCI_SLOT(pdev->devfn) );
+
+    err = vpci_add_handlers(pdev);
+    if ( err )
+        printk(XENLOG_ERR "setup of vPCI for d%d failed: %d\n",
+               ctxt->d->domain_id, err);
 }
 
 static int __hwdom_init _setup_hwdom_pci_devices(struct pci_seg *pseg, void *arg)
diff --git a/xen/drivers/vpci/Makefile b/xen/drivers/vpci/Makefile
new file mode 100644
index 0000000000..840a906470
--- /dev/null
+++ b/xen/drivers/vpci/Makefile
@@ -0,0 +1 @@
+obj-y += vpci.o
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
new file mode 100644
index 0000000000..4740d02edf
--- /dev/null
+++ b/xen/drivers/vpci/vpci.c
@@ -0,0 +1,459 @@
+/*
+ * Generic functionality for handling accesses to the PCI configuration space
+ * from guests.
+ *
+ * Copyright (C) 2017 Citrix Systems R&D
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/sched.h>
+#include <xen/vpci.h>
+
+extern vpci_register_init_t *const __start_vpci_array[];
+extern vpci_register_init_t *const __end_vpci_array[];
+#define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
+
+/* Internal struct to store the emulated PCI registers. */
+struct vpci_register {
+    vpci_read_t *read;
+    vpci_write_t *write;
+    unsigned int size;
+    unsigned int offset;
+    void *private;
+    struct list_head node;
+};
+
+int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
+{
+    unsigned int i;
+    int rc = 0;
+
+    if ( !has_vpci(pdev->domain) )
+        return 0;
+
+    pdev->vpci = xzalloc(struct vpci);
+    if ( !pdev->vpci )
+        return -ENOMEM;
+
+    INIT_LIST_HEAD(&pdev->vpci->handlers);
+    spin_lock_init(&pdev->vpci->lock);
+
+    for ( i = 0; i < NUM_VPCI_INIT; i++ )
+    {
+        rc = __start_vpci_array[i](pdev);
+        if ( rc )
+            break;
+    }
+
+    if ( rc )
+    {
+        while ( !list_empty(&pdev->vpci->handlers) )
+        {
+            struct vpci_register *r = list_first_entry(&pdev->vpci->handlers,
+                                                       struct vpci_register,
+                                                       node);
+
+            list_del(&r->node);
+            xfree(r);
+        }
+        xfree(pdev->vpci);
+        pdev->vpci = NULL;
+    }
+
+    return rc;
+}
+
+static int vpci_register_cmp(const struct vpci_register *r1,
+                             const struct vpci_register *r2)
+{
+    /* Return 0 if registers overlap. */
+    if ( r1->offset < r2->offset + r2->size &&
+         r2->offset < r1->offset + r1->size )
+        return 0;
+    if ( r1->offset < r2->offset )
+        return -1;
+    if ( r1->offset > r2->offset )
+        return 1;
+
+    ASSERT_UNREACHABLE();
+    return 0;
+}
+
+/* Dummy hooks, writes are ignored, reads return 1's */
+static uint32_t vpci_ignored_read(const struct pci_dev *pdev, unsigned int reg,
+                                  void *data)
+{
+    return ~(uint32_t)0;
+}
+
+static void vpci_ignored_write(const struct pci_dev *pdev, unsigned int reg,
+                               uint32_t val, void *data)
+{
+}
+
+int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
+                      vpci_write_t *write_handler, unsigned int offset,
+                      unsigned int size, void *data)
+{
+    struct list_head *prev;
+    struct vpci_register *r;
+
+    /* Some sanity checks. */
+    if ( (size != 1 && size != 2 && size != 4) ||
+         offset >= PCI_CFG_SPACE_EXP_SIZE || (offset & (size - 1)) ||
+         (!read_handler && !write_handler) )
+        return -EINVAL;
+
+    r = xmalloc(struct vpci_register);
+    if ( !r )
+        return -ENOMEM;
+
+    r->read = read_handler ?: vpci_ignored_read;
+    r->write = write_handler ?: vpci_ignored_write;
+    r->size = size;
+    r->offset = offset;
+    r->private = data;
+
+    spin_lock(&vpci->lock);
+
+    /* The list of handlers must be kept sorted at all times. */
+    list_for_each ( prev, &vpci->handlers )
+    {
+        const struct vpci_register *this =
+            list_entry(prev, const struct vpci_register, node);
+        int cmp = vpci_register_cmp(r, this);
+
+        if ( cmp < 0 )
+            break;
+        if ( cmp == 0 )
+        {
+            spin_unlock(&vpci->lock);
+            xfree(r);
+            return -EEXIST;
+        }
+    }
+
+    list_add_tail(&r->node, prev);
+    spin_unlock(&vpci->lock);
+
+    return 0;
+}
+
+int vpci_remove_register(struct vpci *vpci, unsigned int offset,
+                         unsigned int size)
+{
+    const struct vpci_register r = { .offset = offset, .size = size };
+    struct vpci_register *rm;
+
+    spin_lock(&vpci->lock);
+    list_for_each_entry ( rm, &vpci->handlers, node )
+    {
+        int cmp = vpci_register_cmp(&r, rm);
+
+        /*
+         * NB: do not use a switch so that we can use break to
+         * get out of the list loop earlier if required.
+         */
+        if ( !cmp && rm->offset == offset && rm->size == size )
+        {
+            list_del(&rm->node);
+            spin_unlock(&vpci->lock);
+            xfree(rm);
+            return 0;
+        }
+        if ( cmp <= 0 )
+            break;
+    }
+    spin_unlock(&vpci->lock);
+
+    return -ENOENT;
+}
+
+/* Wrappers for performing reads/writes to the underlying hardware. */
+static uint32_t vpci_read_hw(pci_sbdf_t sbdf, unsigned int reg,
+                             unsigned int size)
+{
+    uint32_t data;
+
+    switch ( size )
+    {
+    case 4:
+        data = pci_conf_read32(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func, reg);
+        break;
+
+    case 3:
+        /*
+         * This is possible because a 4byte read can have 1byte trapped and
+         * the rest passed-through.
+         */
+        if ( reg & 1 )
+        {
+            data = pci_conf_read8(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func,
+                                  reg);
+            data |= pci_conf_read16(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func,
+                                    reg + 1) << 8;
+        }
+        else
+        {
+            data = pci_conf_read16(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func,
+                                   reg);
+            data |= pci_conf_read8(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func,
+                                   reg + 2) << 16;
+        }
+        break;
+
+    case 2:
+        data = pci_conf_read16(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func, reg);
+        break;
+
+    case 1:
+        data = pci_conf_read8(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func, reg);
+        break;
+
+    default:
+        ASSERT_UNREACHABLE();
+        data = ~(uint32_t)0;
+        break;
+    }
+
+    return data;
+}
+
+static void vpci_write_hw(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
+                          uint32_t data)
+{
+    switch ( size )
+    {
+    case 4:
+        pci_conf_write32(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func, reg, data);
+        break;
+
+    case 3:
+        /*
+         * This is possible because a 4byte write can have 1byte trapped and
+         * the rest passed-through.
+         */
+        if ( reg & 1 )
+        {
+            pci_conf_write8(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func, reg,
+                            data);
+            pci_conf_write16(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func, reg + 1,
+                             data >> 8);
+        }
+        else
+        {
+            pci_conf_write16(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func, reg,
+                             data);
+            pci_conf_write8(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func, reg + 2,
+                            data >> 16);
+        }
+        break;
+
+    case 2:
+        pci_conf_write16(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func, reg, data);
+        break;
+
+    case 1:
+        pci_conf_write8(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func, reg, data);
+        break;
+
+    default:
+        ASSERT_UNREACHABLE();
+        break;
+    }
+}
+
+/*
+ * Merge new data into a partial result.
+ *
+ * Copy the value found in 'new' from [0, size) left shifted by
+ * 'offset' into 'data'. Note that both 'size' and 'offset' are
+ * in byte units.
+ */
+static uint32_t merge_result(uint32_t data, uint32_t new, unsigned int size,
+                             unsigned int offset)
+{
+    uint32_t mask = 0xffffffff >> (32 - 8 * size);
+
+    return (data & ~(mask << (offset * 8))) | ((new & mask) << (offset * 8));
+}
+
+uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
+{
+    const struct domain *d = current->domain;
+    const struct pci_dev *pdev;
+    const struct vpci_register *r;
+    unsigned int data_offset = 0;
+    uint32_t data = ~(uint32_t)0;
+
+    /* Find the PCI dev matching the address. */
+    pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.extfunc);
+    if ( !pdev )
+        return vpci_read_hw(sbdf, reg, size);
+
+    spin_lock(&pdev->vpci->lock);
+
+    /* Read from the hardware or the emulated register handlers. */
+    list_for_each_entry ( r, &pdev->vpci->handlers, node )
+    {
+        const struct vpci_register emu = {
+            .offset = reg + data_offset,
+            .size = size - data_offset
+        };
+        int cmp = vpci_register_cmp(&emu, r);
+        uint32_t val;
+        unsigned int read_size;
+
+        if ( cmp < 0 )
+            break;
+        if ( cmp > 0 )
+            continue;
+
+        if ( emu.offset < r->offset )
+        {
+            /* Heading gap, read partial content from hardware. */
+            read_size = r->offset - emu.offset;
+            val = vpci_read_hw(sbdf, emu.offset, read_size);
+            data = merge_result(data, val, read_size, data_offset);
+            data_offset += read_size;
+        }
+
+        val = r->read(pdev, r->offset, r->private);
+
+        /* Check if the read is in the middle of a register. */
+        if ( r->offset < emu.offset )
+            val >>= (emu.offset - r->offset) * 8;
+
+        /* Find the intersection size between the two sets. */
+        read_size = min(emu.offset + emu.size, r->offset + r->size) -
+                    max(emu.offset, r->offset);
+        /* Merge the emulated data into the native read value. */
+        data = merge_result(data, val, read_size, data_offset);
+        data_offset += read_size;
+        if ( data_offset == size )
+            break;
+        ASSERT(data_offset < size);
+    }
+
+    if ( data_offset < size )
+    {
+        /* Trailing gap, read the remaining. */
+        uint32_t tmp_data = vpci_read_hw(sbdf, reg + data_offset,
+                                         size - data_offset);
+
+        data = merge_result(data, tmp_data, size - data_offset, data_offset);
+    }
+    spin_unlock(&pdev->vpci->lock);
+
+    return data & (0xffffffff >> (32 - 8 * size));
+}
+
+/*
+ * Perform a maybe partial write to a register.
+ *
+ * Note that this will only work for simple registers, if Xen needs to
+ * trap accesses to rw1c registers (like the status PCI header register)
+ * the logic in vpci_write will have to be expanded in order to correctly
+ * deal with them.
+ */
+static void vpci_write_helper(const struct pci_dev *pdev,
+                              const struct vpci_register *r, unsigned int size,
+                              unsigned int offset, uint32_t data)
+{
+    ASSERT(size <= r->size);
+
+    if ( size != r->size )
+    {
+        uint32_t val;
+
+        val = r->read(pdev, r->offset, r->private);
+        data = merge_result(val, data, size, offset);
+    }
+
+    r->write(pdev, r->offset, data & (0xffffffff >> (32 - 8 * r->size)),
+             r->private);
+}
+
+void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
+                uint32_t data)
+{
+    const struct domain *d = current->domain;
+    const struct pci_dev *pdev;
+    const struct vpci_register *r;
+    unsigned int data_offset = 0;
+
+    /*
+     * Find the PCI dev matching the address.
+     * Passthrough everything that's not trapped.
+     */
+    pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.extfunc);
+    if ( !pdev )
+    {
+        vpci_write_hw(sbdf, reg, size, data);
+        return;
+    }
+
+    spin_lock(&pdev->vpci->lock);
+
+    /* Write the value to the hardware or emulated registers. */
+    list_for_each_entry ( r, &pdev->vpci->handlers, node )
+    {
+        const struct vpci_register emu = {
+            .offset = reg + data_offset,
+            .size = size - data_offset
+        };
+        int cmp = vpci_register_cmp(&emu, r);
+        unsigned int write_size;
+
+        if ( cmp < 0 )
+            break;
+        if ( cmp > 0 )
+            continue;
+
+        if ( emu.offset < r->offset )
+        {
+            /* Heading gap, write partial content to hardware. */
+            vpci_write_hw(sbdf, emu.offset, r->offset - emu.offset,
+                          data >> (data_offset * 8));
+            data_offset += r->offset - emu.offset;
+        }
+
+        /* Find the intersection size between the two sets. */
+        write_size = min(emu.offset + emu.size, r->offset + r->size) -
+                     max(emu.offset, r->offset);
+        vpci_write_helper(pdev, r, write_size, reg + data_offset - r->offset,
+                          data >> (data_offset * 8));
+        data_offset += write_size;
+        if ( data_offset == size )
+            break;
+        ASSERT(data_offset < size);
+    }
+
+    if ( data_offset < size )
+        /* Trailing gap, write the remaining. */
+        vpci_write_hw(sbdf, reg + data_offset, size - data_offset,
+                      data >> (data_offset * 8));
+
+    spin_unlock(&pdev->vpci->lock);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 47aadc2600..a12ae47f1b 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -434,6 +434,7 @@ struct arch_domain
 #define has_vpit(d)        (!!((d)->arch.emulation_flags & XEN_X86_EMU_PIT))
 #define has_pirq(d)        (!!((d)->arch.emulation_flags & \
                             XEN_X86_EMU_USE_PIRQ))
+#define has_vpci(d)        (!!((d)->arch.emulation_flags & XEN_X86_EMU_VPCI))
 
 #define has_arch_pdevs(d)    (!list_empty(&(d)->arch.pdev_list))
 
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index 707665fbba..ff0bea5d53 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -160,6 +160,9 @@ unsigned int hvm_pci_decode_addr(unsigned int cf8, unsigned int addr,
  */
 void register_g2m_portio_handler(struct domain *d);
 
+/* HVM port IO handler for vPCI accesses. */
+void register_vpci_portio_handler(struct domain *d);
+
 #endif /* __ASM_X86_HVM_IO_H__ */
 
 
diff --git a/xen/include/public/arch-x86/xen.h b/xen/include/public/arch-x86/xen.h
index 3b0b1d6073..69ee4bc40d 100644
--- a/xen/include/public/arch-x86/xen.h
+++ b/xen/include/public/arch-x86/xen.h
@@ -294,12 +294,15 @@ struct xen_arch_domainconfig {
 #define XEN_X86_EMU_PIT             (1U<<_XEN_X86_EMU_PIT)
 #define _XEN_X86_EMU_USE_PIRQ       9
 #define XEN_X86_EMU_USE_PIRQ        (1U<<_XEN_X86_EMU_USE_PIRQ)
+#define _XEN_X86_EMU_VPCI           10
+#define XEN_X86_EMU_VPCI            (1U<<_XEN_X86_EMU_VPCI)
 
 #define XEN_X86_EMU_ALL             (XEN_X86_EMU_LAPIC | XEN_X86_EMU_HPET |  \
                                      XEN_X86_EMU_PM | XEN_X86_EMU_RTC |      \
                                      XEN_X86_EMU_IOAPIC | XEN_X86_EMU_PIC |  \
                                      XEN_X86_EMU_VGA | XEN_X86_EMU_IOMMU |   \
-                                     XEN_X86_EMU_PIT | XEN_X86_EMU_USE_PIRQ)
+                                     XEN_X86_EMU_PIT | XEN_X86_EMU_USE_PIRQ |\
+                                     XEN_X86_EMU_VPCI)
     uint32_t emulation_flags;
 };
 
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index dd5ec43a70..b7a6abfc53 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -112,6 +112,9 @@ struct pci_dev {
 #define PT_FAULT_THRESHOLD 10
     } fault;
     u64 vf_rlen[6];
+
+    /* Data for vPCI. */
+    struct vpci *vpci;
 };
 
 #define for_each_pdev(domain, pdev) \
diff --git a/xen/include/xen/pci_regs.h b/xen/include/xen/pci_regs.h
index ecd6124d91..cc4ee3b83e 100644
--- a/xen/include/xen/pci_regs.h
+++ b/xen/include/xen/pci_regs.h
@@ -22,6 +22,14 @@
 #ifndef LINUX_PCI_REGS_H
 #define LINUX_PCI_REGS_H
 
+/*
+ * Conventional PCI and PCI-X Mode 1 devices have 256 bytes of
+ * configuration space.  PCI-X Mode 2 and PCIe devices have 4096 bytes of
+ * configuration space.
+ */
+#define PCI_CFG_SPACE_SIZE     256
+#define PCI_CFG_SPACE_EXP_SIZE 4096
+
 /*
  * Under PCI, each device has 256 bytes of configuration address space,
  * of which the first 64 bytes are standardized as follows:
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
new file mode 100644
index 0000000000..9f2864fb0c
--- /dev/null
+++ b/xen/include/xen/vpci.h
@@ -0,0 +1,53 @@
+#ifndef _XEN_VPCI_H_
+#define _XEN_VPCI_H_
+
+#include <xen/pci.h>
+#include <xen/types.h>
+#include <xen/list.h>
+
+typedef uint32_t vpci_read_t(const struct pci_dev *pdev, unsigned int reg,
+                             void *data);
+
+typedef void vpci_write_t(const struct pci_dev *pdev, unsigned int reg,
+                          uint32_t val, void *data);
+
+typedef int vpci_register_init_t(struct pci_dev *dev);
+
+#define REGISTER_VPCI_INIT(x)                   \
+  static vpci_register_init_t *const x##_entry  \
+               __used_section(".data.vpci") = x
+
+/* Add vPCI handlers to device. */
+int __must_check vpci_add_handlers(struct pci_dev *dev);
+
+/* Add/remove a register handler. */
+int __must_check vpci_add_register(struct vpci *vpci,
+                                   vpci_read_t *read_handler,
+                                   vpci_write_t *write_handler,
+                                   unsigned int offset, unsigned int size,
+                                   void *data);
+int __must_check vpci_remove_register(struct vpci *vpci, unsigned int offset,
+                                      unsigned int size);
+
+/* Generic read/write handlers for the PCI config space. */
+uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size);
+void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
+                uint32_t data);
+
+struct vpci {
+    /* List of vPCI handlers for a device. */
+    struct list_head handlers;
+    spinlock_t lock;
+};
+
+#endif
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.16.2
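
For readers following the partial-access logic in vpci_read(), here is a
minimal user-space sketch of how merge_result() assembles one guest read
from hardware bytes and emulated bytes. It is only an illustration and not
part of the patch: fake_hw, fake_read_hw() and example_read() are
hypothetical names, and the scenario (a 4-byte read at offset 0 with a
single 1-byte emulated register at offset 1 returning 0x55) is assumed
purely for the example. merge_result() uses the same arithmetic as the
patch, with an added assert on its arguments.

```c
#include <assert.h>
#include <stdint.h>

/* Same arithmetic as merge_result() in the patch; size is 1..4 bytes. */
static uint32_t merge_result(uint32_t data, uint32_t new_val,
                             unsigned int size, unsigned int offset)
{
    uint32_t mask;

    /* Added for the sketch: keep the shifts below well defined. */
    assert(size >= 1 && size <= 4 && offset + size <= 4);
    mask = 0xffffffffu >> (32 - 8 * size);

    return (data & ~(mask << (offset * 8))) |
           ((new_val & mask) << (offset * 8));
}

/* Hypothetical stand-in for hardware: 4 config space bytes, little endian. */
static const uint8_t fake_hw[4] = { 0xaa, 0xbb, 0xcc, 0xdd };

/* Hypothetical replacement for vpci_read_hw(): read 'size' bytes at 'reg'. */
static uint32_t fake_read_hw(unsigned int reg, unsigned int size)
{
    uint32_t val = 0;
    unsigned int i;

    for ( i = 0; i < size; i++ )
        val |= (uint32_t)fake_hw[reg + i] << (8 * i);

    return val;
}

/*
 * Assumed scenario: 4-byte read at offset 0, with one emulated 1-byte
 * register at offset 1 whose read handler returns 0x55.
 */
static uint32_t example_read(void)
{
    uint32_t data = ~(uint32_t)0;

    /* Leading gap: byte 0 comes from hardware. */
    data = merge_result(data, fake_read_hw(0, 1), 1, 0);
    /* Emulated register: byte 1 comes from the handler. */
    data = merge_result(data, 0x55, 1, 1);
    /* Trailing gap: bytes 2 and 3 come from hardware (a 2-byte read). */
    data = merge_result(data, fake_read_hw(2, 2), 2, 2);

    return data;
}
```

Calling example_read() from a small main() should yield 0xddcc55aa: byte 1
is replaced by the emulated value while the other three bytes are passed
through from the fake hardware.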


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

