[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-devel] [PATCH v5 02/11] vpci: introduce basic handlers to trap accesses to the PCI config space



This functionality is going to reside in vpci.c (and the corresponding
vpci.h header), and should be arch-agnostic. The handlers introduced
in this patch setup the basic functionality required in order to trap
accesses to the PCI config space, and allow decoding the address and
finding the corresponding handler that should handle the access
(although no handlers are implemented).

Note that the traps to the PCI IO ports registers (0xcf8/0xcfc) are
setup inside of a x86 HVM file, since that's not shared with other
arches.

A new XEN_X86_EMU_VPCI x86 domain flag is added in order to signal Xen
whether a domain should use the newly introduced vPCI handlers, this
is only enabled for PVH Dom0 at the moment.

A very simple user-space test is also provided, so that the basic
functionality of the vPCI traps can be asserted. This has been proven
quite helpful during development, since the logic to handle partial
accesses or accesses that expand across multiple registers is not
trivial.

The handlers for the registers are added to a linked list that's keep
sorted at all times. Both the read and write handlers support accesses
that expand across multiple emulated registers and contain gaps not
emulated.

Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
---
Cc: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
Cc: Wei Liu <wei.liu2@xxxxxxxxxx>
Cc: Jan Beulich <jbeulich@xxxxxxxx>
Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
Cc: Paul Durrant <paul.durrant@xxxxxxxxxx>
---
Changes since v4:
* User-space test harness:
 - Do not redirect the output of the test.
 - Add main.c and emul.h as dependencies of the Makefile target.
 - Use the same rule to modify the vpci and list headers.
 - Remove underscores from local macro variables.
 - Add _check suffix to the test harness multiread function.
 - Change the value written by every different size in the multiwrite
   test.
 - Use { } to initialize the r16 and r20 arrays (instead of { 0 }).
 - Perform some of the read checks with the local variable directly.
 - Expand some comments.
 - Implement a dummy rwlock.
* Hypervisor code:
 - Guard the linker script changes with CONFIG_HAS_PCI.
 - Rename vpci_access_check to vpci_access_allowed and make it return
   bool.
 - Make hvm_pci_decode_addr return the register as return value.
 - Use ~3 instead of 0xfffc to remove the register offset when
   checking accesses to IO ports.
 - s/head/prev in vpci_add_register.
 - Add parentheses around & in vpci_add_register.
 - Fix register removal.
 - Change the BUGs in vpci_{read/write}_hw helpers to
   ASSERT_UNREACHABLE.
 - Make merge_result static and change the computation of the mask to
   avoid using a uint64_t.
 - Modify vpci_read to only read from hardware the not-emulated gaps.
 - Remove the vpci_val union and use a uint32_t instead.
 - Change handler read type to return a uint32_t instead of modifying
   a variable passed by reference.
 - Constify the data opaque parameter of read handlers.
 - Change the size parameter of the vpci_{read/write} functions to
   unsigned int.
 - Place the array of initialization handlers in init.rodata or
   .rodata depending on whether late-hwdom is enabled.
 - Remove the pci_devs lock, assume the Dom0 is well behaved and won't
   remove the device while trying to access it.
 - Change the recursive spinlock into a rw lock for performance
   reasons.

Changes since v3:
* User-space test harness:
 - Fix spaces in container_of macro.
 - Implement a dummy locking functions.
 - Remove 'current' macro make current a pointer to the statically
   allocated vpcu.
 - Remove unneeded parentheses in the pci_conf_readX macros.
 - Fix the name of the write test macro.
 - Remove the dummy EXPORT_SYMBOL macro (this was needed by the RB
   code only).
 - Import the max macro.
 - Test all possible read/write size combinations with all possible
   emulated register sizes.
 - Introduce a test for register removal.
* Hypervisor code:
 - Use a sorted list in order to store the config space handlers.
 - Remove some unneeded 'else' branches.
 - Make the IO port handlers always return X86EMUL_OKAY, and set the
   data to all 1's in case of read failure (write are simply ignored).
 - In hvm_select_ioreq_server reuse local variables when calling
   XEN_DMOP_PCI_SBDF.
 - Store the pointers to the initialization functions in the .rodata
   section.
 - Do not ignore the return value of xen_vpci_add_handlers in
   setup_one_hwdom_device.
 - Remove the vpci_init macro.
 - Do not hide the pointers inside of the vpci_{read/write}_t
   typedefs.
 - Rename priv_data to private in vpci_register.
 - Simplify checking for register overlap in vpci_register_cmp.
 - Check that the offset and the length match before removing a
   register in xen_vpci_remove_register.
 - Make vpci_read_hw return a value rather than storing it in a
   pointer passed by parameter.
 - Handler dispatcher functions vpci_{read/write} no longer return an
   error code, errors on reads/writes should be treated like hardware
   (writes ignored, reads return all 1's or garbage).
 - Make sure pcidevs is locked before calling pci_get_pdev_by_domain.
 - Use a recursive spinlock for the vpci lock, so that spin_is_locked
   checks that the current CPU is holding the lock.
 - Make the code less error-chatty by removing some of the printk's.
 - Pass the slot and the function as separate parameters to the
   handler dispatchers (instead of passing devfn).
 - Allow handlers to be registered with either a read or write
   function only, the missing handler will be replaced by a dummy
   handler (writes ignored, reads return 1's).
 - Introduce PCI_CFG_SPACE_* defines from Linux.
 - Simplify the handler dispatchers by removing the recursion, now the
   dispatchers iterate over the list of sorted handlers and call them
   in order.
 - Remove the GENMASK_BYTES, SHIFT_RIGHT_BYTES and ADD_RESULT macros,
   and instead provide a merge_result function in order to merge a
   register output into a partial result.
 - Rename the fields of the vpci_val union to u8/u16/u32.
 - Remove the return values from the read/write handlers, errors
   should be handled internally and signaled as would be done on
   native hardware.
 - Remove the usage of the GENMASK macro.

Changes since v2:
 - Generalize the PCI address decoding and use it for IOREQ code also.

Changes since v1:
 - Allow access to cross a word-boundary.
 - Add locking.
 - Add cleanup to xen_vpci_add_handlers in case of failure.
---
 .gitignore                        |   3 +
 tools/libxl/libxl_x86.c           |   2 +-
 tools/tests/Makefile              |   1 +
 tools/tests/vpci/Makefile         |  37 ++++
 tools/tests/vpci/emul.h           | 128 +++++++++++
 tools/tests/vpci/main.c           | 314 +++++++++++++++++++++++++++
 xen/arch/arm/xen.lds.S            |  10 +
 xen/arch/x86/domain.c             |  18 +-
 xen/arch/x86/hvm/hvm.c            |   2 +
 xen/arch/x86/hvm/io.c             | 118 +++++++++-
 xen/arch/x86/setup.c              |   3 +-
 xen/arch/x86/xen.lds.S            |  10 +
 xen/drivers/Makefile              |   2 +-
 xen/drivers/passthrough/pci.c     |   9 +-
 xen/drivers/vpci/Makefile         |   1 +
 xen/drivers/vpci/vpci.c           | 443 ++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/domain.h      |   1 +
 xen/include/asm-x86/hvm/domain.h  |   3 +
 xen/include/asm-x86/hvm/io.h      |   3 +
 xen/include/public/arch-x86/xen.h |   5 +-
 xen/include/xen/pci.h             |   3 +
 xen/include/xen/pci_regs.h        |   8 +
 xen/include/xen/vpci.h            |  80 +++++++
 23 files changed, 1194 insertions(+), 10 deletions(-)
 create mode 100644 tools/tests/vpci/Makefile
 create mode 100644 tools/tests/vpci/emul.h
 create mode 100644 tools/tests/vpci/main.c
 create mode 100644 xen/drivers/vpci/Makefile
 create mode 100644 xen/drivers/vpci/vpci.c
 create mode 100644 xen/include/xen/vpci.h

diff --git a/.gitignore b/.gitignore
index 594ffd9a7f..7f24ce72b1 100644
--- a/.gitignore
+++ b/.gitignore
@@ -237,6 +237,9 @@ tools/tests/regression/build/*
 tools/tests/regression/downloads/*
 tools/tests/mem-sharing/memshrtool
 tools/tests/mce-test/tools/xen-mceinj
+tools/tests/vpci/list.h
+tools/tests/vpci/vpci.[hc]
+tools/tests/vpci/test_vpci
 tools/xcutils/lsevtchn
 tools/xcutils/readnotes
 tools/xenbackendd/_paths.h
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index 455f6f0bed..dd7fc78a99 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -11,7 +11,7 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
     if (d_config->c_info.type == LIBXL_DOMAIN_TYPE_HVM) {
         if (d_config->b_info.device_model_version !=
             LIBXL_DEVICE_MODEL_VERSION_NONE) {
-            xc_config->emulation_flags = XEN_X86_EMU_ALL;
+            xc_config->emulation_flags = (XEN_X86_EMU_ALL & ~XEN_X86_EMU_VPCI);
         } else if (libxl_defbool_val(d_config->b_info.u.hvm.apic)) {
             /*
              * HVM guests without device model may want
diff --git a/tools/tests/Makefile b/tools/tests/Makefile
index 7162945121..f6942a93fb 100644
--- a/tools/tests/Makefile
+++ b/tools/tests/Makefile
@@ -13,6 +13,7 @@ endif
 SUBDIRS-$(CONFIG_X86) += x86_emulator
 SUBDIRS-y += xen-access
 SUBDIRS-y += xenstore
+SUBDIRS-$(CONFIG_HAS_PCI) += vpci
 
 .PHONY: all clean install distclean uninstall
 all clean distclean: %: subdirs-%
diff --git a/tools/tests/vpci/Makefile b/tools/tests/vpci/Makefile
new file mode 100644
index 0000000000..e45fcb5cd9
--- /dev/null
+++ b/tools/tests/vpci/Makefile
@@ -0,0 +1,37 @@
+XEN_ROOT=$(CURDIR)/../../..
+include $(XEN_ROOT)/tools/Rules.mk
+
+TARGET := test_vpci
+
+.PHONY: all
+all: $(TARGET)
+
+.PHONY: run
+run: $(TARGET)
+       ./$(TARGET)
+
+$(TARGET): vpci.c vpci.h list.h main.c emul.h
+       $(HOSTCC) -g -o $@ vpci.c main.c
+
+.PHONY: clean
+clean:
+       rm -rf $(TARGET) *.o *~ vpci.h vpci.c list.h
+
+.PHONY: distclean
+distclean: clean
+
+.PHONY: install
+install:
+
+vpci.c: $(XEN_ROOT)/xen/drivers/vpci/vpci.c
+       # Trick the compiler so it doesn't complain about missing symbols
+       sed -e '/#include/d' \
+           -e '1s;^;#include "emul.h"\
+                    vpci_register_init_t *const __start_vpci_array[1]\;\
+                    vpci_register_init_t *const __end_vpci_array[1]\;\
+                    ;' <$< >$@
+
+list.h: $(XEN_ROOT)/xen/include/xen/list.h
+vpci.h: $(XEN_ROOT)/xen/include/xen/vpci.h
+list.h vpci.h:
+       sed -e '/#include/d' <$< >$@
diff --git a/tools/tests/vpci/emul.h b/tools/tests/vpci/emul.h
new file mode 100644
index 0000000000..69f08ba1e7
--- /dev/null
+++ b/tools/tests/vpci/emul.h
@@ -0,0 +1,128 @@
+/*
+ * Unit tests for the generic vPCI handler code.
+ *
+ * Copyright (C) 2017 Citrix Systems R&D
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef _TEST_VPCI_
+#define _TEST_VPCI_
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <errno.h>
+#include <assert.h>
+
+#define container_of(ptr, type, member) ({                      \
+        typeof(((type *)0)->member) *mptr = (ptr);              \
+                                                                \
+        (type *)((char *)mptr - offsetof(type, member));        \
+})
+
+#define smp_wmb()
+#define prefetch(x) __builtin_prefetch(x)
+#define ASSERT(x) assert(x)
+#define __must_check __attribute__((__warn_unused_result__))
+
+#include "list.h"
+
+struct pci_dev {
+    struct domain *domain;
+    struct vpci *vpci;
+};
+
+struct domain {
+    enum {
+        UNLOCKED,
+        RLOCKED,
+        WLOCKED,
+    } lock;
+};
+
+struct vcpu
+{
+    struct domain *domain;
+};
+
+extern struct vcpu *current;
+extern struct pci_dev test_pdev;
+
+#include "vpci.h"
+
+#define __hwdom_init
+
+#define has_vpci(d) true
+
+/* Define our own locks. */
+#undef vpci_rlock
+#undef vpci_wlock
+#undef vpci_runlock
+#undef vpci_wunlock
+#undef vpci_rlocked
+#undef vpci_wlocked
+#define vpci_rlock(d) ((d)->lock = RLOCKED)
+#define vpci_wlock(d) ((d)->lock = WLOCKED)
+#define vpci_runlock(d) ((d)->lock = UNLOCKED)
+#define vpci_wunlock(d) ((d)->lock = UNLOCKED)
+#define vpci_rlocked(d) ((d)->lock == RLOCKED)
+#define vpci_wlocked(d) ((d)->lock == WLOCKED)
+
+#define xzalloc(type) ((type *)calloc(1, sizeof(type)))
+#define xmalloc(type) ((type *)malloc(sizeof(type)))
+#define xfree(p) free(p)
+
+#define pci_get_pdev_by_domain(...) &test_pdev
+
+/* Dummy native helpers. Writes are ignored, reads return 1's. */
+#define pci_conf_read8(...)     0xff
+#define pci_conf_read16(...)    0xffff
+#define pci_conf_read32(...)    0xffffffff
+#define pci_conf_write8(...)
+#define pci_conf_write16(...)
+#define pci_conf_write32(...)
+
+#define PCI_CFG_SPACE_EXP_SIZE 4096
+
+#define BUG() assert(0)
+#define ASSERT_UNREACHABLE() assert(0)
+
+#define min(x, y) ({                    \
+        const typeof(x) tx = (x);       \
+        const typeof(y) ty = (y);       \
+                                        \
+        (void) (&tx == &ty);            \
+        tx < ty ? tx : ty;              \
+})
+
+#define max(x, y) ({                    \
+        const typeof(x) tx = (x);       \
+        const typeof(y) ty = (y);       \
+                                        \
+        (void) (&tx == &ty);            \
+        tx > ty ? tx : ty;              \
+})
+
+#endif
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/tests/vpci/main.c b/tools/tests/vpci/main.c
new file mode 100644
index 0000000000..33736937bf
--- /dev/null
+++ b/tools/tests/vpci/main.c
@@ -0,0 +1,314 @@
+/*
+ * Unit tests for the generic vPCI handler code.
+ *
+ * Copyright (C) 2017 Citrix Systems R&D
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "emul.h"
+
+/* Single vcpu (current), and single domain with a single PCI device. */
+static struct vpci vpci;
+
+static struct domain d = {
+    .lock = false,
+};
+
+struct pci_dev test_pdev = {
+    .domain = &d,
+    .vpci = &vpci,
+};
+
+static struct vcpu v = {
+    .domain = &d
+};
+
+struct vcpu *current = &v;
+
+/* Dummy hooks, write stores data, read fetches it. */
+static uint32_t vpci_read8(struct pci_dev *pdev, unsigned int reg,
+                           const void *data)
+{
+    return *(const uint8_t *)data;
+}
+
+static void vpci_write8(struct pci_dev *pdev, unsigned int reg,
+                        uint32_t val, void *data)
+{
+    *(uint8_t *)data = val;
+}
+
+static uint32_t vpci_read16(struct pci_dev *pdev, unsigned int reg,
+                            const void *data)
+{
+    return *(const uint16_t *)data;
+}
+
+static void vpci_write16(struct pci_dev *pdev, unsigned int reg,
+                         uint32_t val, void *data)
+{
+    *(uint16_t *)data = val;
+}
+
+static uint32_t vpci_read32(struct pci_dev *pdev, unsigned int reg,
+                            const void *data)
+{
+    return *(const uint32_t *)data;
+}
+
+static void vpci_write32(struct pci_dev *pdev, unsigned int reg,
+                         uint32_t val, void *data)
+{
+    *(uint32_t *)data = val;
+}
+
+#define VPCI_READ(reg, size, data) ({                   \
+    vpci_rlock(&d);                                     \
+    data = vpci_read(0, 0, 0, 0, reg, size);            \
+    vpci_runlock(&d);                                   \
+})
+
+#define VPCI_READ_CHECK(reg, size, expected) ({         \
+    uint32_t rd;                                        \
+                                                        \
+    VPCI_READ(reg, size, rd);                           \
+    assert(rd == (expected));                           \
+})
+
+#define VPCI_WRITE(reg, size, data) ({                  \
+    vpci_wlock(&d);                                     \
+    vpci_write(0, 0, 0, 0, reg, size, data);            \
+    vpci_wunlock(&d);                                   \
+})
+
+#define VPCI_WRITE_CHECK(reg, size, data) ({            \
+    VPCI_WRITE(reg, size, data);                        \
+    VPCI_READ_CHECK(reg, size, data);                   \
+})
+
+#define VPCI_ADD_REG(fread, fwrite, off, size, store)                       \
+    assert(!vpci_add_register(&test_pdev, fread, fwrite, off, size, &store))
+
+#define VPCI_ADD_INVALID_REG(fread, fwrite, off, size)                      \
+    assert(vpci_add_register(&test_pdev, fread, fwrite, off, size, NULL))
+
+#define VPCI_REMOVE_REG(off, size)                                          \
+    assert(!vpci_remove_register(&test_pdev, off, size))
+
+#define VPCI_REMOVE_INVALID_REG(off, size)                                  \
+    assert(vpci_remove_register(&test_pdev, off, size))
+
+/* Read a 32b register using all possible sizes. */
+void multiread4_check(unsigned int reg, uint32_t val)
+{
+    unsigned int i;
+
+    /* Read using bytes. */
+    for ( i = 0; i < 4; i++ )
+        VPCI_READ_CHECK(reg + i, 1, (val >> (i * 8)) & UINT8_MAX);
+
+    /* Read using 2bytes. */
+    for ( i = 0; i < 2; i++ )
+        VPCI_READ_CHECK(reg + i * 2, 2, (val >> (i * 2 * 8)) & UINT16_MAX);
+
+    VPCI_READ_CHECK(reg, 4, val);
+}
+
+void multiwrite4_check(unsigned int reg)
+{
+    unsigned int i;
+    uint32_t val = 0xa2f51732;
+
+    /* Write using bytes. */
+    for ( i = 0; i < 4; i++ )
+        VPCI_WRITE_CHECK(reg + i, 1, (val >> (i * 8)) & UINT8_MAX);
+    multiread4_check(reg, val);
+
+    /* Change the value each time to be sure writes work fine. */
+    val = 0x2b836fda;
+    /* Write using 2bytes. */
+    for ( i = 0; i < 2; i++ )
+        VPCI_WRITE_CHECK(reg + i * 2, 2, (val >> (i * 2 * 8)) & UINT16_MAX);
+    multiread4_check(reg, val);
+
+    val = 0xc4693beb;
+    VPCI_WRITE_CHECK(reg, 4, val);
+    multiread4_check(reg, val);
+}
+
+int
+main(int argc, char **argv)
+{
+    /* Index storage by offset. */
+    uint32_t r0 = 0xdeadbeef;
+    uint8_t r5 = 0xef;
+    uint8_t r6 = 0xbe;
+    uint8_t r7 = 0xef;
+    uint16_t r12 = 0x8696;
+    uint8_t r16[4] = { };
+    uint16_t r20[2] = { };
+    uint32_t r24 = 0;
+    uint8_t r28, r30;
+    unsigned int i;
+    int rc;
+
+    INIT_LIST_HEAD(&vpci.handlers);
+
+    VPCI_ADD_REG(vpci_read32, vpci_write32, 0, 4, r0);
+    VPCI_READ_CHECK(0, 4, r0);
+    VPCI_WRITE_CHECK(0, 4, 0xbcbcbcbc);
+
+    VPCI_ADD_REG(vpci_read8, vpci_write8, 5, 1, r5);
+    VPCI_READ_CHECK(5, 1, r5);
+    VPCI_WRITE_CHECK(5, 1, 0xba);
+
+    VPCI_ADD_REG(vpci_read8, vpci_write8, 6, 1, r6);
+    VPCI_READ_CHECK(6, 1, r6);
+    VPCI_WRITE_CHECK(6, 1, 0xba);
+
+    VPCI_ADD_REG(vpci_read8, vpci_write8, 7, 1, r7);
+    VPCI_READ_CHECK(7, 1, r7);
+    VPCI_WRITE_CHECK(7, 1, 0xbd);
+
+    VPCI_ADD_REG(vpci_read16, vpci_write16, 12, 2, r12);
+    VPCI_READ_CHECK(12, 2, r12);
+    VPCI_READ_CHECK(12, 4, 0xffff8696);
+
+    /*
+     * At this point we have the following layout:
+     *
+     * Note that this refers to the position of the variables,
+     * but the value has already changed from the one given at
+     * initialization time because write tests have been performed.
+     *
+     * 32    24    16     8     0
+     *  +-----+-----+-----+-----+
+     *  |          r0           | 0
+     *  +-----+-----+-----+-----+
+     *  | r7  |  r6 |  r5 |/////| 32
+     *  +-----+-----+-----+-----|
+     *  |///////////////////////| 64
+     *  +-----------+-----------+
+     *  |///////////|    r12    | 96
+     *  +-----------+-----------+
+     *             ...
+     *  / = empty.
+     */
+
+    /* Try to add an overlapping register handler. */
+    VPCI_ADD_INVALID_REG(vpci_read32, vpci_write32, 4, 4);
+
+    /* Try to add a non-aligned register. */
+    VPCI_ADD_INVALID_REG(vpci_read16, vpci_write16, 15, 2);
+
+    /* Try to add a register with wrong size. */
+    VPCI_ADD_INVALID_REG(vpci_read16, vpci_write16, 8, 3);
+
+    /* Try to add a register with missing handlers. */
+    VPCI_ADD_INVALID_REG(NULL, NULL, 8, 2);
+
+    /* Read/write of unset register. */
+    VPCI_READ_CHECK(8, 4, 0xffffffff);
+    VPCI_READ_CHECK(8, 2, 0xffff);
+    VPCI_READ_CHECK(8, 1, 0xff);
+    VPCI_WRITE(10, 2, 0xbeef);
+    VPCI_READ_CHECK(10, 2, 0xffff);
+
+    /* Read of multiple registers */
+    VPCI_WRITE_CHECK(7, 1, 0xbd);
+    VPCI_READ_CHECK(4, 4, 0xbdbabaff);
+
+    /* Partial read of a register. */
+    VPCI_WRITE_CHECK(0, 4, 0x1a1b1c1d);
+    VPCI_READ_CHECK(2, 1, 0x1b);
+    VPCI_READ_CHECK(6, 2, 0xbdba);
+
+    /* Write of multiple registers. */
+    VPCI_WRITE_CHECK(4, 4, 0xaabbccff);
+
+    /* Partial write of a register. */
+    VPCI_WRITE_CHECK(2, 1, 0xfe);
+    VPCI_WRITE_CHECK(6, 2, 0xfebc);
+
+    /*
+     * Test all possible read/write size combinations.
+     *
+     * Place 4 1B registers at 128bits (16B), 2 2B registers at 160bits
+     * (20B) and finally 1 4B register at 192bits (24B).
+     *
+     * Then perform all possible write and read sizes on each of them.
+     *
+     *               ...
+     * 32     24     16      8      0
+     *  +------+------+------+------+
+     *  |r16[3]|r16[2]|r16[1]|r16[0]| 16
+     *  +------+------+------+------+
+     *  |    r20[1]   |    r20[0]   | 20
+     *  +-------------+-------------|
+     *  |            r24            | 24
+     *  +-------------+-------------+
+     *
+     */
+    VPCI_ADD_REG(vpci_read8, vpci_write8, 16, 1, r16[0]);
+    VPCI_ADD_REG(vpci_read8, vpci_write8, 17, 1, r16[1]);
+    VPCI_ADD_REG(vpci_read8, vpci_write8, 18, 1, r16[2]);
+    VPCI_ADD_REG(vpci_read8, vpci_write8, 19, 1, r16[3]);
+
+    VPCI_ADD_REG(vpci_read16, vpci_write16, 20, 2, r20[0]);
+    VPCI_ADD_REG(vpci_read16, vpci_write16, 22, 2, r20[1]);
+
+    VPCI_ADD_REG(vpci_read32, vpci_write32, 24, 4, r24);
+
+    /* Check the initial value is 0. */
+    multiread4_check(16, 0);
+    multiread4_check(20, 0);
+    multiread4_check(24, 0);
+
+    multiwrite4_check(16);
+    multiwrite4_check(20);
+    multiwrite4_check(24);
+
+    /*
+     * Check multiple non-consecutive gaps on the same read/write:
+     *
+     * 32     24     16      8      0
+     *  +------+------+------+------+
+     *  |//////|  r30 |//////|  r28 | 28
+     *  +------+------+------+------+
+     *
+     */
+    VPCI_ADD_REG(vpci_read8, vpci_write8, 28, 1, r28);
+    VPCI_ADD_REG(vpci_read8, vpci_write8, 30, 1, r30);
+    VPCI_WRITE_CHECK(28, 4, 0xffacffdc);
+
+    /* Finally try to remove a couple of registers. */
+    VPCI_REMOVE_REG(28, 1);
+    VPCI_REMOVE_REG(24, 4);
+    VPCI_REMOVE_REG(12, 2);
+
+    VPCI_REMOVE_INVALID_REG(20, 1);
+    VPCI_REMOVE_INVALID_REG(16, 2);
+    VPCI_REMOVE_INVALID_REG(30, 2);
+
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
index 2d54f224ec..6690516ff1 100644
--- a/xen/arch/arm/xen.lds.S
+++ b/xen/arch/arm/xen.lds.S
@@ -41,6 +41,11 @@ SECTIONS
 
   . = ALIGN(PAGE_SIZE);
   .rodata : {
+#if defined(CONFIG_HAS_PCI) && defined(CONFIG_LATE_HWDOM)
+       __start_vpci_array = .;
+       *(.rodata.vpci)
+       __end_vpci_array = .;
+#endif
         _srodata = .;          /* Read-only data */
         /* Bug frames table */
        __start_bug_frames = .;
@@ -131,6 +136,11 @@ SECTIONS
   } :text
   . = ALIGN(PAGE_SIZE);
   .init.data : {
+#if defined(CONFIG_HAS_PCI) && !defined(CONFIG_LATE_HWDOM)
+       __start_vpci_array = .;
+       *(.init.rodata.vpci)
+       __end_vpci_array = .;
+#endif
        *(.init.rodata)
        *(.init.rodata.rel)
        *(.init.rodata.str*)
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index baaf8151d2..7a862ea671 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -376,11 +376,21 @@ static bool emulation_flags_ok(const struct domain *d, 
uint32_t emflags)
     if ( is_hvm_domain(d) )
     {
         if ( is_hardware_domain(d) &&
-             emflags != (XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC) )
-            return false;
-        if ( !is_hardware_domain(d) && emflags &&
-             emflags != XEN_X86_EMU_ALL && emflags != XEN_X86_EMU_LAPIC )
+             emflags != (XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC|
+                         XEN_X86_EMU_VPCI) )
             return false;
+        if ( !is_hardware_domain(d) )
+        {
+            switch ( emflags )
+            {
+            case XEN_X86_EMU_ALL & ~XEN_X86_EMU_VPCI:
+            case XEN_X86_EMU_LAPIC:
+            case 0:
+                break;
+            default:
+                return false;
+            }
+        }
     }
     else if ( emflags != 0 && emflags != XEN_X86_EMU_PIT )
     {
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 6cb903def5..cc73df8dc7 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -36,6 +36,7 @@
 #include <xen/rangeset.h>
 #include <xen/monitor.h>
 #include <xen/warning.h>
+#include <xen/vpci.h>
 #include <asm/shadow.h>
 #include <asm/hap.h>
 #include <asm/current.h>
@@ -629,6 +630,7 @@ int hvm_domain_initialise(struct domain *d, unsigned long 
domcr_flags,
         d->arch.hvm_domain.io_bitmap = hvm_io_bitmap;
 
     register_g2m_portio_handler(d);
+    register_vpci_portio_handler(d);
 
     hvm_ioreq_init(d);
 
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 074cba89da..c3b68eb257 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -25,6 +25,7 @@
 #include <xen/trace.h>
 #include <xen/event.h>
 #include <xen/hypercall.h>
+#include <xen/vpci.h>
 #include <asm/current.h>
 #include <asm/cpufeature.h>
 #include <asm/processor.h>
@@ -260,7 +261,7 @@ unsigned int hvm_pci_decode_addr(unsigned int cf8, unsigned 
int addr,
                                  unsigned int *bus, unsigned int *slot,
                                  unsigned int *func)
 {
-    unsigned long bdf;
+    unsigned int bdf;
 
     ASSERT(CF8_ENABLED(cf8));
 
@@ -275,6 +276,121 @@ unsigned int hvm_pci_decode_addr(unsigned int cf8, 
unsigned int addr,
     return CF8_ADDR_LO(cf8) | (addr & 3);
 }
 
+/* Do some sanity checks. */
+static bool vpci_access_allowed(unsigned int reg, unsigned int len)
+{
+    /* Check access size. */
+    if ( len != 1 && len != 2 && len != 4 )
+        return false;
+
+    /* Check that access is size aligned. */
+    if ( (reg & (len - 1)) )
+        return false;
+
+    return true;
+}
+
+/* vPCI config space IO ports handlers (0xcf8/0xcfc). */
+static bool vpci_portio_accept(const struct hvm_io_handler *handler,
+                               const ioreq_t *p)
+{
+    return (p->addr == 0xcf8 && p->size == 4) || (p->addr & ~3) == 0xcfc;
+}
+
+static int vpci_portio_read(const struct hvm_io_handler *handler,
+                            uint64_t addr, uint32_t size, uint64_t *data)
+{
+    struct domain *d = current->domain;
+    unsigned int bus, slot, func, reg;
+
+    *data = ~(uint64_t)0;
+
+    vpci_rlock(d);
+    if ( addr == 0xcf8 )
+    {
+        ASSERT(size == 4);
+        *data = d->arch.hvm_domain.pci_cf8;
+        vpci_runlock(d);
+        return X86EMUL_OKAY;
+    }
+    if ( !CF8_ENABLED(d->arch.hvm_domain.pci_cf8) )
+    {
+        vpci_runlock(d);
+        return X86EMUL_OKAY;
+    }
+
+    reg = hvm_pci_decode_addr(d->arch.hvm_domain.pci_cf8, addr, &bus, &slot,
+                              &func);
+
+    if ( !vpci_access_allowed(reg, size) )
+    {
+        vpci_runlock(d);
+        return X86EMUL_OKAY;
+    }
+
+    *data = vpci_read(0, bus, slot, func, reg, size);
+    vpci_runlock(d);
+
+    return X86EMUL_OKAY;
+}
+
+static int vpci_portio_write(const struct hvm_io_handler *handler,
+                             uint64_t addr, uint32_t size, uint64_t data)
+{
+    struct domain *d = current->domain;
+    unsigned int bus, slot, func, reg;
+
+    vpci_wlock(d);
+    if ( addr == 0xcf8 )
+    {
+        ASSERT(size == 4);
+        d->arch.hvm_domain.pci_cf8 = data;
+        vpci_wunlock(d);
+        return X86EMUL_OKAY;
+    }
+    if ( !CF8_ENABLED(d->arch.hvm_domain.pci_cf8) )
+    {
+        vpci_wunlock(d);
+        return X86EMUL_OKAY;
+    }
+
+    reg = hvm_pci_decode_addr(d->arch.hvm_domain.pci_cf8, addr, &bus, &slot,
+                              &func);
+
+    if ( !vpci_access_allowed(reg, size) )
+    {
+        vpci_wunlock(d);
+        return X86EMUL_OKAY;
+    }
+
+    vpci_write(0, bus, slot, func, reg, size, data);
+    vpci_wunlock(d);
+
+    return X86EMUL_OKAY;
+}
+
+static const struct hvm_io_ops vpci_portio_ops = {
+    .accept = vpci_portio_accept,
+    .read = vpci_portio_read,
+    .write = vpci_portio_write,
+};
+
+void register_vpci_portio_handler(struct domain *d)
+{
+    struct hvm_io_handler *handler;
+
+    if ( !has_vpci(d) )
+        return;
+
+    handler = hvm_next_io_handler(d);
+    if ( !handler )
+        return;
+
+    rwlock_init(&d->arch.hvm_domain.vpci_lock);
+    handler->type = IOREQ_TYPE_PIO;
+    handler->ops = &vpci_portio_ops;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index db5df6956d..5b2c0e3fc3 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1566,7 +1566,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
         domcr_flags |= DOMCRF_hvm |
                        ((hvm_funcs.hap_supported && !opt_dom0_shadow) ?
                          DOMCRF_hap : 0);
-        config.emulation_flags = XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC;
+        config.emulation_flags = XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC|
+                                 XEN_X86_EMU_VPCI;
     }
 
     /* Create initial domain 0. */
diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index ff08bbe42a..af1b30cb2b 100644
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -76,6 +76,11 @@ SECTIONS
 
   __2M_rodata_start = .;       /* Start of 2M superpages, mapped RO. */
   .rodata : {
+#if defined(CONFIG_HAS_PCI) && defined(CONFIG_LATE_HWDOM)
+       __start_vpci_array = .;
+       *(.rodata.vpci)
+       __end_vpci_array = .;
+#endif
        _srodata = .;
        /* Bug frames table */
        __start_bug_frames = .;
@@ -167,6 +172,11 @@ SECTIONS
        _einittext = .;
   } :text
   .init.data : {
+#if defined(CONFIG_HAS_PCI) && !defined(CONFIG_LATE_HWDOM)
+       __start_vpci_array = .;
+       *(.init.rodata.vpci)
+       __end_vpci_array = .;
+#endif
        *(.init.rodata)
        *(.init.rodata.rel)
        *(.init.rodata.str*)
diff --git a/xen/drivers/Makefile b/xen/drivers/Makefile
index 19391802a8..d51c766453 100644
--- a/xen/drivers/Makefile
+++ b/xen/drivers/Makefile
@@ -1,6 +1,6 @@
 subdir-y += char
 subdir-$(CONFIG_HAS_CPUFREQ) += cpufreq
-subdir-$(CONFIG_HAS_PCI) += pci
+subdir-$(CONFIG_HAS_PCI) += pci vpci
 subdir-$(CONFIG_HAS_PASSTHROUGH) += passthrough
 subdir-$(CONFIG_ACPI) += acpi
 subdir-$(CONFIG_VIDEO) += video
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 27bdb7163c..54326cf0b8 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -30,6 +30,7 @@
 #include <xen/radix-tree.h>
 #include <xen/softirq.h>
 #include <xen/tasklet.h>
+#include <xen/vpci.h>
 #include <xsm/xsm.h>
 #include <asm/msi.h>
 #include "ats.h"
@@ -1030,9 +1031,10 @@ static void __hwdom_init setup_one_hwdom_device(const 
struct setup_hwdom *ctxt,
                                                 struct pci_dev *pdev)
 {
     u8 devfn = pdev->devfn;
+    int err;
 
     do {
-        int err = ctxt->handler(devfn, pdev);
+        err = ctxt->handler(devfn, pdev);
 
         if ( err )
         {
@@ -1045,6 +1047,11 @@ static void __hwdom_init setup_one_hwdom_device(const 
struct setup_hwdom *ctxt,
         devfn += pdev->phantom_stride;
     } while ( devfn != pdev->devfn &&
               PCI_SLOT(devfn) == PCI_SLOT(pdev->devfn) );
+
+    err = vpci_add_handlers(pdev);
+    if ( err )
+        printk(XENLOG_ERR "setup of vPCI for d%d failed: %d\n",
+               ctxt->d->domain_id, err);
 }
 
 static int __hwdom_init _setup_hwdom_pci_devices(struct pci_seg *pseg, void 
*arg)
diff --git a/xen/drivers/vpci/Makefile b/xen/drivers/vpci/Makefile
new file mode 100644
index 0000000000..840a906470
--- /dev/null
+++ b/xen/drivers/vpci/Makefile
@@ -0,0 +1 @@
+obj-y += vpci.o
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
new file mode 100644
index 0000000000..f63de97e89
--- /dev/null
+++ b/xen/drivers/vpci/vpci.c
@@ -0,0 +1,443 @@
+/*
+ * Generic functionality for handling accesses to the PCI configuration space
+ * from guests.
+ *
+ * Copyright (C) 2017 Citrix Systems R&D
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/sched.h>
+#include <xen/vpci.h>
+
+extern vpci_register_init_t *const __start_vpci_array[];
+extern vpci_register_init_t *const __end_vpci_array[];
+#define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
+
+/* Internal struct to store the emulated PCI registers. */
+struct vpci_register {
+    vpci_read_t *read;
+    vpci_write_t *write;
+    unsigned int size;
+    unsigned int offset;
+    void *private;
+    struct list_head node;
+};
+
+int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
+{
+    unsigned int i;
+    int rc = 0;
+
+    if ( !has_vpci(pdev->domain) )
+        return 0;
+
+    pdev->vpci = xzalloc(struct vpci);
+    if ( !pdev->vpci )
+        return -ENOMEM;
+
+    INIT_LIST_HEAD(&pdev->vpci->handlers);
+
+    for ( i = 0; i < NUM_VPCI_INIT; i++ )
+    {
+        rc = __start_vpci_array[i](pdev);
+        if ( rc )
+            break;
+    }
+
+    if ( rc )
+    {
+        while ( !list_empty(&pdev->vpci->handlers) )
+        {
+            struct vpci_register *r = list_first_entry(&pdev->vpci->handlers,
+                                                       struct vpci_register,
+                                                       node);
+
+            list_del(&r->node);
+            xfree(r);
+        }
+        xfree(pdev->vpci);
+    }
+
+    return rc;
+}
+
+static int vpci_register_cmp(const struct vpci_register *r1,
+                             const struct vpci_register *r2)
+{
+    /* Return 0 if registers overlap. */
+    if ( r1->offset < r2->offset + r2->size &&
+         r2->offset < r1->offset + r1->size )
+        return 0;
+    if ( r1->offset < r2->offset )
+        return -1;
+    if ( r1->offset > r2->offset )
+        return 1;
+
+    ASSERT_UNREACHABLE();
+    return 0;
+}
+
+/* Dummy hooks, writes are ignored, reads return 1's */
+static uint32_t vpci_ignored_read(struct pci_dev *pdev, unsigned int reg,
+                                  const void *data)
+{
+    return ~(uint32_t)0;
+}
+
+static void vpci_ignored_write(struct pci_dev *pdev, unsigned int reg,
+                               uint32_t val, void *data)
+{
+}
+
+int vpci_add_register(const struct pci_dev *pdev, vpci_read_t *read_handler,
+                      vpci_write_t *write_handler, unsigned int offset,
+                      unsigned int size, void *data)
+{
+    struct list_head *prev;
+    struct vpci_register *r;
+
+    /* Some sanity checks. */
+    if ( (size != 1 && size != 2 && size != 4) ||
+         offset >= PCI_CFG_SPACE_EXP_SIZE || (offset & (size - 1)) ||
+         (!read_handler && !write_handler) )
+        return -EINVAL;
+
+    r = xmalloc(struct vpci_register);
+    if ( !r )
+        return -ENOMEM;
+
+    r->read = read_handler ?: vpci_ignored_read;
+    r->write = write_handler ?: vpci_ignored_write;
+    r->size = size;
+    r->offset = offset;
+    r->private = data;
+
+    vpci_wlock(pdev->domain);
+
+    /* The list of handlers must be keep sorted at all times. */
+    list_for_each ( prev, &pdev->vpci->handlers )
+    {
+        const struct vpci_register *this =
+            list_entry(prev, const struct vpci_register, node);
+        int cmp = vpci_register_cmp(r, this);
+
+        if ( cmp < 0 )
+            break;
+        if ( cmp == 0 )
+        {
+            vpci_wunlock(pdev->domain);
+            xfree(r);
+            return -EEXIST;
+        }
+    }
+
+    list_add_tail(&r->node, prev);
+    vpci_wunlock(pdev->domain);
+
+    return 0;
+}
+
+int vpci_remove_register(const struct pci_dev *pdev, unsigned int offset,
+                         unsigned int size)
+{
+    const struct vpci_register r = { .offset = offset, .size = size };
+    struct vpci_register *rm;
+
+    vpci_wlock(pdev->domain);
+    list_for_each_entry ( rm, &pdev->vpci->handlers, node )
+    {
+        int cmp = vpci_register_cmp(&r, rm);
+
+        /*
+         * NB: do not use a switch so that we can use break to
+         * get out of the list loop earlier if required.
+         */
+        if ( !cmp && rm->offset == offset && rm->size == size )
+        {
+            list_del(&rm->node);
+            vpci_wunlock(pdev->domain);
+            xfree(rm);
+            return 0;
+        }
+        if ( cmp <= 0 )
+            break;
+    }
+    vpci_wunlock(pdev->domain);
+
+    return -ENOENT;
+}
+
+/* Wrappers for performing reads/writes to the underlying hardware. */
+static uint32_t vpci_read_hw(unsigned int seg, unsigned int bus,
+                             unsigned int slot, unsigned int func,
+                             unsigned int reg, unsigned int size)
+{
+    uint32_t data;
+
+    switch ( size )
+    {
+    case 4:
+        data = pci_conf_read32(seg, bus, slot, func, reg);
+        break;
+    case 3:
+        /*
+         * This is possible because a 4byte read can have 1byte trapped and
+         * the rest passed-through.
+         */
+        if ( reg & 1 )
+        {
+            data = pci_conf_read8(seg, bus, slot, func, reg);
+            data |= pci_conf_read16(seg, bus, slot, func, reg + 1) << 8;
+        }
+        else
+        {
+            data = pci_conf_read16(seg, bus, slot, func, reg);
+            data |= pci_conf_read8(seg, bus, slot, func, reg + 2) << 16;
+        }
+        break;
+    case 2:
+        data = pci_conf_read16(seg, bus, slot, func, reg);
+        break;
+    case 1:
+        data = pci_conf_read8(seg, bus, slot, func, reg);
+        break;
+    default:
+        ASSERT_UNREACHABLE();
+        data = ~(uint32_t)0;
+        break;
+    }
+
+    return data;
+}
+
+static void vpci_write_hw(unsigned int seg, unsigned int bus,
+                          unsigned int slot, unsigned int func,
+                          unsigned int reg, unsigned int size, uint32_t data)
+{
+    switch ( size )
+    {
+    case 4:
+        pci_conf_write32(seg, bus, slot, func, reg, data);
+        break;
+    case 3:
+        /*
+         * This is possible because a 4byte write can have 1byte trapped and
+         * the rest passed-through.
+         */
+        if ( reg & 1 )
+        {
+            pci_conf_write8(seg, bus, slot, func, reg, data);
+            pci_conf_write16(seg, bus, slot, func, reg + 1, data >> 8);
+        }
+        else
+        {
+            pci_conf_write16(seg, bus, slot, func, reg, data);
+            pci_conf_write8(seg, bus, slot, func, reg + 2, data >> 16);
+        }
+        break;
+    case 2:
+        pci_conf_write16(seg, bus, slot, func, reg, data);
+        break;
+    case 1:
+        pci_conf_write8(seg, bus, slot, func, reg, data);
+        break;
+    default:
+        ASSERT_UNREACHABLE();
+        break;
+    }
+}
+
+/*
+ * Merge new data into a partial result.
+ *
+ * Zero the bytes of 'data' from [offset, offset + size), and
+ * merge the value found in 'new' from [0, offset) left shifted
+ * by 'offset'.
+ */
+static uint32_t merge_result(uint32_t data, uint32_t new, unsigned int size,
+                             unsigned int offset)
+{
+    uint32_t mask = 0xffffffff >> (32 - 8 * size);
+
+    return (data & ~(mask << (offset * 8))) | ((new & mask) << (offset * 8));
+}
+
+uint32_t vpci_read(unsigned int seg, unsigned int bus, unsigned int slot,
+                   unsigned int func, unsigned int reg, unsigned int size)
+{
+    struct domain *d = current->domain;
+    struct pci_dev *pdev;
+    const struct vpci_register *r;
+    unsigned int data_offset = 0;
+    uint32_t data = ~(uint32_t)0;
+
+    ASSERT(vpci_rlocked(d));
+
+    /* Find the PCI dev matching the address. */
+    pdev = pci_get_pdev_by_domain(d, seg, bus, PCI_DEVFN(slot, func));
+    if ( !pdev )
+        return vpci_read_hw(seg, bus, slot, func, reg, size);
+
+    /* Read from the hardware or the emulated register handlers. */
+    list_for_each_entry ( r, &pdev->vpci->handlers, node )
+    {
+        const struct vpci_register emu = {
+            .offset = reg + data_offset,
+            .size = size - data_offset
+        };
+        int cmp = vpci_register_cmp(&emu, r);
+        uint32_t val;
+        unsigned int read_size;
+
+        if ( cmp < 0 )
+            break;
+        if ( cmp > 0 )
+            continue;
+
+        if ( emu.offset < r->offset )
+        {
+            /* Heading gap, read partial content from hardware. */
+            read_size = r->offset - emu.offset;
+            val = vpci_read_hw(seg, bus, slot, func, emu.offset, read_size);
+            data = merge_result(data, val, read_size, data_offset);
+            data_offset += read_size;
+        }
+
+        val = r->read(pdev, r->offset, r->private);
+
+        /* Check if the read is in the middle of a register. */
+        if ( r->offset < emu.offset )
+            val >>= (emu.offset - r->offset) * 8;
+
+        /* Find the intersection size between the two sets. */
+        read_size = min(emu.offset + emu.size, r->offset + r->size) -
+                    max(emu.offset, r->offset);
+        /* Merge the emulated data into the native read value. */
+        data = merge_result(data, val, read_size, data_offset);
+        data_offset += read_size;
+        if ( data_offset == size )
+            break;
+        ASSERT(data_offset < size);
+    }
+
+    if ( data_offset < size )
+    {
+        /* Tailing gap, read the remaining. */
+        uint32_t tmp_data = vpci_read_hw(seg, bus, slot, func,
+                                         reg + data_offset,
+                                         size - data_offset);
+
+        data = merge_result(data, tmp_data, size - data_offset, data_offset);
+    }
+
+    return data & (0xffffffff >> (32 - 8 * size));
+}
+
+/*
+ * Perform a maybe partial write to a register.
+ *
+ * Note that this will only work for simple registers, if Xen needs to
+ * trap accesses to rw1c registers (like the status PCI header register)
+ * the logic in vpci_write will have to be expanded in order to correctly
+ * deal with them.
+ */
+static void vpci_write_helper(struct pci_dev *pdev,
+                              const struct vpci_register *r, unsigned int size,
+                              unsigned int offset, uint32_t data)
+{
+    ASSERT(size <= r->size);
+
+    if ( size != r->size )
+    {
+        uint32_t val;
+
+        val = r->read(pdev, r->offset, r->private);
+        data = merge_result(val, data, size, offset);
+    }
+
+    r->write(pdev, r->offset, data & (0xffffffff >> (32 - 8 * r->size)),
+             r->private);
+}
+
+void vpci_write(unsigned int seg, unsigned int bus, unsigned int slot,
+                unsigned int func, unsigned int reg, unsigned int size,
+                uint32_t data)
+{
+    struct domain *d = current->domain;
+    struct pci_dev *pdev;
+    const struct vpci_register *r;
+    unsigned int data_offset = 0;
+
+    ASSERT(vpci_wlocked(d));
+
+    /*
+     * Find the PCI dev matching the address.
+     * Passthrough everything that's not trapped.
+     */
+    pdev = pci_get_pdev_by_domain(d, seg, bus, PCI_DEVFN(slot, func));
+    if ( !pdev )
+    {
+        vpci_write_hw(seg, bus, slot, func, reg, size, data);
+        return;
+    }
+
+    /* Write the value to the hardware or emulated registers. */
+    list_for_each_entry ( r, &pdev->vpci->handlers, node )
+    {
+        const struct vpci_register emu = {
+            .offset = reg + data_offset,
+            .size = size - data_offset
+        };
+        int cmp = vpci_register_cmp(&emu, r);
+        unsigned int write_size;
+
+        if ( cmp < 0 )
+            break;
+        if ( cmp > 0 )
+            continue;
+
+        if ( emu.offset < r->offset )
+        {
+            /* Heading gap, write partial content to hardware. */
+            vpci_write_hw(seg, bus, slot, func, emu.offset,
+                          r->offset - emu.offset, data >> (data_offset * 8));
+            data_offset += r->offset - emu.offset;
+        }
+
+        /* Find the intersection size between the two sets. */
+        write_size = min(emu.offset + emu.size, r->offset + r->size) -
+                     max(emu.offset, r->offset);
+        vpci_write_helper(pdev, r, write_size, reg + data_offset - r->offset,
+                          data >> (data_offset * 8));
+        data_offset += write_size;
+        if ( data_offset == size )
+            break;
+        ASSERT(data_offset < size);
+    }
+
+    if ( data_offset < size )
+        /* Tailing gap, write the remaining. */
+        vpci_write_hw(seg, bus, slot, func, reg + data_offset,
+                      size - data_offset, data >> (data_offset * 8));
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index c10522b7f5..ec14343b27 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -427,6 +427,7 @@ struct arch_domain
 #define has_vpit(d)        (!!((d)->arch.emulation_flags & XEN_X86_EMU_PIT))
 #define has_pirq(d)        (!!((d)->arch.emulation_flags & \
                             XEN_X86_EMU_USE_PIRQ))
+#define has_vpci(d)        (!!((d)->arch.emulation_flags & XEN_X86_EMU_VPCI))
 
 #define has_arch_pdevs(d)    (!list_empty(&(d)->arch.pdev_list))
 
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index d2899c9bb2..3a54d50606 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -184,6 +184,9 @@ struct hvm_domain {
     /* List of guest to machine IO ports mapping. */
     struct list_head g2m_ioport_list;
 
+    /* Lock for the PCI emulation layer (vPCI). */
+    rwlock_t vpci_lock;
+
     /* List of permanently write-mapped pages. */
     struct {
         spinlock_t lock;
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index 51659b6c7f..01322a2e21 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -160,6 +160,9 @@ unsigned int hvm_pci_decode_addr(unsigned int cf8, unsigned 
int addr,
  */
 void register_g2m_portio_handler(struct domain *d);
 
+/* HVM port IO handler for PCI accesses. */
+void register_vpci_portio_handler(struct domain *d);
+
 #endif /* __ASM_X86_HVM_IO_H__ */
 
 
diff --git a/xen/include/public/arch-x86/xen.h 
b/xen/include/public/arch-x86/xen.h
index f21332e897..86a1a09a8d 100644
--- a/xen/include/public/arch-x86/xen.h
+++ b/xen/include/public/arch-x86/xen.h
@@ -295,12 +295,15 @@ struct xen_arch_domainconfig {
 #define XEN_X86_EMU_PIT             (1U<<_XEN_X86_EMU_PIT)
 #define _XEN_X86_EMU_USE_PIRQ       9
 #define XEN_X86_EMU_USE_PIRQ        (1U<<_XEN_X86_EMU_USE_PIRQ)
+#define _XEN_X86_EMU_VPCI           10
+#define XEN_X86_EMU_VPCI            (1U<<_XEN_X86_EMU_VPCI)
 
 #define XEN_X86_EMU_ALL             (XEN_X86_EMU_LAPIC | XEN_X86_EMU_HPET |  \
                                      XEN_X86_EMU_PM | XEN_X86_EMU_RTC |      \
                                      XEN_X86_EMU_IOAPIC | XEN_X86_EMU_PIC |  \
                                      XEN_X86_EMU_VGA | XEN_X86_EMU_IOMMU |   \
-                                     XEN_X86_EMU_PIT | XEN_X86_EMU_USE_PIRQ)
+                                     XEN_X86_EMU_PIT | XEN_X86_EMU_USE_PIRQ |\
+                                     XEN_X86_EMU_VPCI)
     uint32_t emulation_flags;
 };
 
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index ea6a66b248..ad5d3ca031 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -88,6 +88,9 @@ struct pci_dev {
 #define PT_FAULT_THRESHOLD 10
     } fault;
     u64 vf_rlen[6];
+
+    /* Data for vPCI. */
+    struct vpci *vpci;
 };
 
 #define for_each_pdev(domain, pdev) \
diff --git a/xen/include/xen/pci_regs.h b/xen/include/xen/pci_regs.h
index ecd6124d91..cc4ee3b83e 100644
--- a/xen/include/xen/pci_regs.h
+++ b/xen/include/xen/pci_regs.h
@@ -23,6 +23,14 @@
 #define LINUX_PCI_REGS_H
 
 /*
+ * Conventional PCI and PCI-X Mode 1 devices have 256 bytes of
+ * configuration space.  PCI-X Mode 2 and PCIe devices have 4096 bytes of
+ * configuration space.
+ */
+#define PCI_CFG_SPACE_SIZE     256
+#define PCI_CFG_SPACE_EXP_SIZE 4096
+
+/*
  * Under PCI, each device has 256 bytes of configuration address space,
  * of which the first 64 bytes are standardized as follows:
  */
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
new file mode 100644
index 0000000000..12f7287d7b
--- /dev/null
+++ b/xen/include/xen/vpci.h
@@ -0,0 +1,80 @@
+#ifndef _VPCI_
+#define _VPCI_
+
+#include <xen/pci.h>
+#include <xen/types.h>
+#include <xen/list.h>
+
+/*
+ * Helpers for locking/unlocking.
+ *
+ * NB: the recursive variants are used so that spin_is_locked
+ * returns whether the lock is hold by the current CPU (instead
+ * of just returning whether the lock is hold by any CPU).
+ */
+#define vpci_rlock(d) read_lock(&(d)->arch.hvm_domain.vpci_lock)
+#define vpci_wlock(d) write_lock(&(d)->arch.hvm_domain.vpci_lock)
+#define vpci_runlock(d) read_unlock(&(d)->arch.hvm_domain.vpci_lock)
+#define vpci_wunlock(d) write_unlock(&(d)->arch.hvm_domain.vpci_lock)
+#define vpci_rlocked(d) rw_is_locked(&(d)->arch.hvm_domain.vpci_lock)
+#define vpci_wlocked(d) rw_is_write_locked(&(d)->arch.hvm_domain.vpci_lock)
+
+/*
+ * The vPCI handlers will never be called concurrently for the same domain, it
+ * is guaranteed that the vpci domain lock will always be locked when calling
+ * any handler.
+ */
+typedef uint32_t vpci_read_t(struct pci_dev *pdev, unsigned int reg,
+                             const void *data);
+
+typedef void vpci_write_t(struct pci_dev *pdev, unsigned int reg,
+                          uint32_t val, void *data);
+
+typedef int vpci_register_init_t(struct pci_dev *dev);
+
+#ifdef CONFIG_LATE_HWDOM
+#define VPCI_SECTION ".rodata.vpci"
+#else
+#define VPCI_SECTION ".init.rodata.vpci"
+#endif
+
+#define REGISTER_VPCI_INIT(x)                   \
+  static vpci_register_init_t *const x##_entry  \
+               __used_section(VPCI_SECTION) = x
+
+/* Add vPCI handlers to device. */
+int __must_check vpci_add_handlers(struct pci_dev *dev);
+
+/* Add/remove a register handler. */
+int __must_check vpci_add_register(const struct pci_dev *pdev,
+                                   vpci_read_t *read_handler,
+                                   vpci_write_t *write_handler,
+                                   unsigned int offset, unsigned int size,
+                                   void *data);
+int __must_check vpci_remove_register(const struct pci_dev *pdev,
+                                      unsigned int offset,
+                                      unsigned int size);
+
+/* Generic read/write handlers for the PCI config space. */
+uint32_t vpci_read(unsigned int seg, unsigned int bus, unsigned int slot,
+                   unsigned int func, unsigned int reg, unsigned int size);
+void vpci_write(unsigned int seg, unsigned int bus, unsigned int slot,
+                unsigned int func, unsigned int reg, unsigned int size,
+                uint32_t data);
+
+struct vpci {
+    /* List of vPCI handlers for a device. */
+    struct list_head handlers;
+};
+
+#endif
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.11.0 (Apple Git-81)


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.