[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap accesses to the PCI config space



> -----Original Message-----
> From: Roger Pau Monne [mailto:roger.pau@xxxxxxxxxx]
> Sent: 20 April 2017 16:18
> To: xen-devel@xxxxxxxxxxxxxxxxxxxx
> Cc: konrad.wilk@xxxxxxxxxx; boris.ostrovsky@xxxxxxxxxx; Roger Pau Monne
> <roger.pau@xxxxxxxxxx>; Ian Jackson <Ian.Jackson@xxxxxxxxxx>; Wei Liu
> <wei.liu2@xxxxxxxxxx>; Jan Beulich <jbeulich@xxxxxxxx>; Andrew Cooper
> <Andrew.Cooper3@xxxxxxxxxx>; Paul Durrant <Paul.Durrant@xxxxxxxxxx>
> Subject: [PATCH v2 1/9] xen/vpci: introduce basic handlers to trap accesses
> to the PCI config space
> 
> This functionality is going to reside in vpci.c (and the corresponding vpci.h
> header), and should be arch-agnostic. The handlers introduced in this patch
> setup the basic functionality required in order to trap accesses to the PCI
> config space, and allow decoding the address and finding the corresponding
> handler that should handle the access (although no handlers are
> implemented).
> 
> Note that the traps to the PCI IO ports registers (0xcf8/0xcfc) are setup
> inside of a x86 HVM file, since that's not shared with other arches.
> 
> A new XEN_X86_EMU_VPCI x86 domain flag is added in order to signal Xen
> whether
> a domain should use the newly introduced vPCI handlers, this is only enabled
> for PVH Dom0 at the moment.
> 
> A very simple user-space test is also provided, so that the basic 
> functionality
> of the vPCI traps can be asserted. This has been proven quite helpful during
> development, since the logic to handle partial accesses or accesses that
> expand
> across multiple registers is not trivial.
> 
> The handlers for the registers are added to a red-black tree, that indexes
> them
> based on their offset. Since Xen needs to handle partial accesses to the
> registers and access that expand across multiple registers the logic in
> xen_vpci_{read/write} is kind of convoluted, I've tried to properly comment
> it
> in order to make it easier to understand.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> ---
> Cc: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
> Cc: Wei Liu <wei.liu2@xxxxxxxxxx>
> Cc: Jan Beulich <jbeulich@xxxxxxxx>
> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> Cc: Paul Durrant <paul.durrant@xxxxxxxxxx>
> ---
> Changes since v1:
>  - Allow access to cross a word-boundary.
>  - Add locking.
>  - Add cleanup to xen_vpci_add_handlers in case of failure.
> ---
>  .gitignore                        |   4 +
>  tools/libxl/libxl_x86.c           |   2 +-
>  tools/tests/Makefile              |   1 +
>  tools/tests/vpci/Makefile         |  45 ++++
>  tools/tests/vpci/emul.h           | 107 +++++++++
>  tools/tests/vpci/main.c           | 206 +++++++++++++++++
>  xen/arch/arm/xen.lds.S            |   3 +
>  xen/arch/x86/domain.c             |  18 +-
>  xen/arch/x86/hvm/hvm.c            |   2 +
>  xen/arch/x86/hvm/io.c             | 135 +++++++++++
>  xen/arch/x86/setup.c              |   3 +-
>  xen/arch/x86/xen.lds.S            |   3 +
>  xen/drivers/Makefile              |   2 +-
>  xen/drivers/passthrough/pci.c     |   3 +
>  xen/drivers/vpci/Makefile         |   1 +
>  xen/drivers/vpci/vpci.c           | 474
> ++++++++++++++++++++++++++++++++++++++
>  xen/include/asm-x86/domain.h      |   1 +
>  xen/include/asm-x86/hvm/domain.h  |   3 +
>  xen/include/asm-x86/hvm/io.h      |   3 +
>  xen/include/public/arch-x86/xen.h |   5 +-
>  xen/include/xen/pci.h             |   4 +
>  xen/include/xen/vpci.h            |  66 ++++++
>  22 files changed, 1083 insertions(+), 8 deletions(-)
>  create mode 100644 tools/tests/vpci/Makefile
>  create mode 100644 tools/tests/vpci/emul.h
>  create mode 100644 tools/tests/vpci/main.c
>  create mode 100644 xen/drivers/vpci/Makefile
>  create mode 100644 xen/drivers/vpci/vpci.c
>  create mode 100644 xen/include/xen/vpci.h
> 
> diff --git a/.gitignore b/.gitignore
> index 74747cb7e7..ebafba25b5 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -236,6 +236,10 @@ tools/tests/regression/build/*
>  tools/tests/regression/downloads/*
>  tools/tests/mem-sharing/memshrtool
>  tools/tests/mce-test/tools/xen-mceinj
> +tools/tests/vpci/rbtree.[hc]
> +tools/tests/vpci/vpci.[hc]
> +tools/tests/vpci/test_vpci.out
> +tools/tests/vpci/test_vpci
>  tools/xcutils/lsevtchn
>  tools/xcutils/readnotes
>  tools/xenbackendd/_paths.h
> diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
> index 455f6f0bed..dd7fc78a99 100644
> --- a/tools/libxl/libxl_x86.c
> +++ b/tools/libxl/libxl_x86.c
> @@ -11,7 +11,7 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
>      if (d_config->c_info.type == LIBXL_DOMAIN_TYPE_HVM) {
>          if (d_config->b_info.device_model_version !=
>              LIBXL_DEVICE_MODEL_VERSION_NONE) {
> -            xc_config->emulation_flags = XEN_X86_EMU_ALL;
> +            xc_config->emulation_flags = (XEN_X86_EMU_ALL &
> ~XEN_X86_EMU_VPCI);
>          } else if (libxl_defbool_val(d_config->b_info.u.hvm.apic)) {
>              /*
>               * HVM guests without device model may want
> diff --git a/tools/tests/Makefile b/tools/tests/Makefile
> index 639776130b..5cfe781e62 100644
> --- a/tools/tests/Makefile
> +++ b/tools/tests/Makefile
> @@ -13,6 +13,7 @@ endif
>  SUBDIRS-$(CONFIG_X86) += x86_emulator
>  SUBDIRS-y += xen-access
>  SUBDIRS-y += xenstore
> +SUBDIRS-$(CONFIG_HAS_PCI) += vpci
> 
>  .PHONY: all clean install distclean
>  all clean distclean: %: subdirs-%
> diff --git a/tools/tests/vpci/Makefile b/tools/tests/vpci/Makefile
> new file mode 100644
> index 0000000000..7969fcbd82
> --- /dev/null
> +++ b/tools/tests/vpci/Makefile
> @@ -0,0 +1,45 @@
> +
> +XEN_ROOT=$(CURDIR)/../../..
> +include $(XEN_ROOT)/tools/Rules.mk
> +
> +TARGET := test_vpci
> +
> +.PHONY: all
> +all: $(TARGET)
> +
> +.PHONY: run
> +run: $(TARGET)
> +     ./$(TARGET) > $(TARGET).out
> +
> +$(TARGET): vpci.c vpci.h rbtree.c rbtree.h
> +     $(HOSTCC) -g -o $@ vpci.c main.c rbtree.c
> +
> +.PHONY: clean
> +clean:
> +     rm -rf $(TARGET) $(TARGET).out *.o *~ vpci.h vpci.c rbtree.c rbtree.h
> +
> +.PHONY: distclean
> +distclean: clean
> +
> +.PHONY: install
> +install:
> +
> +vpci.h: $(XEN_ROOT)/xen/include/xen/vpci.h
> +     sed -e '/#include/d' <$< >$@
> +
> +vpci.c: $(XEN_ROOT)/xen/drivers/vpci/vpci.c
> +     # Trick the compiler so it doesn't complain about missing symbols
> +     sed -e '/#include/d' \
> +         -e '1s;^;#include "emul.h"\
> +                  const vpci_register_init_t __start_vpci_array[1]\;\
> +                  const vpci_register_init_t __end_vpci_array[1]\;\
> +                  ;' <$< >$@
> +
> +rbtree.h: $(XEN_ROOT)/xen/include/xen/rbtree.h
> +     sed -e '/#include/d' <$< >$@
> +
> +rbtree.c: $(XEN_ROOT)/xen/common/rbtree.c
> +     sed -e "/#include/d" \
> +         -e '1s;^;#include "emul.h"\
> +                  ;' <$< >$@
> +
> diff --git a/tools/tests/vpci/emul.h b/tools/tests/vpci/emul.h
> new file mode 100644
> index 0000000000..85897ed43b
> --- /dev/null
> +++ b/tools/tests/vpci/emul.h
> @@ -0,0 +1,107 @@
> +/*
> + * Unit tests for the generic vPCI handler code.
> + *
> + * Copyright (C) 2017 Citrix Systems R&D
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms and conditions of the GNU General Public
> + * License, version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public
> + * License along with this program; If not, see
> <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef _TEST_VPCI_
> +#define _TEST_VPCI_
> +
> +#include <stdlib.h>
> +#include <stdio.h>
> +#include <stddef.h>
> +#include <stdint.h>
> +#include <stdbool.h>
> +#include <errno.h>
> +#include <assert.h>
> +
> +#define container_of(ptr, type, member) ({                      \
> +        typeof( ((type *)0)->member ) *__mptr = (ptr);          \
> +        (type *)( (char *)__mptr - offsetof(type,member) );})
> +
> +#include "rbtree.h"
> +
> +struct pci_dev {
> +    struct domain *domain;
> +    struct vpci *vpci;
> +};
> +
> +struct domain {
> +    struct pci_dev pdev;
> +};
> +
> +struct vcpu
> +{
> +    struct domain *domain;
> +};
> +
> +extern struct vcpu v;
> +
> +#define spin_lock(x)
> +#define spin_unlock(x)
> +#define spin_is_locked(x) true
> +
> +#define current (&v)
> +
> +#define has_vpci(d) true
> +
> +#include "vpci.h"
> +
> +#define xzalloc(type) (type *)calloc(1, sizeof(type))
> +#define xfree(p) free(p)
> +
> +#define EXPORT_SYMBOL(x)
> +
> +#define pci_get_pdev_by_domain(d, ...) &(d)->pdev
> +
> +#define atomic_read(x) 1
> +
> +/* Dummy native helpers. Writes are ignored, reads return 1's. */
> +#define pci_conf_read8(...) (0xff)
> +#define pci_conf_read16(...) (0xffff)
> +#define pci_conf_read32(...) (0xffffffff)
> +#define pci_conf_write8(...)
> +#define pci_conf_write16(...)
> +#define pci_conf_write32(...)
> +
> +#define BUG() assert(0)
> +#define ASSERT_UNREACHABLE() assert(0)
> +#define ASSERT(x) assert(x)
> +
> +#ifdef _LP64
> +#define BITS_PER_LONG 64
> +#else
> +#define BITS_PER_LONG 32
> +#endif
> +#define GENMASK(h, l) \
> +    (((~0UL) << (l)) & (~0UL >> (BITS_PER_LONG - 1 - (h))))
> +
> +#define min(x,y) ({ \
> +        const typeof(x) _x = (x);       \
> +        const typeof(y) _y = (y);       \
> +        (void) (&_x == &_y);            \
> +        _x < _y ? _x : _y; })
> +
> +#endif
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> +
> diff --git a/tools/tests/vpci/main.c b/tools/tests/vpci/main.c
> new file mode 100644
> index 0000000000..0fc63de038
> --- /dev/null
> +++ b/tools/tests/vpci/main.c
> @@ -0,0 +1,206 @@
> +/*
> + * Unit tests for the generic vPCI handler code.
> + *
> + * Copyright (C) 2017 Citrix Systems R&D
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms and conditions of the GNU General Public
> + * License, version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public
> + * License along with this program; If not, see
> <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "emul.h"
> +
> +/* Single vcpu (current), and single domain with a single PCI device. */
> +static struct vpci vpci = {
> +    .handlers = RB_ROOT,
> +};
> +
> +static struct domain d = {
> +    .pdev.domain = &d,
> +    .pdev.vpci = &vpci,
> +};
> +
> +struct vcpu v = { .domain = &d };
> +
> +/* Dummy hooks, write stores data, read fetches it. */
> +static int vpci_read8(struct pci_dev *pdev, unsigned int reg,
> +                      union vpci_val *val, void *data)
> +{
> +    uint8_t *priv = data;
> +
> +    val->half_word = *priv;
> +    return 0;
> +}
> +
> +static int vpci_write8(struct pci_dev *pdev, unsigned int reg,
> +                       union vpci_val val, void *data)
> +{
> +    uint8_t *priv = data;
> +
> +    *priv = val.half_word;
> +    return 0;
> +}
> +
> +static int vpci_read16(struct pci_dev *pdev, unsigned int reg,
> +                       union vpci_val *val, void *data)
> +{
> +    uint16_t *priv = data;
> +
> +    val->word = *priv;
> +    return 0;
> +}
> +
> +static int vpci_write16(struct pci_dev *pdev, unsigned int reg,
> +                        union vpci_val val, void *data)
> +{
> +    uint16_t *priv = data;
> +
> +    *priv = val.word;
> +    return 0;
> +}
> +
> +static int vpci_read32(struct pci_dev *pdev, unsigned int reg,
> +                       union vpci_val *val, void *data)
> +{
> +    uint32_t *priv = data;
> +
> +    val->double_word = *priv;
> +    return 0;
> +}
> +
> +static int vpci_write32(struct pci_dev *pdev, unsigned int reg,
> +                        union vpci_val val, void *data)
> +{
> +    uint32_t *priv = data;
> +
> +    *priv = val.double_word;
> +    return 0;
> +}
> +
> +#define VPCI_READ(reg, size, data) \
> +    assert(!xen_vpci_read(0, 0, 0, reg, size, data))
> +
> +#define VPCI_READ_CHECK(reg, size, expected) ({ \
> +    uint32_t val;                               \
> +    VPCI_READ(reg, size, &val);                 \
> +    assert(val == expected);                    \
> +    })
> +
> +#define VPCI_WRITE(reg, size, data) \
> +    assert(!xen_vpci_write(0, 0, 0, reg, size, data))
> +
> +#define VPCI_CHECK_REG(reg, size, data) ({      \
> +    VPCI_WRITE(reg, size, data);                \
> +    VPCI_READ_CHECK(reg, size, data);           \
> +    })
> +
> +#define VPCI_ADD_REG(fread, fwrite, off, size, store)                        
>  \
> +    assert(!xen_vpci_add_register(&d.pdev, fread, fwrite, off, size, &store))
> \
> +
> +#define VPCI_ADD_INVALID_REG(fread, fwrite, off, size)                      \
> +    assert(xen_vpci_add_register(&d.pdev, fread, fwrite, off, size, NULL))  \
> +
> +int
> +main(int argc, char **argv)
> +{
> +    /* Index storage by offset. */
> +    uint32_t r0 = 0xdeadbeef;
> +    uint8_t r5 = 0xef;
> +    uint8_t r6 = 0xbe;
> +    uint8_t r7 = 0xef;
> +    uint16_t r12 = 0x8696;
> +    int rc;
> +
> +    VPCI_ADD_REG(vpci_read32, vpci_write32, 0, 4, r0);
> +    VPCI_READ_CHECK(0, 4, 0xdeadbeef);
> +    VPCI_CHECK_REG(0, 4, 0xbcbcbcbc);
> +
> +    VPCI_ADD_REG(vpci_read8, vpci_write8, 5, 1, r5);
> +    VPCI_READ_CHECK(5, 1, 0xef);
> +    VPCI_CHECK_REG(5, 1, 0xba);
> +
> +    VPCI_ADD_REG(vpci_read8, vpci_write8, 6, 1, r6);
> +    VPCI_READ_CHECK(6, 1, 0xbe);
> +    VPCI_CHECK_REG(6, 1, 0xba);
> +
> +    VPCI_ADD_REG(vpci_read8, vpci_write8, 7, 1, r7);
> +    VPCI_READ_CHECK(7, 1, 0xef);
> +    VPCI_CHECK_REG(7, 1, 0xbd);
> +
> +    VPCI_ADD_REG(vpci_read16, vpci_write16, 12, 2, r12);
> +    VPCI_READ_CHECK(12, 2, 0x8696);
> +    VPCI_READ_CHECK(12, 4, 0xffff8696);
> +
> +    /*
> +     * At this point we have the following layout:
> +     *
> +     * 32    24    16     8     0
> +     *  +-----+-----+-----+-----+
> +     *  |          r0           | 0
> +     *  +-----+-----+-----+-----+
> +     *  | r7  |  r6 |  r5 |/////| 32
> +     *  +-----+-----+-----+-----|
> +     *  |///////////////////////| 64
> +     *  +-----------+-----------+
> +     *  |///////////|    r12    | 96
> +     *  +-----------+-----------+
> +     *             ...
> +     *  / = empty.
> +     */
> +
> +    /* Try to add an overlapping register handler. */
> +    VPCI_ADD_INVALID_REG(vpci_read32, vpci_write32, 4, 4);
> +
> +    /* Try to add a non-aligned register. */
> +    VPCI_ADD_INVALID_REG(vpci_read16, vpci_write16, 15, 2);
> +
> +    /* Try to add a register with wrong size. */
> +    VPCI_ADD_INVALID_REG(vpci_read16, vpci_write16, 8, 3);
> +
> +    /* Try to add a register with missing handlers. */
> +    VPCI_ADD_INVALID_REG(vpci_read16, NULL, 8, 2);
> +    VPCI_ADD_INVALID_REG(NULL, vpci_write16, 8, 2);
> +
> +    /* Read/write of unset register. */
> +    VPCI_READ_CHECK(8, 4, 0xffffffff);
> +    VPCI_READ_CHECK(8, 2, 0xffff);
> +    VPCI_READ_CHECK(8, 1, 0xff);
> +    VPCI_WRITE(10, 2, 0xbeef);
> +    VPCI_READ_CHECK(10, 2, 0xffff);
> +
> +    /* Read of multiple registers */
> +    VPCI_CHECK_REG(7, 1, 0xbd);
> +    VPCI_READ_CHECK(4, 4, 0xbdbabaff);
> +
> +    /* Partial read of a register. */
> +    VPCI_CHECK_REG(0, 4, 0x1a1b1c1d);
> +    VPCI_READ_CHECK(2, 1, 0x1b);
> +    VPCI_READ_CHECK(6, 2, 0xbdba);
> +
> +    /* Write of multiple registers. */
> +    VPCI_CHECK_REG(4, 4, 0xaabbccff);
> +
> +    /* Partial write of a register. */
> +    VPCI_CHECK_REG(2, 1, 0xfe);
> +    VPCI_CHECK_REG(6, 2, 0xfebc);
> +
> +    return 0;
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> +
> diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
> index 44bd3bf0ce..41bf9dfaf3 100644
> --- a/xen/arch/arm/xen.lds.S
> +++ b/xen/arch/arm/xen.lds.S
> @@ -79,6 +79,9 @@ SECTIONS
>         __start_schedulers_array = .;
>         *(.data.schedulers)
>         __end_schedulers_array = .;
> +       __start_vpci_array = .;
> +       *(.data.vpci)
> +       __end_vpci_array = .;
>         *(.data.rel)
>         *(.data.rel.*)
>         CONSTRUCTORS
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 90e2b1f82a..f74020facc 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -500,11 +500,21 @@ static bool emulation_flags_ok(const struct domain
> *d, uint32_t emflags)
>      if ( is_hvm_domain(d) )
>      {
>          if ( is_hardware_domain(d) &&
> -             emflags != (XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC) )
> -            return false;
> -        if ( !is_hardware_domain(d) && emflags &&
> -             emflags != XEN_X86_EMU_ALL && emflags != XEN_X86_EMU_LAPIC )
> +             emflags != (XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC|
> +                         XEN_X86_EMU_VPCI) )
>              return false;
> +        if ( !is_hardware_domain(d) )
> +        {
> +            switch ( emflags )
> +            {
> +            case XEN_X86_EMU_ALL & ~XEN_X86_EMU_VPCI:
> +            case XEN_X86_EMU_LAPIC:
> +            case 0:
> +                break;
> +            default:
> +                return false;
> +            }
> +        }
>      }
>      else if ( emflags != 0 && emflags != XEN_X86_EMU_PIT )
>      {
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index a441955322..7f3322ede6 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -37,6 +37,7 @@
>  #include <xen/vm_event.h>
>  #include <xen/monitor.h>
>  #include <xen/warning.h>
> +#include <xen/vpci.h>
>  #include <asm/shadow.h>
>  #include <asm/hap.h>
>  #include <asm/current.h>
> @@ -655,6 +656,7 @@ int hvm_domain_initialise(struct domain *d)
>          d->arch.hvm_domain.io_bitmap = hvm_io_bitmap;
> 
>      register_g2m_portio_handler(d);
> +    register_vpci_portio_handler(d);
> 
>      hvm_ioreq_init(d);
> 
> diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
> index 214ab307c4..15048da556 100644
> --- a/xen/arch/x86/hvm/io.c
> +++ b/xen/arch/x86/hvm/io.c
> @@ -25,6 +25,7 @@
>  #include <xen/trace.h>
>  #include <xen/event.h>
>  #include <xen/hypercall.h>
> +#include <xen/vpci.h>
>  #include <asm/current.h>
>  #include <asm/cpufeature.h>
>  #include <asm/processor.h>
> @@ -256,6 +257,140 @@ void register_g2m_portio_handler(struct domain
> *d)
>      handler->ops = &g2m_portio_ops;
>  }
> 
> +/* Do some sanity checks. */
> +static int vpci_access_check(unsigned int reg, unsigned int len)
> +{
> +    /* Check access size. */
> +    if ( len != 1 && len != 2 && len != 4 )
> +    {
> +        gdprintk(XENLOG_WARNING, "invalid length (reg: %#x, len: %u)\n",
> +                 reg, len);
> +        return -EINVAL;
> +    }
> +
> +    /* Check if access crosses a double-word boundary. */
> +    if ( (reg & 3) + len > 4 )
> +    {
> +        gdprintk(XENLOG_WARNING,
> +                 "invalid access across double-word boundary (reg: %#x, len:
> %u)\n",
> +                 reg, len);
> +        return -EINVAL;
> +    }
> +
> +    return 0;
> +}
> +
> +/* Helper to decode a PCI address. */
> +static void vpci_decode_addr(unsigned int cf8, unsigned int addr,
> +                             unsigned int *bus, unsigned int *devfn,
> +                             unsigned int *reg)
> +{
> +    unsigned long bdf;
> +
> +    bdf = CF8_BDF(cf8);
> +    *bus = PCI_BUS(bdf);
> +    *devfn = PCI_DEVFN(PCI_SLOT(bdf), PCI_FUNC(bdf));
> +    /*
> +     * NB: the lower 2 bits of the register address are fetched from the
> +     * offset into the 0xcfc register when reading/writing to it.
> +     */
> +    *reg = (cf8 & 0xfc) | (addr & 3);
> +}
> +
> +/* vPCI config space IO ports handlers (0xcf8/0xcfc). */
> +static bool_t vpci_portio_accept(const struct hvm_io_handler *handler,
> +                                 const ioreq_t *p)
> +{
> +    return (p->addr == 0xcf8 && p->size == 4) || (p->addr & 0xfffc) == 0xcfc;
> +}
> +
> +static int vpci_portio_read(const struct hvm_io_handler *handler,
> +                            uint64_t addr, uint32_t size, uint64_t *data)
> +{
> +    struct domain *d = current->domain;
> +    unsigned int bus, devfn, reg;
> +    uint32_t data32;
> +    int rc;
> +
> +    vpci_lock(d);
> +    if ( addr == 0xcf8 )
> +    {
> +        ASSERT(size == 4);
> +        *data = d->arch.hvm_domain.pci_cf8;
> +        vpci_unlock(d);
> +        return X86EMUL_OKAY;
> +    }
> +
> +    /* Decode the PCI address. */
> +    vpci_decode_addr(d->arch.hvm_domain.pci_cf8, addr, &bus, &devfn,
> &reg);
> +
> +    if ( vpci_access_check(reg, size) || reg >= 0xff )
> +    {
> +        vpci_unlock(d);
> +        return X86EMUL_UNHANDLEABLE;
> +    }
> +
> +    rc = xen_vpci_read(0, bus, devfn, reg, size, &data32);
> +    if ( !rc )
> +        *data = data32;
> +    vpci_unlock(d);
> +
> +     return rc ? X86EMUL_UNHANDLEABLE : X86EMUL_OKAY;
> +}
> +
> +static int vpci_portio_write(const struct hvm_io_handler *handler,
> +                             uint64_t addr, uint32_t size, uint64_t data)
> +{
> +    struct domain *d = current->domain;
> +    unsigned int bus, devfn, reg;
> +    int rc;
> +
> +    vpci_lock(d);
> +    if ( addr == 0xcf8 )
> +    {
> +        ASSERT(size == 4);
> +        d->arch.hvm_domain.pci_cf8 = data;
> +        vpci_unlock(d);
> +        return X86EMUL_OKAY;
> +    }
> +
> +    /* Decode the PCI address. */
> +    vpci_decode_addr(d->arch.hvm_domain.pci_cf8, addr, &bus, &devfn,
> &reg);
> +
> +    if ( vpci_access_check(reg, size) || reg >= 0xff )
> +    {
> +        vpci_unlock(d);
> +        return X86EMUL_UNHANDLEABLE;
> +    }
> +
> +    rc = xen_vpci_write(0, bus, devfn, reg, size, data);
> +    vpci_unlock(d);
> +
> +    return rc ? X86EMUL_UNHANDLEABLE : X86EMUL_OKAY;
> +}
> +
> +static const struct hvm_io_ops vpci_portio_ops = {
> +    .accept = vpci_portio_accept,
> +    .read = vpci_portio_read,
> +    .write = vpci_portio_write,
> +};
> +
> +void register_vpci_portio_handler(struct domain *d)
> +{
> +    struct hvm_io_handler *handler;
> +
> +    if ( !has_vpci(d) )
> +        return;
> +
> +    handler = hvm_next_io_handler(d);
> +    if ( !handler )
> +        return;
> +
> +    spin_lock_init(&d->arch.hvm_domain.vpci_lock);
> +    handler->type = IOREQ_TYPE_PIO;
> +    handler->ops = &vpci_portio_ops;
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
> index f7b927858c..4cf919f206 100644
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -1566,7 +1566,8 @@ void __init noreturn __start_xen(unsigned long
> mbi_p)
>          domcr_flags |= DOMCRF_hvm |
>                         ((hvm_funcs.hap_supported && !opt_dom0_shadow) ?
>                           DOMCRF_hap : 0);
> -        config.emulation_flags =
> XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC;
> +        config.emulation_flags =
> XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC|
> +                                 XEN_X86_EMU_VPCI;
>      }
> 
>      /* Create initial domain 0. */
> diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
> index 8289a1bf09..f5cc8e2b8d 100644
> --- a/xen/arch/x86/xen.lds.S
> +++ b/xen/arch/x86/xen.lds.S
> @@ -224,6 +224,9 @@ SECTIONS
>         __start_schedulers_array = .;
>         *(.data.schedulers)
>         __end_schedulers_array = .;
> +       __start_vpci_array = .;
> +       *(.data.vpci)
> +       __end_vpci_array = .;
>         *(.data.rel.ro)
>         *(.data.rel.ro.*)
>    } :text
> diff --git a/xen/drivers/Makefile b/xen/drivers/Makefile
> index 19391802a8..d51c766453 100644
> --- a/xen/drivers/Makefile
> +++ b/xen/drivers/Makefile
> @@ -1,6 +1,6 @@
>  subdir-y += char
>  subdir-$(CONFIG_HAS_CPUFREQ) += cpufreq
> -subdir-$(CONFIG_HAS_PCI) += pci
> +subdir-$(CONFIG_HAS_PCI) += pci vpci
>  subdir-$(CONFIG_HAS_PASSTHROUGH) += passthrough
>  subdir-$(CONFIG_ACPI) += acpi
>  subdir-$(CONFIG_VIDEO) += video
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index c8e2d2d9a9..2288cf8814 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -30,6 +30,7 @@
>  #include <xen/radix-tree.h>
>  #include <xen/softirq.h>
>  #include <xen/tasklet.h>
> +#include <xen/vpci.h>
>  #include <xsm/xsm.h>
>  #include <asm/msi.h>
>  #include "ats.h"
> @@ -1041,6 +1042,8 @@ static void setup_one_hwdom_device(const struct
> setup_hwdom *ctxt,
>          devfn += pdev->phantom_stride;
>      } while ( devfn != pdev->devfn &&
>                PCI_SLOT(devfn) == PCI_SLOT(pdev->devfn) );
> +
> +    xen_vpci_add_handlers(pdev);
>  }
> 
>  static int __hwdom_init _setup_hwdom_pci_devices(struct pci_seg *pseg,
> void *arg)
> diff --git a/xen/drivers/vpci/Makefile b/xen/drivers/vpci/Makefile
> new file mode 100644
> index 0000000000..840a906470
> --- /dev/null
> +++ b/xen/drivers/vpci/Makefile
> @@ -0,0 +1 @@
> +obj-y += vpci.o
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> new file mode 100644
> index 0000000000..f4cd04f11d
> --- /dev/null
> +++ b/xen/drivers/vpci/vpci.c
> @@ -0,0 +1,474 @@
> +/*
> + * Generic functionality for handling accesses to the PCI configuration space
> + * from guests.
> + *
> + * Copyright (C) 2017 Citrix Systems R&D
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms and conditions of the GNU General Public
> + * License, version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public
> + * License along with this program; If not, see
> <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <xen/sched.h>
> +#include <xen/vpci.h>
> +
> +extern const vpci_register_init_t __start_vpci_array[], __end_vpci_array[];
> +#define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
> +#define vpci_init __start_vpci_array
> +
> +/* Helpers for locking/unlocking. */
> +#define vpci_lock(d) spin_lock(&(d)->arch.hvm_domain.vpci_lock)
> +#define vpci_unlock(d) spin_unlock(&(d)->arch.hvm_domain.vpci_lock)
> +#define vpci_locked(d) spin_is_locked(&(d)->arch.hvm_domain.vpci_lock)
> +
> +/* Internal struct to store the emulated PCI registers. */
> +struct vpci_register {
> +    vpci_read_t read;
> +    vpci_write_t write;
> +    unsigned int size;
> +    unsigned int offset;
> +    void *priv_data;
> +    struct rb_node node;
> +};
> +
> +int xen_vpci_add_handlers(struct pci_dev *pdev)
> +{
> +    int i, rc = 0;
> +
> +    if ( !has_vpci(pdev->domain) )
> +        return 0;
> +
> +    pdev->vpci = xzalloc(struct vpci);
> +    if ( !pdev->vpci )
> +        return -ENOMEM;
> +
> +    pdev->vpci->handlers = RB_ROOT;
> +
> +    for ( i = 0; i < NUM_VPCI_INIT; i++ )
> +    {
> +        rc = vpci_init[i](pdev);
> +        if ( rc )
> +            break;
> +    }
> +
> +    if ( rc )
> +    {
> +        struct rb_node *node = rb_first(&pdev->vpci->handlers);
> +        struct vpci_register *r;
> +
> +        /* Iterate over the tree and cleanup. */
> +        while ( node != NULL )
> +        {
> +            r = container_of(node, struct vpci_register, node);
> +            node = rb_next(node);
> +            rb_erase(&r->node, &pdev->vpci->handlers);
> +            xfree(r);
> +        }
> +        xfree(pdev->vpci);
> +    }
> +
> +    return rc;
> +}
> +
> +static bool vpci_register_overlap(const struct vpci_register *r,
> +                                  unsigned int offset)
> +{
> +    if ( offset >= r->offset && offset < r->offset + r->size )
> +        return true;
> +
> +    return false;
> +}
> +
> +
> +static int vpci_register_cmp(const struct vpci_register *r1,
> +                             const struct vpci_register *r2)
> +{
> +    /* Make sure there's no overlap between registers. */
> +    if ( vpci_register_overlap(r1, r2->offset) ||
> +         vpci_register_overlap(r1, r2->offset + r2->size - 1) ||
> +         vpci_register_overlap(r2, r1->offset) ||
> +         vpci_register_overlap(r2, r1->offset + r1->size - 1) )
> +        return 0;
> +
> +    if (r1->offset < r2->offset)
> +        return -1;
> +    else if (r1->offset > r2->offset)
> +        return 1;
> +
> +    ASSERT_UNREACHABLE();
> +    return 0;
> +}
> +
> +static struct vpci_register *vpci_find_register(const struct pci_dev *pdev,
> +                                                const unsigned int reg,
> +                                                const unsigned int size)
> +{
> +    struct rb_node *node;
> +    struct vpci_register r = {
> +        .offset = reg,
> +        .size = size,
> +    };
> +
> +    ASSERT(vpci_locked(pdev->domain));
> +
> +    node = pdev->vpci->handlers.rb_node;
> +    while ( node )
> +    {
> +        struct vpci_register *t =
> +            container_of(node, struct vpci_register, node);
> +
> +        switch ( vpci_register_cmp(&r, t) )
> +        {
> +        case -1:
> +            node = node->rb_left;
> +            break;
> +        case 1:
> +            node = node->rb_right;
> +            break;
> +        default:
> +            return t;
> +        }
> +    }
> +
> +    return NULL;
> +}
> +
> +int xen_vpci_add_register(struct pci_dev *pdev, vpci_read_t
> read_handler,
> +                          vpci_write_t write_handler, unsigned int offset,
> +                          unsigned int size, void *data)
> +{
> +    struct rb_node **new, *parent;
> +    struct vpci_register *r;
> +
> +    /* Some sanity checks. */
> +    if ( (size != 1 && size != 2 && size != 4) || offset >= 0xFFF ||
> +         offset & (size - 1) || read_handler == NULL || write_handler == 
> NULL )
> +        return -EINVAL;
> +
> +    r = xzalloc(struct vpci_register);
> +    if ( !r )
> +        return -ENOMEM;
> +
> +    r->read = read_handler;
> +    r->write = write_handler;
> +    r->size = size;
> +    r->offset = offset;
> +    r->priv_data = data;
> +
> +    vpci_lock(pdev->domain);
> +    new = &pdev->vpci->handlers.rb_node;
> +    parent = NULL;
> +
> +    while (*new) {
> +        struct vpci_register *this =
> +            container_of(*new, struct vpci_register, node);
> +
> +        parent = *new;
> +        switch ( vpci_register_cmp(r, this) )
> +        {
> +        case -1:
> +            new = &((*new)->rb_left);
> +            break;
> +        case 1:
> +            new = &((*new)->rb_right);
> +            break;
> +        default:
> +            xfree(r);
> +            vpci_unlock(pdev->domain);
> +            return -EEXIST;
> +        }
> +    }
> +
> +    rb_link_node(&r->node, parent, new);
> +    rb_insert_color(&r->node, &pdev->vpci->handlers);
> +    vpci_unlock(pdev->domain);
> +
> +    return 0;
> +}
> +
> +int xen_vpci_remove_register(struct pci_dev *pdev, unsigned int offset)
> +{
> +    struct vpci_register *r;
> +
> +    vpci_lock(pdev->domain);
> +    r = vpci_find_register(pdev, offset, 1 /* size doesn't matter here. */);
> +    if ( !r )
> +    {
> +        vpci_unlock(pdev->domain);
> +        return -ENOENT;
> +    }
> +
> +    rb_erase(&r->node, &pdev->vpci->handlers);
> +    xfree(r);
> +    vpci_unlock(pdev->domain);
> +
> +    return 0;
> +}
> +
> +/* Wrappers for performing reads/writes to the underlying hardware. */
> +static void vpci_read_hw(unsigned int seg, unsigned int bus,
> +                         unsigned int devfn, unsigned int reg, uint32_t size,
> +                         uint32_t *data)
> +{
> +    switch ( size )
> +    {
> +    case 4:
> +        *data = pci_conf_read32(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
> +                                reg);
> +        break;
> +    case 3:
> +        /*
> +         * This is possible because a 4byte read can have 1byte trapped and
> +         * the rest passed-through.
> +         */
> +        *data = pci_conf_read16(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
> +                                reg + 1) << 8;
> +        *data |= pci_conf_read8(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
> +                               reg);
> +        break;
> +    case 2:
> +        *data = pci_conf_read16(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
> +                                reg);
> +        break;
> +    case 1:
> +        *data = pci_conf_read8(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
> +                               reg);
> +        break;
> +    default:
> +        BUG();
> +    }
> +}
> +
> +static void vpci_write_hw(unsigned int seg, unsigned int bus,
> +                          unsigned int devfn, unsigned int reg, uint32_t 
> size,
> +                          uint32_t data)
> +{
> +    switch ( size )
> +    {
> +    case 4:
> +        pci_conf_write32(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), reg,
> +                         data);
> +        break;
> +    case 3:
> +        /*
> +         * This is possible because a 4byte write can have 1byte trapped and
> +         * the rest passed-through.
> +         */
> +        pci_conf_write8(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), reg,
> data);
> +        pci_conf_write16(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), reg +
> 1,
> +                         data >> 8);
> +        break;
> +    case 2:
> +        pci_conf_write16(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), reg,
> +                         data);
> +        break;
> +    case 1:
> +        pci_conf_write8(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), reg,
> data);
> +        break;
> +    default:
> +        BUG();
> +    }
> +}
> +
> +/* Helper macros for the read/write handlers. */
> +#define GENMASK_BYTES(e, s) GENMASK((e) * 8, (s) * 8)
> +#define SHIFT_RIGHT_BYTES(d, o) d >>= (o) * 8
> +#define ADD_RESULT(r, d, s, o) r |= ((d) & GENMASK_BYTES(s, 0)) << ((o) *
> 8)
> +
> +int xen_vpci_read(unsigned int seg, unsigned int bus, unsigned int devfn,
> +                  unsigned int reg, uint32_t size, uint32_t *data)
> +{
> +    struct domain *d = current->domain;
> +    struct pci_dev *pdev;
> +    const struct vpci_register *r;
> +    union vpci_val val = { .double_word = 0 };
> +    unsigned int data_rshift = 0, data_lshift = 0, data_size;
> +    uint32_t tmp_data;
> +    int rc;
> +
> +    ASSERT(vpci_locked(d));
> +
> +    *data = 0;
> +
> +    /* Find the PCI dev matching the address. */
> +    pdev = pci_get_pdev_by_domain(d, seg, bus, devfn);
> +    if ( !pdev )
> +        goto passthrough;

I hope this can eventually be generalised so I wonder what your intention is 
regarding co-existence between Xen emulated PCI config space, pass-through and 
PCI devices emulated externally. We already have a framework for registering 
PCI devices by SBDF but this code seems to make no use of it, which I suspect 
is likely to cause future conflict.

  Paul

> +
> +    /* Find the vPCI register handler. */
> +    r = vpci_find_register(pdev, reg, size);
> +    if ( !r )
> +        goto passthrough;
> +
> +    if ( r->offset > reg )
> +    {
> +        /*
> +         * There's a heading gap into the emulated register.
> +         * NB: it's possible for this recursive call to have a size of 3.
> +         */
> +        rc = xen_vpci_read(seg, bus, devfn, reg, r->offset - reg, &tmp_data);
> +        if ( rc )
> +            return rc;
> +
> +        /* Add the head read to the partial result. */
> +        ADD_RESULT(*data, tmp_data, r->offset - reg, 0);
> +        data_lshift = r->offset - reg;
> +
> +        /* Account for the read. */
> +        size -= data_lshift;
> +        reg += data_lshift;
> +    }
> +    else if ( r->offset < reg )
> +        /* There's an offset into the emulated register */
> +        data_rshift = reg - r->offset;
> +
> +    ASSERT(data_lshift == 0 || data_rshift == 0);
> +    data_size = min(size, r->size - data_rshift);
> +    ASSERT(data_size != 0);
> +
> +    /* Perform the read of the register. */
> +    rc = r->read(pdev, r->offset, &val, r->priv_data);
> +    if ( rc )
> +        return rc;
> +
> +    val.double_word >>= data_rshift * 8;
> +    ADD_RESULT(*data, val.double_word, data_size, data_lshift);
> +
> +    /* Account for the read */
> +    size -= data_size;
> +    reg += data_size;
> +
> +    /* Read the remaining, if any. */
> +    if ( size > 0 )
> +    {
> +        /*
> +         * Read tailing data.
> +         * NB: it's possible for this recursive call to have a size of 3.
> +         */
> +        rc = xen_vpci_read(seg, bus, devfn, reg, size, &tmp_data);
> +        if ( rc )
> +            return rc;
> +
> +        /* Add the tail read to the partial result. */
> +        ADD_RESULT(*data, tmp_data, size, data_size + data_lshift);
> +    }
> +
> +    return 0;
> +
> + passthrough:
> +    vpci_read_hw(seg, bus, devfn, reg, size, data);
> +    return 0;
> +}
> +
> +/* Perform a maybe partial write to a register. */
> +static int vpci_write_helper(struct pci_dev *pdev,
> +                             const struct vpci_register *r, unsigned int 
> size,
> +                             unsigned int offset, uint32_t data)
> +{
> +    union vpci_val val = { .double_word = data };
> +    int rc;
> +
> +    ASSERT(size <= r->size);
> +    if ( size != r->size )
> +    {
> +        rc = r->read(pdev, r->offset, &val, r->priv_data);
> +        if ( rc )
> +            return rc;
> +        val.double_word &= ~GENMASK_BYTES(size + offset, offset);
> +        data &= GENMASK_BYTES(size, 0);
> +        val.double_word |= data << (offset * 8);
> +    }
> +
> +    return r->write(pdev, r->offset, val, r->priv_data);
> +}
> +
> +int xen_vpci_write(unsigned int seg, unsigned int bus, unsigned int devfn,
> +                   unsigned int reg, uint32_t size, uint32_t data)
> +{
> +    struct domain *d = current->domain;
> +    struct pci_dev *pdev;
> +    const struct vpci_register *r;
> +    unsigned int data_size, data_offset = 0;
> +    int rc;
> +
> +    ASSERT(vpci_locked(d));
> +
> +    /* Find the PCI dev matching the address. */
> +    pdev = pci_get_pdev_by_domain(d, seg, bus, devfn);
> +    if ( !pdev )
> +        goto passthrough;
> +
> +    /* Find the vPCI register handler. */
> +    r = vpci_find_register(pdev, reg, size);
> +    if ( !r )
> +        goto passthrough;
> +
> +    else if ( r->offset > reg )
> +    {
> +        /*
> +         * There's a heading gap into the emulated register found.
> +         * NB: it's possible for this recursive call to have a size of 3.
> +         */
> +        rc = xen_vpci_write(seg, bus, devfn, reg, r->offset - reg, data);
> +        if ( rc )
> +            return rc;
> +
> +        /* Advance the data by the written size. */
> +        SHIFT_RIGHT_BYTES(data, r->offset - reg);
> +        size -= r->offset - reg;
> +        reg += r->offset - reg;
> +    }
> +    else if ( r->offset < reg )
> +        /* There's an offset into the emulated register. */
> +        data_offset = reg - r->offset;
> +
> +    data_size = min(size, r->size - data_offset);
> +
> +    /* Perform the write of the register. */
> +    ASSERT(data_size != 0);
> +    rc = vpci_write_helper(pdev, r, data_size, data_offset, data);
> +    if ( rc )
> +        return rc;
> +
> +    /* Account for the read */
> +    size -= data_size;
> +    reg += data_size;
> +    SHIFT_RIGHT_BYTES(data, data_size);
> +
> +    /* Write the remaining, if any. */
> +    if ( size > 0 )
> +    {
> +        /*
> +         * Write tailing data.
> +         * NB: it's possible for this recursive call to have a size of 3.
> +         */
> +        rc = xen_vpci_write(seg, bus, devfn, reg, size, data);
> +        if ( rc )
> +            return rc;
> +    }
> +
> +    return 0;
> +
> + passthrough:
> +    vpci_write_hw(seg, bus, devfn, reg, size, data);
> +    return 0;
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> +
> diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
> index 6ab987f231..f0741917ed 100644
> --- a/xen/include/asm-x86/domain.h
> +++ b/xen/include/asm-x86/domain.h
> @@ -426,6 +426,7 @@ struct arch_domain
>  #define has_vpit(d)        (!!((d)->arch.emulation_flags &
> XEN_X86_EMU_PIT))
>  #define has_pirq(d)        (!!((d)->arch.emulation_flags & \
>                              XEN_X86_EMU_USE_PIRQ))
> +#define has_vpci(d)        (!!((d)->arch.emulation_flags &
> XEN_X86_EMU_VPCI))
> 
>  #define has_arch_pdevs(d)    (!list_empty(&(d)->arch.pdev_list))
> 
> diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-
> x86/hvm/domain.h
> index d2899c9bb2..cbf4170789 100644
> --- a/xen/include/asm-x86/hvm/domain.h
> +++ b/xen/include/asm-x86/hvm/domain.h
> @@ -184,6 +184,9 @@ struct hvm_domain {
>      /* List of guest to machine IO ports mapping. */
>      struct list_head g2m_ioport_list;
> 
> +    /* Lock for the PCI emulation layer (vPCI). */
> +    spinlock_t vpci_lock;
> +
>      /* List of permanently write-mapped pages. */
>      struct {
>          spinlock_t lock;
> diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
> index 2484eb1c75..2dbf92f13e 100644
> --- a/xen/include/asm-x86/hvm/io.h
> +++ b/xen/include/asm-x86/hvm/io.h
> @@ -155,6 +155,9 @@ extern void hvm_dpci_msi_eoi(struct domain *d, int
> vector);
>   */
>  void register_g2m_portio_handler(struct domain *d);
> 
> +/* HVM port IO handler for PCI accesses. */
> +void register_vpci_portio_handler(struct domain *d);
> +
>  #endif /* __ASM_X86_HVM_IO_H__ */
> 
> 
> diff --git a/xen/include/public/arch-x86/xen.h b/xen/include/public/arch-
> x86/xen.h
> index 8a9ba7982b..c00f8cda93 100644
> --- a/xen/include/public/arch-x86/xen.h
> +++ b/xen/include/public/arch-x86/xen.h
> @@ -295,12 +295,15 @@ struct xen_arch_domainconfig {
>  #define XEN_X86_EMU_PIT             (1U<<_XEN_X86_EMU_PIT)
>  #define _XEN_X86_EMU_USE_PIRQ       9
>  #define XEN_X86_EMU_USE_PIRQ        (1U<<_XEN_X86_EMU_USE_PIRQ)
> +#define _XEN_X86_EMU_VPCI           10
> +#define XEN_X86_EMU_VPCI            (1U<<_XEN_X86_EMU_VPCI)
> 
>  #define XEN_X86_EMU_ALL             (XEN_X86_EMU_LAPIC |
> XEN_X86_EMU_HPET |  \
>                                       XEN_X86_EMU_PM | XEN_X86_EMU_RTC |      
> \
>                                       XEN_X86_EMU_IOAPIC | XEN_X86_EMU_PIC |  
> \
>                                       XEN_X86_EMU_VGA | XEN_X86_EMU_IOMMU |   
> \
> -                                     XEN_X86_EMU_PIT | XEN_X86_EMU_USE_PIRQ)
> +                                     XEN_X86_EMU_PIT | XEN_X86_EMU_USE_PIRQ 
> |\
> +                                     XEN_X86_EMU_VPCI)
>      uint32_t emulation_flags;
>  };
> 
> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
> index 59b6e8a81c..a83c4a1276 100644
> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -13,6 +13,7 @@
>  #include <xen/irq.h>
>  #include <xen/pci_regs.h>
>  #include <xen/pfn.h>
> +#include <xen/rbtree.h>
>  #include <asm/device.h>
>  #include <asm/numa.h>
>  #include <asm/pci.h>
> @@ -88,6 +89,9 @@ struct pci_dev {
>  #define PT_FAULT_THRESHOLD 10
>      } fault;
>      u64 vf_rlen[6];
> +
> +    /* Data for vPCI. */
> +    struct vpci *vpci;
>  };
> 
>  #define for_each_pdev(domain, pdev) \
> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
> new file mode 100644
> index 0000000000..56e8d1c35e
> --- /dev/null
> +++ b/xen/include/xen/vpci.h
> @@ -0,0 +1,66 @@
> +#ifndef _VPCI_
> +#define _VPCI_
> +
> +#include <xen/pci.h>
> +#include <xen/types.h>
> +
> +/* Helpers for locking/unlocking. */
> +#define vpci_lock(d) spin_lock(&(d)->arch.hvm_domain.vpci_lock)
> +#define vpci_unlock(d) spin_unlock(&(d)->arch.hvm_domain.vpci_lock)
> +#define vpci_locked(d) spin_is_locked(&(d)->arch.hvm_domain.vpci_lock)
> +
> +/* Value read or written by the handlers. */
> +union vpci_val {
> +    uint8_t half_word;
> +    uint16_t word;
> +    uint32_t double_word;
> +};
> +
> +/*
> + * The vPCI handlers will never be called concurrently for the same domain,
> ii
> + * is guaranteed that the vpci domain lock will always be locked when calling
> + * any handler.
> + */
> +typedef int (*vpci_read_t)(struct pci_dev *pdev, unsigned int reg,
> +                           union vpci_val *val, void *data);
> +
> +typedef int (*vpci_write_t)(struct pci_dev *pdev, unsigned int reg,
> +                            union vpci_val val, void *data);
> +
> +typedef int (*vpci_register_init_t)(struct pci_dev *dev);
> +
> +#define REGISTER_VPCI_INIT(x) \
> +  static const vpci_register_init_t x##_entry __used_section(".data.vpci") =
> x
> +
> +/* Add vPCI handlers to device. */
> +int xen_vpci_add_handlers(struct pci_dev *dev);
> +
> +/* Add/remove a register handler. */
> +int xen_vpci_add_register(struct pci_dev *pdev, vpci_read_t
> read_handler,
> +                          vpci_write_t write_handler, unsigned int offset,
> +                          unsigned int size, void *data);
> +int xen_vpci_remove_register(struct pci_dev *pdev, unsigned int offset);
> +
> +/* Generic read/write handlers for the PCI config space. */
> +int xen_vpci_read(unsigned int seg, unsigned int bus, unsigned int devfn,
> +                  unsigned int reg, uint32_t size, uint32_t *data);
> +int xen_vpci_write(unsigned int seg, unsigned int bus, unsigned int devfn,
> +                   unsigned int reg, uint32_t size, uint32_t data);
> +
> +struct vpci {
> +    /* Root pointer for the tree of vPCI handlers. */
> +    struct rb_root handlers;
> +};
> +
> +#endif
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> +
> --
> 2.11.0 (Apple Git-81)

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.