[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v5 02/11] vpci: introduce basic handlers to trap accesses to the PCI config space



> -----Original Message-----
> From: Roger Pau Monne [mailto:roger.pau@xxxxxxxxxx]
> Sent: 14 August 2017 15:29
> To: xen-devel@xxxxxxxxxxxxxxxxxxxx
> Cc: boris.ostrovsky@xxxxxxxxxx; konrad.wilk@xxxxxxxxxx; Roger Pau Monne
> <roger.pau@xxxxxxxxxx>; Ian Jackson <Ian.Jackson@xxxxxxxxxx>; Wei Liu
> <wei.liu2@xxxxxxxxxx>; Jan Beulich <jbeulich@xxxxxxxx>; Andrew Cooper
> <Andrew.Cooper3@xxxxxxxxxx>; Paul Durrant <Paul.Durrant@xxxxxxxxxx>
> Subject: [PATCH v5 02/11] vpci: introduce basic handlers to trap accesses to
> the PCI config space
> 
> This functionality is going to reside in vpci.c (and the corresponding
> vpci.h header), and should be arch-agnostic. The handlers introduced
> in this patch setup the basic functionality required in order to trap
> accesses to the PCI config space, and allow decoding the address and
> finding the corresponding handler that should handle the access
> (although no handlers are implemented).
> 
> Note that the traps to the PCI IO ports registers (0xcf8/0xcfc) are
> setup inside of a x86 HVM file, since that's not shared with other
> arches.
> 
> A new XEN_X86_EMU_VPCI x86 domain flag is added in order to signal Xen
> whether a domain should use the newly introduced vPCI handlers, this
> is only enabled for PVH Dom0 at the moment.
> 
> A very simple user-space test is also provided, so that the basic
> functionality of the vPCI traps can be asserted. This has been proven
> quite helpful during development, since the logic to handle partial
> accesses or accesses that expand across multiple registers is not
> trivial.
> 
> The handlers for the registers are added to a linked list that's keep
> sorted at all times. Both the read and write handlers support accesses
> that expand across multiple emulated registers and contain gaps not
> emulated.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> ---
> Cc: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
> Cc: Wei Liu <wei.liu2@xxxxxxxxxx>
> Cc: Jan Beulich <jbeulich@xxxxxxxx>
> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> Cc: Paul Durrant <paul.durrant@xxxxxxxxxx>
> ---
> Changes since v4:
> * User-space test harness:
>  - Do not redirect the output of the test.
>  - Add main.c and emul.h as dependencies of the Makefile target.
>  - Use the same rule to modify the vpci and list headers.
>  - Remove underscores from local macro variables.
>  - Add _check suffix to the test harness multiread function.
>  - Change the value written by every different size in the multiwrite
>    test.
>  - Use { } to initialize the r16 and r20 arrays (instead of { 0 }).
>  - Perform some of the read checks with the local variable directly.
>  - Expand some comments.
>  - Implement a dummy rwlock.
> * Hypervisor code:
>  - Guard the linker script changes with CONFIG_HAS_PCI.
>  - Rename vpci_access_check to vpci_access_allowed and make it return
>    bool.
>  - Make hvm_pci_decode_addr return the register as return value.
>  - Use ~3 instead of 0xfffc to remove the register offset when
>    checking accesses to IO ports.
>  - s/head/prev in vpci_add_register.
>  - Add parentheses around & in vpci_add_register.
>  - Fix register removal.
>  - Change the BUGs in vpci_{read/write}_hw helpers to
>    ASSERT_UNREACHABLE.
>  - Make merge_result static and change the computation of the mask to
>    avoid using a uint64_t.
>  - Modify vpci_read to only read from hardware the not-emulated gaps.
>  - Remove the vpci_val union and use a uint32_t instead.
>  - Change handler read type to return a uint32_t instead of modifying
>    a variable passed by reference.
>  - Constify the data opaque parameter of read handlers.
>  - Change the size parameter of the vpci_{read/write} functions to
>    unsigned int.
>  - Place the array of initialization handlers in init.rodata or
>    .rodata depending on whether late-hwdom is enabled.
>  - Remove the pci_devs lock, assume the Dom0 is well behaved and won't
>    remove the device while trying to access it.
>  - Change the recursive spinlock into a rw lock for performance
>    reasons.
> 
> Changes since v3:
> * User-space test harness:
>  - Fix spaces in container_of macro.
>  - Implement a dummy locking functions.
>  - Remove 'current' macro make current a pointer to the statically
>    allocated vpcu.
>  - Remove unneeded parentheses in the pci_conf_readX macros.
>  - Fix the name of the write test macro.
>  - Remove the dummy EXPORT_SYMBOL macro (this was needed by the RB
>    code only).
>  - Import the max macro.
>  - Test all possible read/write size combinations with all possible
>    emulated register sizes.
>  - Introduce a test for register removal.
> * Hypervisor code:
>  - Use a sorted list in order to store the config space handlers.
>  - Remove some unneeded 'else' branches.
>  - Make the IO port handlers always return X86EMUL_OKAY, and set the
>    data to all 1's in case of read failure (write are simply ignored).
>  - In hvm_select_ioreq_server reuse local variables when calling
>    XEN_DMOP_PCI_SBDF.
>  - Store the pointers to the initialization functions in the .rodata
>    section.
>  - Do not ignore the return value of xen_vpci_add_handlers in
>    setup_one_hwdom_device.
>  - Remove the vpci_init macro.
>  - Do not hide the pointers inside of the vpci_{read/write}_t
>    typedefs.
>  - Rename priv_data to private in vpci_register.
>  - Simplify checking for register overlap in vpci_register_cmp.
>  - Check that the offset and the length match before removing a
>    register in xen_vpci_remove_register.
>  - Make vpci_read_hw return a value rather than storing it in a
>    pointer passed by parameter.
>  - Handler dispatcher functions vpci_{read/write} no longer return an
>    error code, errors on reads/writes should be treated like hardware
>    (writes ignored, reads return all 1's or garbage).
>  - Make sure pcidevs is locked before calling pci_get_pdev_by_domain.
>  - Use a recursive spinlock for the vpci lock, so that spin_is_locked
>    checks that the current CPU is holding the lock.
>  - Make the code less error-chatty by removing some of the printk's.
>  - Pass the slot and the function as separate parameters to the
>    handler dispatchers (instead of passing devfn).
>  - Allow handlers to be registered with either a read or write
>    function only, the missing handler will be replaced by a dummy
>    handler (writes ignored, reads return 1's).
>  - Introduce PCI_CFG_SPACE_* defines from Linux.
>  - Simplify the handler dispatchers by removing the recursion, now the
>    dispatchers iterate over the list of sorted handlers and call them
>    in order.
>  - Remove the GENMASK_BYTES, SHIFT_RIGHT_BYTES and ADD_RESULT
> macros,
>    and instead provide a merge_result function in order to merge a
>    register output into a partial result.
>  - Rename the fields of the vpci_val union to u8/u16/u32.
>  - Remove the return values from the read/write handlers, errors
>    should be handled internally and signaled as would be done on
>    native hardware.
>  - Remove the usage of the GENMASK macro.
> 
> Changes since v2:
>  - Generalize the PCI address decoding and use it for IOREQ code also.
> 
> Changes since v1:
>  - Allow access to cross a word-boundary.
>  - Add locking.
>  - Add cleanup to xen_vpci_add_handlers in case of failure.
> ---
>  .gitignore                        |   3 +
>  tools/libxl/libxl_x86.c           |   2 +-
>  tools/tests/Makefile              |   1 +
>  tools/tests/vpci/Makefile         |  37 ++++
>  tools/tests/vpci/emul.h           | 128 +++++++++++
>  tools/tests/vpci/main.c           | 314 +++++++++++++++++++++++++++
>  xen/arch/arm/xen.lds.S            |  10 +
>  xen/arch/x86/domain.c             |  18 +-
>  xen/arch/x86/hvm/hvm.c            |   2 +
>  xen/arch/x86/hvm/io.c             | 118 +++++++++-
>  xen/arch/x86/setup.c              |   3 +-
>  xen/arch/x86/xen.lds.S            |  10 +
>  xen/drivers/Makefile              |   2 +-
>  xen/drivers/passthrough/pci.c     |   9 +-
>  xen/drivers/vpci/Makefile         |   1 +
>  xen/drivers/vpci/vpci.c           | 443
> ++++++++++++++++++++++++++++++++++++++
>  xen/include/asm-x86/domain.h      |   1 +
>  xen/include/asm-x86/hvm/domain.h  |   3 +
>  xen/include/asm-x86/hvm/io.h      |   3 +
>  xen/include/public/arch-x86/xen.h |   5 +-
>  xen/include/xen/pci.h             |   3 +
>  xen/include/xen/pci_regs.h        |   8 +
>  xen/include/xen/vpci.h            |  80 +++++++
>  23 files changed, 1194 insertions(+), 10 deletions(-)
>  create mode 100644 tools/tests/vpci/Makefile
>  create mode 100644 tools/tests/vpci/emul.h
>  create mode 100644 tools/tests/vpci/main.c
>  create mode 100644 xen/drivers/vpci/Makefile
>  create mode 100644 xen/drivers/vpci/vpci.c
>  create mode 100644 xen/include/xen/vpci.h
>
[snip]
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 6cb903def5..cc73df8dc7 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -36,6 +36,7 @@
>  #include <xen/rangeset.h>
>  #include <xen/monitor.h>
>  #include <xen/warning.h>
> +#include <xen/vpci.h>
>  #include <asm/shadow.h>
>  #include <asm/hap.h>
>  #include <asm/current.h>
> @@ -629,6 +630,7 @@ int hvm_domain_initialise(struct domain *d, unsigned
> long domcr_flags,
>          d->arch.hvm_domain.io_bitmap = hvm_io_bitmap;
> 
>      register_g2m_portio_handler(d);
> +    register_vpci_portio_handler(d);
> 
>      hvm_ioreq_init(d);
> 
> diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
> index 074cba89da..c3b68eb257 100644
> --- a/xen/arch/x86/hvm/io.c
> +++ b/xen/arch/x86/hvm/io.c
> @@ -25,6 +25,7 @@
>  #include <xen/trace.h>
>  #include <xen/event.h>
>  #include <xen/hypercall.h>
> +#include <xen/vpci.h>
>  #include <asm/current.h>
>  #include <asm/cpufeature.h>
>  #include <asm/processor.h>
> @@ -260,7 +261,7 @@ unsigned int hvm_pci_decode_addr(unsigned int cf8,
> unsigned int addr,
>                                   unsigned int *bus, unsigned int *slot,
>                                   unsigned int *func)
>  {
> -    unsigned long bdf;
> +    unsigned int bdf;

Shouldn't this be folded into the previous patch where you introduce this 
function?

> 
>      ASSERT(CF8_ENABLED(cf8));
> 
> @@ -275,6 +276,121 @@ unsigned int hvm_pci_decode_addr(unsigned int
> cf8, unsigned int addr,
>      return CF8_ADDR_LO(cf8) | (addr & 3);
>  }
> 
> +/* Do some sanity checks. */
> +static bool vpci_access_allowed(unsigned int reg, unsigned int len)
> +{
> +    /* Check access size. */
> +    if ( len != 1 && len != 2 && len != 4 )
> +        return false;
> +
> +    /* Check that access is size aligned. */
> +    if ( (reg & (len - 1)) )
> +        return false;
> +
> +    return true;
> +}
> +
> +/* vPCI config space IO ports handlers (0xcf8/0xcfc). */
> +static bool vpci_portio_accept(const struct hvm_io_handler *handler,
> +                               const ioreq_t *p)
> +{
> +    return (p->addr == 0xcf8 && p->size == 4) || (p->addr & ~3) == 0xcfc;
> +}
> +
> +static int vpci_portio_read(const struct hvm_io_handler *handler,
> +                            uint64_t addr, uint32_t size, uint64_t *data)
> +{
> +    struct domain *d = current->domain;
> +    unsigned int bus, slot, func, reg;
> +
> +    *data = ~(uint64_t)0;
> +
> +    vpci_rlock(d);
> +    if ( addr == 0xcf8 )
> +    {
> +        ASSERT(size == 4);
> +        *data = d->arch.hvm_domain.pci_cf8;
> +        vpci_runlock(d);
> +        return X86EMUL_OKAY;
> +    }
> +    if ( !CF8_ENABLED(d->arch.hvm_domain.pci_cf8) )
> +    {
> +        vpci_runlock(d);
> +        return X86EMUL_OKAY;
> +    }
> +
> +    reg = hvm_pci_decode_addr(d->arch.hvm_domain.pci_cf8, addr, &bus,
> &slot,
> +                              &func);
> +
> +    if ( !vpci_access_allowed(reg, size) )
> +    {
> +        vpci_runlock(d);
> +        return X86EMUL_OKAY;
> +    }
> +
> +    *data = vpci_read(0, bus, slot, func, reg, size);
> +    vpci_runlock(d);
> +
> +    return X86EMUL_OKAY;
> +}
> +
> +static int vpci_portio_write(const struct hvm_io_handler *handler,
> +                             uint64_t addr, uint32_t size, uint64_t data)
> +{
> +    struct domain *d = current->domain;
> +    unsigned int bus, slot, func, reg;
> +
> +    vpci_wlock(d);
> +    if ( addr == 0xcf8 )
> +    {
> +        ASSERT(size == 4);
> +        d->arch.hvm_domain.pci_cf8 = data;
> +        vpci_wunlock(d);
> +        return X86EMUL_OKAY;
> +    }
> +    if ( !CF8_ENABLED(d->arch.hvm_domain.pci_cf8) )
> +    {
> +        vpci_wunlock(d);
> +        return X86EMUL_OKAY;
> +    }
> +
> +    reg = hvm_pci_decode_addr(d->arch.hvm_domain.pci_cf8, addr, &bus,
> &slot,
> +                              &func);
> +
> +    if ( !vpci_access_allowed(reg, size) )
> +    {
> +        vpci_wunlock(d);
> +        return X86EMUL_OKAY;
> +    }
> +
> +    vpci_write(0, bus, slot, func, reg, size, data);
> +    vpci_wunlock(d);
> +
> +    return X86EMUL_OKAY;
> +}
> +
> +static const struct hvm_io_ops vpci_portio_ops = {
> +    .accept = vpci_portio_accept,
> +    .read = vpci_portio_read,
> +    .write = vpci_portio_write,
> +};
> +
> +void register_vpci_portio_handler(struct domain *d)
> +{
> +    struct hvm_io_handler *handler;
> +
> +    if ( !has_vpci(d) )
> +        return;
> +
> +    handler = hvm_next_io_handler(d);
> +    if ( !handler )
> +        return;
> +
> +    rwlock_init(&d->arch.hvm_domain.vpci_lock);
> +    handler->type = IOREQ_TYPE_PIO;
> +    handler->ops = &vpci_portio_ops;
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
> index db5df6956d..5b2c0e3fc3 100644
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -1566,7 +1566,8 @@ void __init noreturn __start_xen(unsigned long
> mbi_p)
>          domcr_flags |= DOMCRF_hvm |
>                         ((hvm_funcs.hap_supported && !opt_dom0_shadow) ?
>                           DOMCRF_hap : 0);
> -        config.emulation_flags =
> XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC;
> +        config.emulation_flags =
> XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC|
> +                                 XEN_X86_EMU_VPCI;
>      }
> 
>      /* Create initial domain 0. */
> diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
> index ff08bbe42a..af1b30cb2b 100644
> --- a/xen/arch/x86/xen.lds.S
> +++ b/xen/arch/x86/xen.lds.S
> @@ -76,6 +76,11 @@ SECTIONS
> 
>    __2M_rodata_start = .;       /* Start of 2M superpages, mapped RO. */
>    .rodata : {
> +#if defined(CONFIG_HAS_PCI) && defined(CONFIG_LATE_HWDOM)
> +       __start_vpci_array = .;
> +       *(.rodata.vpci)
> +       __end_vpci_array = .;
> +#endif
>         _srodata = .;
>         /* Bug frames table */
>         __start_bug_frames = .;
> @@ -167,6 +172,11 @@ SECTIONS
>         _einittext = .;
>    } :text
>    .init.data : {
> +#if defined(CONFIG_HAS_PCI) && !defined(CONFIG_LATE_HWDOM)
> +       __start_vpci_array = .;
> +       *(.init.rodata.vpci)
> +       __end_vpci_array = .;
> +#endif
>         *(.init.rodata)
>         *(.init.rodata.rel)
>         *(.init.rodata.str*)
> diff --git a/xen/drivers/Makefile b/xen/drivers/Makefile
> index 19391802a8..d51c766453 100644
> --- a/xen/drivers/Makefile
> +++ b/xen/drivers/Makefile
> @@ -1,6 +1,6 @@
>  subdir-y += char
>  subdir-$(CONFIG_HAS_CPUFREQ) += cpufreq
> -subdir-$(CONFIG_HAS_PCI) += pci
> +subdir-$(CONFIG_HAS_PCI) += pci vpci
>  subdir-$(CONFIG_HAS_PASSTHROUGH) += passthrough
>  subdir-$(CONFIG_ACPI) += acpi
>  subdir-$(CONFIG_VIDEO) += video
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index 27bdb7163c..54326cf0b8 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -30,6 +30,7 @@
>  #include <xen/radix-tree.h>
>  #include <xen/softirq.h>
>  #include <xen/tasklet.h>
> +#include <xen/vpci.h>
>  #include <xsm/xsm.h>
>  #include <asm/msi.h>
>  #include "ats.h"
> @@ -1030,9 +1031,10 @@ static void __hwdom_init
> setup_one_hwdom_device(const struct setup_hwdom *ctxt,
>                                                  struct pci_dev *pdev)
>  {
>      u8 devfn = pdev->devfn;
> +    int err;
> 
>      do {
> -        int err = ctxt->handler(devfn, pdev);
> +        err = ctxt->handler(devfn, pdev);
> 
>          if ( err )
>          {
> @@ -1045,6 +1047,11 @@ static void __hwdom_init
> setup_one_hwdom_device(const struct setup_hwdom *ctxt,
>          devfn += pdev->phantom_stride;
>      } while ( devfn != pdev->devfn &&
>                PCI_SLOT(devfn) == PCI_SLOT(pdev->devfn) );
> +
> +    err = vpci_add_handlers(pdev);
> +    if ( err )
> +        printk(XENLOG_ERR "setup of vPCI for d%d failed: %d\n",
> +               ctxt->d->domain_id, err);
>  }
> 
>  static int __hwdom_init _setup_hwdom_pci_devices(struct pci_seg *pseg,
> void *arg)
> diff --git a/xen/drivers/vpci/Makefile b/xen/drivers/vpci/Makefile
> new file mode 100644
> index 0000000000..840a906470
> --- /dev/null
> +++ b/xen/drivers/vpci/Makefile
> @@ -0,0 +1 @@
> +obj-y += vpci.o
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> new file mode 100644
> index 0000000000..f63de97e89
> --- /dev/null
> +++ b/xen/drivers/vpci/vpci.c
> @@ -0,0 +1,443 @@
> +/*
> + * Generic functionality for handling accesses to the PCI configuration space
> + * from guests.
> + *
> + * Copyright (C) 2017 Citrix Systems R&D
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms and conditions of the GNU General Public
> + * License, version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public
> + * License along with this program; If not, see
> <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <xen/sched.h>
> +#include <xen/vpci.h>
> +
> +extern vpci_register_init_t *const __start_vpci_array[];
> +extern vpci_register_init_t *const __end_vpci_array[];
> +#define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
> +
> +/* Internal struct to store the emulated PCI registers. */
> +struct vpci_register {
> +    vpci_read_t *read;
> +    vpci_write_t *write;
> +    unsigned int size;
> +    unsigned int offset;
> +    void *private;
> +    struct list_head node;
> +};
> +
> +int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
> +{
> +    unsigned int i;
> +    int rc = 0;
> +
> +    if ( !has_vpci(pdev->domain) )
> +        return 0;
> +
> +    pdev->vpci = xzalloc(struct vpci);
> +    if ( !pdev->vpci )
> +        return -ENOMEM;
> +
> +    INIT_LIST_HEAD(&pdev->vpci->handlers);
> +
> +    for ( i = 0; i < NUM_VPCI_INIT; i++ )
> +    {
> +        rc = __start_vpci_array[i](pdev);
> +        if ( rc )
> +            break;
> +    }
> +
> +    if ( rc )
> +    {
> +        while ( !list_empty(&pdev->vpci->handlers) )
> +        {
> +            struct vpci_register *r = list_first_entry(&pdev->vpci->handlers,
> +                                                       struct vpci_register,
> +                                                       node);
> +
> +            list_del(&r->node);
> +            xfree(r);
> +        }
> +        xfree(pdev->vpci);
> +    }
> +
> +    return rc;
> +}
> +
> +static int vpci_register_cmp(const struct vpci_register *r1,
> +                             const struct vpci_register *r2)
> +{
> +    /* Return 0 if registers overlap. */
> +    if ( r1->offset < r2->offset + r2->size &&
> +         r2->offset < r1->offset + r1->size )
> +        return 0;
> +    if ( r1->offset < r2->offset )
> +        return -1;
> +    if ( r1->offset > r2->offset )
> +        return 1;
> +
> +    ASSERT_UNREACHABLE();
> +    return 0;
> +}
> +
> +/* Dummy hooks, writes are ignored, reads return 1's */
> +static uint32_t vpci_ignored_read(struct pci_dev *pdev, unsigned int reg,
> +                                  const void *data)
> +{
> +    return ~(uint32_t)0;
> +}
> +
> +static void vpci_ignored_write(struct pci_dev *pdev, unsigned int reg,
> +                               uint32_t val, void *data)
> +{
> +}
> +
> +int vpci_add_register(const struct pci_dev *pdev, vpci_read_t
> *read_handler,
> +                      vpci_write_t *write_handler, unsigned int offset,
> +                      unsigned int size, void *data)
> +{
> +    struct list_head *prev;
> +    struct vpci_register *r;
> +
> +    /* Some sanity checks. */
> +    if ( (size != 1 && size != 2 && size != 4) ||
> +         offset >= PCI_CFG_SPACE_EXP_SIZE || (offset & (size - 1)) ||
> +         (!read_handler && !write_handler) )
> +        return -EINVAL;
> +
> +    r = xmalloc(struct vpci_register);
> +    if ( !r )
> +        return -ENOMEM;
> +
> +    r->read = read_handler ?: vpci_ignored_read;
> +    r->write = write_handler ?: vpci_ignored_write;
> +    r->size = size;
> +    r->offset = offset;
> +    r->private = data;
> +
> +    vpci_wlock(pdev->domain);
> +
> +    /* The list of handlers must be keep sorted at all times. */
> +    list_for_each ( prev, &pdev->vpci->handlers )
> +    {
> +        const struct vpci_register *this =
> +            list_entry(prev, const struct vpci_register, node);
> +        int cmp = vpci_register_cmp(r, this);
> +
> +        if ( cmp < 0 )
> +            break;
> +        if ( cmp == 0 )
> +        {
> +            vpci_wunlock(pdev->domain);
> +            xfree(r);
> +            return -EEXIST;
> +        }
> +    }
> +
> +    list_add_tail(&r->node, prev);
> +    vpci_wunlock(pdev->domain);
> +
> +    return 0;
> +}
> +
> +int vpci_remove_register(const struct pci_dev *pdev, unsigned int offset,
> +                         unsigned int size)
> +{
> +    const struct vpci_register r = { .offset = offset, .size = size };
> +    struct vpci_register *rm;
> +
> +    vpci_wlock(pdev->domain);
> +    list_for_each_entry ( rm, &pdev->vpci->handlers, node )
> +    {
> +        int cmp = vpci_register_cmp(&r, rm);
> +
> +        /*
> +         * NB: do not use a switch so that we can use break to
> +         * get out of the list loop earlier if required.
> +         */
> +        if ( !cmp && rm->offset == offset && rm->size == size )
> +        {
> +            list_del(&rm->node);
> +            vpci_wunlock(pdev->domain);
> +            xfree(rm);
> +            return 0;
> +        }
> +        if ( cmp <= 0 )
> +            break;
> +    }
> +    vpci_wunlock(pdev->domain);
> +
> +    return -ENOENT;
> +}
> +
> +/* Wrappers for performing reads/writes to the underlying hardware. */
> +static uint32_t vpci_read_hw(unsigned int seg, unsigned int bus,
> +                             unsigned int slot, unsigned int func,
> +                             unsigned int reg, unsigned int size)
> +{
> +    uint32_t data;
> +
> +    switch ( size )
> +    {
> +    case 4:
> +        data = pci_conf_read32(seg, bus, slot, func, reg);
> +        break;
> +    case 3:
> +        /*
> +         * This is possible because a 4byte read can have 1byte trapped and
> +         * the rest passed-through.
> +         */
> +        if ( reg & 1 )
> +        {
> +            data = pci_conf_read8(seg, bus, slot, func, reg);
> +            data |= pci_conf_read16(seg, bus, slot, func, reg + 1) << 8;
> +        }
> +        else
> +        {
> +            data = pci_conf_read16(seg, bus, slot, func, reg);
> +            data |= pci_conf_read8(seg, bus, slot, func, reg + 2) << 16;
> +        }
> +        break;
> +    case 2:
> +        data = pci_conf_read16(seg, bus, slot, func, reg);
> +        break;
> +    case 1:
> +        data = pci_conf_read8(seg, bus, slot, func, reg);
> +        break;
> +    default:
> +        ASSERT_UNREACHABLE();
> +        data = ~(uint32_t)0;
> +        break;
> +    }
> +
> +    return data;
> +}
> +
> +static void vpci_write_hw(unsigned int seg, unsigned int bus,
> +                          unsigned int slot, unsigned int func,
> +                          unsigned int reg, unsigned int size, uint32_t data)
> +{
> +    switch ( size )
> +    {
> +    case 4:
> +        pci_conf_write32(seg, bus, slot, func, reg, data);
> +        break;
> +    case 3:
> +        /*
> +         * This is possible because a 4byte write can have 1byte trapped and
> +         * the rest passed-through.
> +         */
> +        if ( reg & 1 )
> +        {
> +            pci_conf_write8(seg, bus, slot, func, reg, data);
> +            pci_conf_write16(seg, bus, slot, func, reg + 1, data >> 8);
> +        }
> +        else
> +        {
> +            pci_conf_write16(seg, bus, slot, func, reg, data);
> +            pci_conf_write8(seg, bus, slot, func, reg + 2, data >> 16);
> +        }
> +        break;
> +    case 2:
> +        pci_conf_write16(seg, bus, slot, func, reg, data);
> +        break;
> +    case 1:
> +        pci_conf_write8(seg, bus, slot, func, reg, data);
> +        break;
> +    default:
> +        ASSERT_UNREACHABLE();
> +        break;
> +    }
> +}
> +
> +/*
> + * Merge new data into a partial result.
> + *
> + * Zero the bytes of 'data' from [offset, offset + size), and
> + * merge the value found in 'new' from [0, offset) left shifted
> + * by 'offset'.
> + */
> +static uint32_t merge_result(uint32_t data, uint32_t new, unsigned int size,
> +                             unsigned int offset)
> +{
> +    uint32_t mask = 0xffffffff >> (32 - 8 * size);
> +
> +    return (data & ~(mask << (offset * 8))) | ((new & mask) << (offset * 8));
> +}
> +
> +uint32_t vpci_read(unsigned int seg, unsigned int bus, unsigned int slot,
> +                   unsigned int func, unsigned int reg, unsigned int size)
> +{
> +    struct domain *d = current->domain;
> +    struct pci_dev *pdev;
> +    const struct vpci_register *r;
> +    unsigned int data_offset = 0;
> +    uint32_t data = ~(uint32_t)0;
> +
> +    ASSERT(vpci_rlocked(d));
> +
> +    /* Find the PCI dev matching the address. */
> +    pdev = pci_get_pdev_by_domain(d, seg, bus, PCI_DEVFN(slot, func));
> +    if ( !pdev )
> +        return vpci_read_hw(seg, bus, slot, func, reg, size);
> +
> +    /* Read from the hardware or the emulated register handlers. */
> +    list_for_each_entry ( r, &pdev->vpci->handlers, node )
> +    {
> +        const struct vpci_register emu = {
> +            .offset = reg + data_offset,
> +            .size = size - data_offset
> +        };
> +        int cmp = vpci_register_cmp(&emu, r);
> +        uint32_t val;
> +        unsigned int read_size;
> +
> +        if ( cmp < 0 )
> +            break;
> +        if ( cmp > 0 )
> +            continue;
> +
> +        if ( emu.offset < r->offset )
> +        {
> +            /* Heading gap, read partial content from hardware. */
> +            read_size = r->offset - emu.offset;
> +            val = vpci_read_hw(seg, bus, slot, func, emu.offset, read_size);
> +            data = merge_result(data, val, read_size, data_offset);
> +            data_offset += read_size;
> +        }
> +
> +        val = r->read(pdev, r->offset, r->private);
> +
> +        /* Check if the read is in the middle of a register. */
> +        if ( r->offset < emu.offset )
> +            val >>= (emu.offset - r->offset) * 8;
> +
> +        /* Find the intersection size between the two sets. */
> +        read_size = min(emu.offset + emu.size, r->offset + r->size) -
> +                    max(emu.offset, r->offset);
> +        /* Merge the emulated data into the native read value. */
> +        data = merge_result(data, val, read_size, data_offset);
> +        data_offset += read_size;
> +        if ( data_offset == size )
> +            break;
> +        ASSERT(data_offset < size);
> +    }
> +
> +    if ( data_offset < size )
> +    {
> +        /* Tailing gap, read the remaining. */
> +        uint32_t tmp_data = vpci_read_hw(seg, bus, slot, func,
> +                                         reg + data_offset,
> +                                         size - data_offset);
> +
> +        data = merge_result(data, tmp_data, size - data_offset, data_offset);
> +    }
> +
> +    return data & (0xffffffff >> (32 - 8 * size));
> +}
> +
> +/*
> + * Perform a maybe partial write to a register.
> + *
> + * Note that this will only work for simple registers, if Xen needs to
> + * trap accesses to rw1c registers (like the status PCI header register)
> + * the logic in vpci_write will have to be expanded in order to correctly
> + * deal with them.
> + */
> +static void vpci_write_helper(struct pci_dev *pdev,
> +                              const struct vpci_register *r, unsigned int 
> size,
> +                              unsigned int offset, uint32_t data)
> +{
> +    ASSERT(size <= r->size);
> +
> +    if ( size != r->size )
> +    {
> +        uint32_t val;
> +
> +        val = r->read(pdev, r->offset, r->private);
> +        data = merge_result(val, data, size, offset);
> +    }
> +
> +    r->write(pdev, r->offset, data & (0xffffffff >> (32 - 8 * r->size)),
> +             r->private);
> +}
> +
> +void vpci_write(unsigned int seg, unsigned int bus, unsigned int slot,
> +                unsigned int func, unsigned int reg, unsigned int size,
> +                uint32_t data)
> +{
> +    struct domain *d = current->domain;
> +    struct pci_dev *pdev;
> +    const struct vpci_register *r;
> +    unsigned int data_offset = 0;
> +
> +    ASSERT(vpci_wlocked(d));
> +
> +    /*
> +     * Find the PCI dev matching the address.
> +     * Passthrough everything that's not trapped.
> +     */
> +    pdev = pci_get_pdev_by_domain(d, seg, bus, PCI_DEVFN(slot, func));
> +    if ( !pdev )
> +    {
> +        vpci_write_hw(seg, bus, slot, func, reg, size, data);
> +        return;
> +    }
> +
> +    /* Write the value to the hardware or emulated registers. */
> +    list_for_each_entry ( r, &pdev->vpci->handlers, node )
> +    {
> +        const struct vpci_register emu = {
> +            .offset = reg + data_offset,
> +            .size = size - data_offset
> +        };
> +        int cmp = vpci_register_cmp(&emu, r);
> +        unsigned int write_size;
> +
> +        if ( cmp < 0 )
> +            break;
> +        if ( cmp > 0 )
> +            continue;
> +
> +        if ( emu.offset < r->offset )
> +        {
> +            /* Heading gap, write partial content to hardware. */
> +            vpci_write_hw(seg, bus, slot, func, emu.offset,
> +                          r->offset - emu.offset, data >> (data_offset * 8));
> +            data_offset += r->offset - emu.offset;
> +        }
> +
> +        /* Find the intersection size between the two sets. */
> +        write_size = min(emu.offset + emu.size, r->offset + r->size) -
> +                     max(emu.offset, r->offset);
> +        vpci_write_helper(pdev, r, write_size, reg + data_offset - r->offset,
> +                          data >> (data_offset * 8));
> +        data_offset += write_size;
> +        if ( data_offset == size )
> +            break;
> +        ASSERT(data_offset < size);
> +    }
> +
> +    if ( data_offset < size )
> +        /* Tailing gap, write the remaining. */
> +        vpci_write_hw(seg, bus, slot, func, reg + data_offset,
> +                      size - data_offset, data >> (data_offset * 8));
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
> index c10522b7f5..ec14343b27 100644
> --- a/xen/include/asm-x86/domain.h
> +++ b/xen/include/asm-x86/domain.h
> @@ -427,6 +427,7 @@ struct arch_domain
>  #define has_vpit(d)        (!!((d)->arch.emulation_flags &
> XEN_X86_EMU_PIT))
>  #define has_pirq(d)        (!!((d)->arch.emulation_flags & \
>                              XEN_X86_EMU_USE_PIRQ))
> +#define has_vpci(d)        (!!((d)->arch.emulation_flags &
> XEN_X86_EMU_VPCI))
> 
>  #define has_arch_pdevs(d)    (!list_empty(&(d)->arch.pdev_list))
> 
> diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-
> x86/hvm/domain.h
> index d2899c9bb2..3a54d50606 100644
> --- a/xen/include/asm-x86/hvm/domain.h
> +++ b/xen/include/asm-x86/hvm/domain.h
> @@ -184,6 +184,9 @@ struct hvm_domain {
>      /* List of guest to machine IO ports mapping. */
>      struct list_head g2m_ioport_list;
> 
> +    /* Lock for the PCI emulation layer (vPCI). */
> +    rwlock_t vpci_lock;
> +
>      /* List of permanently write-mapped pages. */
>      struct {
>          spinlock_t lock;
> diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
> index 51659b6c7f..01322a2e21 100644
> --- a/xen/include/asm-x86/hvm/io.h
> +++ b/xen/include/asm-x86/hvm/io.h
> @@ -160,6 +160,9 @@ unsigned int hvm_pci_decode_addr(unsigned int cf8,
> unsigned int addr,
>   */
>  void register_g2m_portio_handler(struct domain *d);
> 
> +/* HVM port IO handler for PCI accesses. */
> +void register_vpci_portio_handler(struct domain *d);
> +
>  #endif /* __ASM_X86_HVM_IO_H__ */
> 
> 
> diff --git a/xen/include/public/arch-x86/xen.h b/xen/include/public/arch-
> x86/xen.h
> index f21332e897..86a1a09a8d 100644
> --- a/xen/include/public/arch-x86/xen.h
> +++ b/xen/include/public/arch-x86/xen.h
> @@ -295,12 +295,15 @@ struct xen_arch_domainconfig {
>  #define XEN_X86_EMU_PIT             (1U<<_XEN_X86_EMU_PIT)
>  #define _XEN_X86_EMU_USE_PIRQ       9
>  #define XEN_X86_EMU_USE_PIRQ        (1U<<_XEN_X86_EMU_USE_PIRQ)
> +#define _XEN_X86_EMU_VPCI           10
> +#define XEN_X86_EMU_VPCI            (1U<<_XEN_X86_EMU_VPCI)
> 
>  #define XEN_X86_EMU_ALL             (XEN_X86_EMU_LAPIC |
> XEN_X86_EMU_HPET |  \
>                                       XEN_X86_EMU_PM | XEN_X86_EMU_RTC |      
> \
>                                       XEN_X86_EMU_IOAPIC | XEN_X86_EMU_PIC |  
> \
>                                       XEN_X86_EMU_VGA | XEN_X86_EMU_IOMMU |   
> \
> -                                     XEN_X86_EMU_PIT | XEN_X86_EMU_USE_PIRQ)
> +                                     XEN_X86_EMU_PIT | XEN_X86_EMU_USE_PIRQ 
> |\
> +                                     XEN_X86_EMU_VPCI)
>      uint32_t emulation_flags;
>  };
> 
> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
> index ea6a66b248..ad5d3ca031 100644
> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -88,6 +88,9 @@ struct pci_dev {
>  #define PT_FAULT_THRESHOLD 10
>      } fault;
>      u64 vf_rlen[6];
> +
> +    /* Data for vPCI. */
> +    struct vpci *vpci;
>  };
> 
>  #define for_each_pdev(domain, pdev) \
> diff --git a/xen/include/xen/pci_regs.h b/xen/include/xen/pci_regs.h
> index ecd6124d91..cc4ee3b83e 100644
> --- a/xen/include/xen/pci_regs.h
> +++ b/xen/include/xen/pci_regs.h
> @@ -23,6 +23,14 @@
>  #define LINUX_PCI_REGS_H
> 
>  /*
> + * Conventional PCI and PCI-X Mode 1 devices have 256 bytes of
> + * configuration space.  PCI-X Mode 2 and PCIe devices have 4096 bytes of
> + * configuration space.
> + */
> +#define PCI_CFG_SPACE_SIZE   256
> +#define PCI_CFG_SPACE_EXP_SIZE       4096
> +
> +/*
>   * Under PCI, each device has 256 bytes of configuration address space,
>   * of which the first 64 bytes are standardized as follows:
>   */
> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
> new file mode 100644
> index 0000000000..12f7287d7b
> --- /dev/null
> +++ b/xen/include/xen/vpci.h
> @@ -0,0 +1,80 @@
> +#ifndef _VPCI_
> +#define _VPCI_
> +
> +#include <xen/pci.h>
> +#include <xen/types.h>
> +#include <xen/list.h>
> +
> +/*
> + * Helpers for locking/unlocking.
> + *
> + * NB: the recursive variants are used so that spin_is_locked
> + * returns whether the lock is hold by the current CPU (instead
> + * of just returning whether the lock is hold by any CPU).
> + */

The comment doesn't seem to match the use of read-write locks below.

> +#define vpci_rlock(d) read_lock(&(d)->arch.hvm_domain.vpci_lock)
> +#define vpci_wlock(d) write_lock(&(d)->arch.hvm_domain.vpci_lock)
> +#define vpci_runlock(d) read_unlock(&(d)->arch.hvm_domain.vpci_lock)
> +#define vpci_wunlock(d) write_unlock(&(d)->arch.hvm_domain.vpci_lock)
> +#define vpci_rlocked(d) rw_is_locked(&(d)->arch.hvm_domain.vpci_lock)
> +#define vpci_wlocked(d) rw_is_write_locked(&(d)-
> >arch.hvm_domain.vpci_lock)
> +
> +/*
> + * The vPCI handlers will never be called concurrently for the same domain,
> it
> + * is guaranteed that the vpci domain lock will always be locked when calling
> + * any handler.
> + */
> +typedef uint32_t vpci_read_t(struct pci_dev *pdev, unsigned int reg,
> +                             const void *data);
> +
> +typedef void vpci_write_t(struct pci_dev *pdev, unsigned int reg,
> +                          uint32_t val, void *data);
> +
> +typedef int vpci_register_init_t(struct pci_dev *dev);
> +
> +#ifdef CONFIG_LATE_HWDOM
> +#define VPCI_SECTION ".rodata.vpci"
> +#else
> +#define VPCI_SECTION ".init.rodata.vpci"
> +#endif
> +
> +#define REGISTER_VPCI_INIT(x)                   \
> +  static vpci_register_init_t *const x##_entry  \
> +               __used_section(VPCI_SECTION) = x
> +
> +/* Add vPCI handlers to device. */
> +int __must_check vpci_add_handlers(struct pci_dev *dev);
> +
> +/* Add/remove a register handler. */
> +int __must_check vpci_add_register(const struct pci_dev *pdev,
> +                                   vpci_read_t *read_handler,
> +                                   vpci_write_t *write_handler,
> +                                   unsigned int offset, unsigned int size,
> +                                   void *data);
> +int __must_check vpci_remove_register(const struct pci_dev *pdev,
> +                                      unsigned int offset,
> +                                      unsigned int size);
> +
> +/* Generic read/write handlers for the PCI config space. */
> +uint32_t vpci_read(unsigned int seg, unsigned int bus, unsigned int slot,
> +                   unsigned int func, unsigned int reg, unsigned int size);
> +void vpci_write(unsigned int seg, unsigned int bus, unsigned int slot,
> +                unsigned int func, unsigned int reg, unsigned int size,
> +                uint32_t data);
> +
> +struct vpci {
> +    /* List of vPCI handlers for a device. */
> +    struct list_head handlers;
> +};
> +
> +#endif
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> --
> 2.11.0 (Apple Git-81)

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.