[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v11 07/12] vpci: add header handlers



> -----Original Message-----
> From: Roger Pau Monne [mailto:roger.pau@xxxxxxxxxx]
> Sent: 20 March 2018 15:16
> To: xen-devel@xxxxxxxxxxxxxxxxxxxx
> Cc: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>; Konrad Rzeszutek Wilk
> <konrad.wilk@xxxxxxxxxx>; Roger Pau Monne <roger.pau@xxxxxxxxxx>; Ian
> Jackson <Ian.Jackson@xxxxxxxxxx>; Wei Liu <wei.liu2@xxxxxxxxxx>; Andrew
> Cooper <Andrew.Cooper3@xxxxxxxxxx>; George Dunlap
> <George.Dunlap@xxxxxxxxxx>; Jan Beulich <jbeulich@xxxxxxxx>; Julien Grall
> <julien.grall@xxxxxxx>; Stefano Stabellini <sstabellini@xxxxxxxxxx>; Tim
> (Xen.org) <tim@xxxxxxx>; Paul Durrant <Paul.Durrant@xxxxxxxxxx>
> Subject: [PATCH v11 07/12] vpci: add header handlers
> 
> Introduce a set of handlers that trap accesses to the PCI BARs and the
> command register, in order to snoop BAR sizing and BAR relocation.
> 
> The command handler is used to detect changes to bit 2 (response to
> memory space accesses), and maps/unmaps the BARs of the device into
> the guest p2m. A rangeset is used in order to figure out which memory
> to map/unmap. This makes it easier to keep track of the possible
> overlaps with other BARs, and will also simplify MSI-X support, where
> certain regions of a BAR might be used for the MSI-X table or PBA.
> 
> The BAR register handlers are used to detect attempts by the guest to
> size or relocate the BARs.
> 
> Note that the long running BAR mapping and unmapping operations are
> deferred to be performed by hvm_io_pending, so that they can be safely
> preempted.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>

ioreq part

Reviewed-by: Paul Durrant <paul.durrant@xxxxxxxxxx>

> ---
> Cc: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
> Cc: Wei Liu <wei.liu2@xxxxxxxxxx>
> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> Cc: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
> Cc: Jan Beulich <jbeulich@xxxxxxxx>
> Cc: Julien Grall <julien.grall@xxxxxxx>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>
> Cc: Tim Deegan <tim@xxxxxxx>
> Cc: Paul Durrant <paul.durrant@xxxxxxxxxx>
> ---
> Changes since v10:
>  - Fix indirect function call in map_range.
>  - Use rom->addr instead of fetching it from the ROM BAR register in
>    modify_decoding.
>  - Remove ternary operator from modify_decoding.
>  - Simply apply_map to have a single return.
>  - Constify pci_dev parameter of apply_map.
>  - Remove references to maybe_defer_map.
>  - Use pdev (const) or dev (non-const) consistently in modify_bars.
>  - Invert part of the logic in rom_write to remove one indentation
>    level.
>  - Add comments in rom_write to clarify why rom->addr is updated in
>    two different places.
>  - Use lx to print frame numbers in modify_bars.
>  - Add start/end local variables in the first modify_bars loop.
> 
> Changes since v9:
>  - Expand comments to clarify the code.
>  - Rename rom to rom_only in the vpci_cpu struct.
>  - Change definition style of dummy vpci_cpu.
>  - Replace incorrect usage of PFN_UP.
>  - Use system_state in order to check if the mapping functions are
>    being called from Dom0 builder context.
>  - Split the maybe_defer_map into two functions and place the Dom0
>    builder one in the init section.
> 
> Changes since v8:
>  - Do not pretend to support ARM in the map_range function. Explain
>    the required changes in the comment.
>  - Introduce PCI_HEADER_{NORMAL/BRIDGE}_NR_BARS defines.
>  - Rename 'rom' boolean variable to 'rom_only', which is more
>    descriptive of it's meaning.
>  - Introduce vpci_remove_device which removes all handlers for a
>    device.
>  - Simplify error handling when modifying BARs mapping. Any error will
>    cause the device to be unplugged (by calling vpci_remove_device).
>  - Return an error code in modify_bars. Add comments describing why
>    the error is sometimes ignored.
> 
> Changes since v7:
>  - Order includes.
>  - Add newline between switch cases.
>  - Fix typo in comment (hopping).
>  - Wrap ternary conditional in parentheses.
>  - Remove CONFIG_HAS_PCI gueard from sched.h vpci_vcpu usage.
>  - Add comment regarding vpci_vcpu usage.
>  - Move rom_enabled from BAR struct to header.
>  - Do not protect vpci_vcpu with __XEN__ guards.
> 
> Changes since v6:
>  - s/vpci_check_pending/vpci_process_pending/.
>  - Improve error handling in vpci_process_pending.
>  - Add a comment that explains how vpci_check_bar_overlap works.
>  - Add error messages to vpci_modify_bars and vpci_modify_rom.
>  - Introduce vpci_hw_read16/32, in order to passthrough reads to
>    the underlying hw.
>  - Print BAR number on error in vpci_bar_write.
>  - Place the CONFIG_HAS_PCI guards inside the vpci.h header and
>    provide an empty vpci_vcpu structure for the !CONFIG_HAS_PCI case.
>  - Define CONFIG_HAS_PCI in the test harness emul.h header before
>    including vpci.h
>  - Add ARM TODOs and an ARM-specific bodge to vpci_map_range due to
>    the lack of preemption in {un}map_mmio_regions.
>  - Make vpci_maybe_defer_map void.
>  - Set rom_enabled in vpci_init_bars.
>  - Defer enabling/disabling the memory decoding (or the ROM enable
>    bit) until the memory has been mapped/unmapped.
>  - Remove vpci_ prefix from static functions.
>  - Use the same code in order to map the general BARs and the ROM
>    BARs.
>  - Remove the seg/bus local variables and use pdev->{seg,bus} instead.
>  - Convert the bools in the BAR related structs into bool bitfields.
>  - Add the must_check attribute to vpci_process_pending.
>  - Open code check_bar_overlap inside modify_bars, which was it's only
>    user.
> 
> Changes since v5:
>  - Switch to the new handler type.
>  - Use pci_sbdf_t to size the BARs.
>  - Use a single return for vpci_modify_bar.
>  - Do not return an error code from vpci_modify_bars, just log the
>    failure.
>  - Remove the 'sizing' parameter. Instead just let the guest write
>    directly to the BAR, and read the value back. This simplifies the
>    BAR register handlers, specially the read one.
>  - Ignore ROM BAR writes with memory decoding enabled and ROM enabled.
>  - Do not propagate failures to setup the ROM BAR in vpci_init_bars.
>  - Add preemption support to the BAR mapping/unmapping operations.
> 
> Changes since v4:
>  - Expand commit message to mention the reason behind the usage of
>    rangesets.
>  - Fix comment related to the inclusiveness of rangesets.
>  - Fix off-by-one error in the calculation of the end of memory
>    regions.
>  - Store the state of the BAR (mapped/unmapped) in the vpci_bar
>    enabled field, previously was only used by ROMs.
>  - Fix double negation of return code.
>  - Modify vpci_cmd_write so it has a single call to pci_conf_write16.
>  - Print a warning when trying to write to the BAR with memory
>    decoding enabled (and ignore the write).
>  - Remove header_type local variable, it's used only once.
>  - Move the read of the command register.
>  - Restore previous command register value in the exit paths.
>  - Only set address to INVALID_PADDR if the initial BAR value matches
>     ~0 & PCI_BASE_ADDRESS_MEM_MASK.
>  - Don't disable the enabled bit in the expansion ROM register, memory
>    decoding is already disabled and takes precedence.
>  - Don't use INVALID_PADDR, just set the initial BAR address to the
>    value found in the hardware.
>  - Introduce rom_enabled to store the status of the
>    PCI_ROM_ADDRESS_ENABLE bit.
>  - Reorder fields of the structure to prevent holes.
> 
> Changes since v3:
>  - Propagate previous changes: drop xen_ prefix and use u8/u16/u32
>    instead of the previous half_word/word/double_word.
>  - Constify some of the paramerters.
>  - s/VPCI_BAR_MEM/VPCI_BAR_MEM32/.
>  - Simplify the number of fields stored for each BAR, a single address
>    field is stored and contains the address of the BAR both on Xen and
>    in the guest.
>  - Allow the guest to move the BARs around in the physical memory map.
>  - Add support for expansion ROM BARs.
>  - Do not cache the value of the command register.
>  - Remove a label used in vpci_cmd_write.
>  - Fix the calculation of the sizing mask in vpci_bar_write.
>  - Check the memory decode bit in order to decide if a BAR is
>    positioned or not.
>  - Disable memory decoding before sizing the BARs in Xen.
>  - When mapping/unmapping BARs check if there's overlap between BARs,
>    in order to avoid unmapping memory required by another BAR.
>  - Introduce a macro to check whether a BAR is mappable or not.
>  - Add a comment regarding the lack of support for SR-IOV.
>  - Remove the usage of the GENMASK macro.
> 
> Changes since v2:
>  - Detect unset BARs and allow the hardware domain to position them.
> ---
>  tools/tests/vpci/emul.h   |   1 +
>  xen/arch/x86/hvm/ioreq.c  |   4 +
>  xen/drivers/vpci/Makefile |   2 +-
>  xen/drivers/vpci/header.c | 548
> ++++++++++++++++++++++++++++++++++++++++++++++
>  xen/drivers/vpci/vpci.c   |  45 ++--
>  xen/include/xen/sched.h   |   4 +
>  xen/include/xen/vpci.h    |  61 ++++++
>  7 files changed, 651 insertions(+), 14 deletions(-)
>  create mode 100644 xen/drivers/vpci/header.c
> 
> diff --git a/tools/tests/vpci/emul.h b/tools/tests/vpci/emul.h
> index fd0317995a..5d47544bf7 100644
> --- a/tools/tests/vpci/emul.h
> +++ b/tools/tests/vpci/emul.h
> @@ -80,6 +80,7 @@ typedef union {
>      };
>  } pci_sbdf_t;
> 
> +#define CONFIG_HAS_VPCI
>  #include "vpci.h"
> 
>  #define __hwdom_init
> diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
> index 7e66965bcd..90c9e3cd59 100644
> --- a/xen/arch/x86/hvm/ioreq.c
> +++ b/xen/arch/x86/hvm/ioreq.c
> @@ -26,6 +26,7 @@
>  #include <xen/domain.h>
>  #include <xen/event.h>
>  #include <xen/paging.h>
> +#include <xen/vpci.h>
> 
>  #include <asm/hvm/hvm.h>
>  #include <asm/hvm/ioreq.h>
> @@ -48,6 +49,9 @@ bool hvm_io_pending(struct vcpu *v)
>      struct domain *d = v->domain;
>      struct hvm_ioreq_server *s;
> 
> +    if ( has_vpci(d) && vpci_process_pending(v) )
> +        return true;
> +
>      list_for_each_entry ( s,
>                            &d->arch.hvm_domain.ioreq_server.list,
>                            list_entry )
> diff --git a/xen/drivers/vpci/Makefile b/xen/drivers/vpci/Makefile
> index 840a906470..241467212f 100644
> --- a/xen/drivers/vpci/Makefile
> +++ b/xen/drivers/vpci/Makefile
> @@ -1 +1 @@
> -obj-y += vpci.o
> +obj-y += vpci.o header.o
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> new file mode 100644
> index 0000000000..d7c220a452
> --- /dev/null
> +++ b/xen/drivers/vpci/header.c
> @@ -0,0 +1,548 @@
> +/*
> + * Generic functionality for handling accesses to the PCI header from the
> + * configuration space.
> + *
> + * Copyright (C) 2017 Citrix Systems R&D
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms and conditions of the GNU General Public
> + * License, version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public
> + * License along with this program; If not, see
> <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <xen/p2m-common.h>
> +#include <xen/sched.h>
> +#include <xen/softirq.h>
> +#include <xen/vpci.h>
> +
> +#include <asm/event.h>
> +
> +#define MAPPABLE_BAR(x)                                                 \
> +    ((x)->type == VPCI_BAR_MEM32 || (x)->type == VPCI_BAR_MEM64_LO
> ||   \
> +     (x)->type == VPCI_BAR_ROM)
> +
> +struct map_data {
> +    struct domain *d;
> +    bool map;
> +};
> +
> +static int map_range(unsigned long s, unsigned long e, void *data,
> +                     unsigned long *c)
> +{
> +    const struct map_data *map = data;
> +    int rc;
> +
> +    for ( ; ; )
> +    {
> +        unsigned long size = e - s + 1;
> +
> +        /*
> +         * ARM TODOs:
> +         * - On ARM whether the memory is prefetchable or not should be
> passed
> +         *   to map_mmio_regions in order to decide which memory attributes
> +         *   should be used.
> +         *
> +         * - {un}map_mmio_regions doesn't support preemption.
> +         */
> +
> +        rc = map->map ? map_mmio_regions(map->d, _gfn(s), size, _mfn(s))
> +                      : unmap_mmio_regions(map->d, _gfn(s), size, _mfn(s));
> +        if ( rc == 0 )
> +        {
> +            *c += size;
> +            break;
> +        }
> +        if ( rc < 0 )
> +        {
> +            printk(XENLOG_G_WARNING
> +                   "Failed to identity %smap [%lx, %lx] for d%d: %d\n",
> +                   map ? "" : "un", s, e, map->d->domain_id, rc);
> +            break;
> +        }
> +        ASSERT(rc < size);
> +        *c += rc;
> +        s += rc;
> +        if ( general_preempt_check() )
> +                return -ERESTART;
> +    }
> +
> +    return rc;
> +}
> +
> +/*
> + * The rom_only parameter is used to signal the map/unmap helpers that
> the ROM
> + * BAR's enable bit has changed with the memory decoding bit already
> enabled.
> + * If rom_only is not set then it's the memory decoding bit that changed.
> + */
> +static void modify_decoding(const struct pci_dev *pdev, bool map, bool
> rom_only)
> +{
> +    struct vpci_header *header = &pdev->vpci->header;
> +    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
> +    uint16_t cmd;
> +    unsigned int i;
> +
> +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
> +    {
> +        if ( !MAPPABLE_BAR(&header->bars[i]) )
> +            continue;
> +
> +        if ( rom_only && header->bars[i].type == VPCI_BAR_ROM )
> +        {
> +            unsigned int rom_pos = (i == PCI_HEADER_NORMAL_NR_BARS)
> +                                   ? PCI_ROM_ADDRESS : PCI_ROM_ADDRESS1;
> +            uint32_t val = header->bars[i].addr |
> +                           (map ? PCI_ROM_ADDRESS_ENABLE : 0);
> +
> +            header->bars[i].enabled = header->rom_enabled = map;
> +            pci_conf_write32(pdev->seg, pdev->bus, slot, func, rom_pos, val);
> +            return;
> +        }
> +
> +        if ( !rom_only &&
> +             (header->bars[i].type != VPCI_BAR_ROM || header->rom_enabled)
> )
> +            header->bars[i].enabled = map;
> +    }
> +
> +    ASSERT(!rom_only);
> +    cmd = pci_conf_read16(pdev->seg, pdev->bus, slot, func,
> PCI_COMMAND);
> +    cmd &= ~PCI_COMMAND_MEMORY;
> +    cmd |= map ? PCI_COMMAND_MEMORY : 0;
> +    pci_conf_write16(pdev->seg, pdev->bus, slot, func, PCI_COMMAND,
> +                     cmd);
> +}
> +
> +bool vpci_process_pending(struct vcpu *v)
> +{
> +    if ( v->vpci.mem )
> +    {
> +        struct map_data data = {
> +            .d = v->domain,
> +            .map = v->vpci.map,
> +        };
> +        int rc = rangeset_consume_ranges(v->vpci.mem, map_range, &data);
> +
> +        if ( rc == -ERESTART )
> +            return true;
> +
> +        spin_lock(&v->vpci.pdev->vpci->lock);
> +        /* Disable memory decoding unconditionally on failure. */
> +        modify_decoding(v->vpci.pdev, !rc && v->vpci.map,
> +                        !rc && v->vpci.rom_only);
> +        spin_unlock(&v->vpci.pdev->vpci->lock);
> +
> +        rangeset_destroy(v->vpci.mem);
> +        v->vpci.mem = NULL;
> +        if ( rc )
> +            /*
> +             * FIXME: in case of failure remove the device from the domain.
> +             * Note that there might still be leftover mappings. While this 
> is
> +             * safe for Dom0, for DomUs the domain will likely need to be
> +             * killed in order to avoid leaking stale p2m mappings on
> +             * failure.
> +             */
> +            vpci_remove_device(v->vpci.pdev);
> +    }
> +
> +    return false;
> +}
> +
> +static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
> +                            struct rangeset *mem)
> +{
> +    struct map_data data = { .d = d, .map = true };
> +    int rc;
> +
> +    while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -
> ERESTART )
> +        process_pending_softirqs();
> +    rangeset_destroy(mem);
> +    if ( !rc )
> +        modify_decoding(pdev, true, false);
> +
> +    return rc;
> +}
> +
> +static void defer_map(struct domain *d, struct pci_dev *pdev,
> +                      struct rangeset *mem, bool map, bool rom_only)
> +{
> +    struct vcpu *curr = current;
> +
> +    /*
> +     * FIXME: when deferring the {un}map the state of the device should not
> +     * be trusted. For example the enable bit is toggled after the device
> +     * is mapped. This can lead to parallel mapping operations being
> +     * started for the same device if the domain is not well-behaved.
> +     */
> +    curr->vpci.pdev = pdev;
> +    curr->vpci.mem = mem;
> +    curr->vpci.map = map;
> +    curr->vpci.rom_only = rom_only;
> +}
> +
> +static int modify_bars(const struct pci_dev *pdev, bool map, bool
> rom_only)
> +{
> +    struct vpci_header *header = &pdev->vpci->header;
> +    struct rangeset *mem = rangeset_new(NULL, NULL, 0);
> +    struct pci_dev *tmp, *dev = NULL;
> +    unsigned int i;
> +    int rc;
> +
> +    if ( !mem )
> +        return -ENOMEM;
> +
> +    /*
> +     * Create a rangeset that represents the current device BARs memory
> region
> +     * and compare it against all the currently active BAR memory regions. If
> +     * an overlap is found, subtract it from the region to be
> mapped/unmapped.
> +     *
> +     * First fill the rangeset with all the BARs of this device or with the 
> ROM
> +     * BAR only, depending on whether the guest is toggling the memory
> decode
> +     * bit of the command register, or the enable bit of the ROM BAR 
> register.
> +     */
> +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
> +    {
> +        const struct vpci_bar *bar = &header->bars[i];
> +        unsigned long start = PFN_DOWN(bar->addr);
> +        unsigned long end = PFN_DOWN(bar->addr + bar->size - 1);
> +
> +        if ( !MAPPABLE_BAR(bar) ||
> +             (rom_only ? bar->type != VPCI_BAR_ROM
> +                       : (bar->type == VPCI_BAR_ROM && 
> !header->rom_enabled)) )
> +            continue;
> +
> +        rc = rangeset_add_range(mem, start, end);
> +        if ( rc )
> +        {
> +            printk(XENLOG_G_WARNING "Failed to add [%lx, %lx]: %d\n",
> +                   start, end, rc);
> +            rangeset_destroy(mem);
> +            return rc;
> +        }
> +    }
> +
> +    /*
> +     * Check for overlaps with other BARs. Note that only BARs that are
> +     * currently mapped (enabled) are checked for overlaps.
> +     */
> +    list_for_each_entry(tmp, &pdev->domain->arch.pdev_list, domain_list)
> +    {
> +        if ( tmp == pdev )
> +        {
> +            /*
> +             * Need to store the device so it's not constified and defer_map
> +             * can modify it in case of error.
> +             */
> +            dev = tmp;
> +            if ( !rom_only )
> +                /*
> +                 * If memory decoding is toggled avoid checking against the
> +                 * same device, or else all regions will be removed from the
> +                 * memory map in the unmap case.
> +                 */
> +                continue;
> +        }
> +
> +        for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
> +        {
> +            const struct vpci_bar *bar = &tmp->vpci->header.bars[i];
> +            unsigned long start = PFN_DOWN(bar->addr);
> +            unsigned long end = PFN_DOWN(bar->addr + bar->size - 1);
> +
> +            if ( !bar->enabled || !rangeset_overlaps_range(mem, start, end) 
> ||
> +                 /*
> +                  * If only the ROM enable bit is toggled check against other
> +                  * BARs in the same device for overlaps, but not against the
> +                  * same ROM BAR.
> +                  */
> +                 (rom_only && tmp == pdev && bar->type == VPCI_BAR_ROM) )
> +                continue;
> +
> +            rc = rangeset_remove_range(mem, start, end);
> +            if ( rc )
> +            {
> +                printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
> +                       start, end, rc);
> +                rangeset_destroy(mem);
> +                return rc;
> +            }
> +        }
> +    }
> +
> +    ASSERT(dev);
> +
> +    if ( system_state < SYS_STATE_active )
> +    {
> +        /*
> +         * Mappings might be created when building Dom0 if the memory
> decoding
> +         * bit of PCI devices is enabled. In that case it's not possible to
> +         * defer the operation, so call apply_map in order to create the
> +         * mappings right away. Note that at build time this function will 
> only
> +         * be called iff the memory decoding bit is enabled, thus the 
> operation
> +         * will always be to establish mappings and process all the BARs.
> +         */
> +        ASSERT(map && !rom_only);
> +        return apply_map(pdev->domain, pdev, mem);
> +    }
> +
> +    defer_map(dev->domain, dev, mem, map, rom_only);
> +
> +    return 0;
> +}
> +
> +static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
> +                      uint32_t cmd, void *data)
> +{
> +    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
> +    uint16_t current_cmd = pci_conf_read16(pdev->seg, pdev->bus, slot,
> func,
> +                                           reg);
> +
> +    /*
> +     * Let Dom0 play with all the bits directly except for the memory
> +     * decoding one.
> +     */
> +    if ( (cmd ^ current_cmd) & PCI_COMMAND_MEMORY )
> +        /*
> +         * Ignore the error. No memory has been added or removed from the
> p2m
> +         * (because the actual p2m changes are deferred in defer_map) and the
> +         * memory decoding bit has not been changed, so leave everything as-
> is,
> +         * hoping the guest will realize and try again.
> +         */
> +        modify_bars(pdev, cmd & PCI_COMMAND_MEMORY, false);
> +    else
> +        pci_conf_write16(pdev->seg, pdev->bus, slot, func, reg, cmd);
> +}
> +
> +static void bar_write(const struct pci_dev *pdev, unsigned int reg,
> +                      uint32_t val, void *data)
> +{
> +    struct vpci_bar *bar = data;
> +    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
> +    bool hi = false;
> +
> +    if ( pci_conf_read16(pdev->seg, pdev->bus, slot, func, PCI_COMMAND)
> &
> +         PCI_COMMAND_MEMORY )
> +    {
> +        gprintk(XENLOG_WARNING,
> +                "%04x:%02x:%02x.%u: ignored BAR %lu write with memory
> decoding enabled\n",
> +                pdev->seg, pdev->bus, slot, func,
> +                bar - pdev->vpci->header.bars);
> +        return;
> +    }
> +
> +    if ( bar->type == VPCI_BAR_MEM64_HI )
> +    {
> +        ASSERT(reg > PCI_BASE_ADDRESS_0);
> +        bar--;
> +        hi = true;
> +    }
> +    else
> +        val &= PCI_BASE_ADDRESS_MEM_MASK;
> +
> +    /*
> +     * Update the cached address, so that when memory decoding is enabled
> +     * Xen can map the BAR into the guest p2m.
> +     */
> +    bar->addr &= ~(0xffffffffull << (hi ? 32 : 0));
> +    bar->addr |= (uint64_t)val << (hi ? 32 : 0);
> +
> +    /* Make sure Xen writes back the same value for the BAR RO bits. */
> +    if ( !hi )
> +    {
> +        val |= bar->type == VPCI_BAR_MEM32 ?
> PCI_BASE_ADDRESS_MEM_TYPE_32
> +                                           : PCI_BASE_ADDRESS_MEM_TYPE_64;
> +        val |= bar->prefetchable ? PCI_BASE_ADDRESS_MEM_PREFETCH : 0;
> +    }
> +
> +    pci_conf_write32(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> +                     PCI_FUNC(pdev->devfn), reg, val);
> +}
> +
> +static void rom_write(const struct pci_dev *pdev, unsigned int reg,
> +                      uint32_t val, void *data)
> +{
> +    struct vpci_header *header = &pdev->vpci->header;
> +    struct vpci_bar *rom = data;
> +    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
> +    uint16_t cmd = pci_conf_read16(pdev->seg, pdev->bus, slot, func,
> +                                   PCI_COMMAND);
> +    bool new_enabled = val & PCI_ROM_ADDRESS_ENABLE;
> +
> +    if ( (cmd & PCI_COMMAND_MEMORY) && header->rom_enabled &&
> new_enabled )
> +    {
> +        gprintk(XENLOG_WARNING,
> +                "%04x:%02x:%02x.%u: ignored ROM BAR write with memory
> decoding enabled\n",
> +                pdev->seg, pdev->bus, slot, func);
> +        return;
> +    }
> +
> +    if ( !header->rom_enabled )
> +        /*
> +         * If the ROM BAR is not enabled update the address field so the
> +         * correct address is mapped into the p2m.
> +         */
> +        rom->addr = val & PCI_ROM_ADDRESS_MASK;
> +
> +    if ( !(cmd & PCI_COMMAND_MEMORY) || header->rom_enabled ==
> new_enabled )
> +    {
> +        /* Just update the ROM BAR field. */
> +        header->rom_enabled = new_enabled;
> +        pci_conf_write32(pdev->seg, pdev->bus, slot, func, reg, val);
> +    }
> +    else if ( modify_bars(pdev, new_enabled, true) )
> +        /*
> +         * No memory has been added or removed from the p2m (because the
> actual
> +         * p2m changes are deferred in defer_map) and the ROM enable bit has
> +         * not been changed, so leave everything as-is, hoping the guest will
> +         * realize and try again. It's important to not update rom->addr in 
> the
> +         * unmap case if modify_bars has failed, or future attempts would
> +         * attempt to unmap the wrong address.
> +         */
> +        return;
> +
> +    if ( !new_enabled )
> +        rom->addr = val & PCI_ROM_ADDRESS_MASK;
> +}
> +
> +static int init_bars(struct pci_dev *pdev)
> +{
> +    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
> +    uint16_t cmd;
> +    uint64_t addr, size;
> +    unsigned int i, num_bars, rom_reg;
> +    struct vpci_header *header = &pdev->vpci->header;
> +    struct vpci_bar *bars = header->bars;
> +    pci_sbdf_t sbdf = {
> +        .seg = pdev->seg,
> +        .bus = pdev->bus,
> +        .dev = slot,
> +        .func = func,
> +    };
> +    int rc;
> +
> +    switch ( pci_conf_read8(pdev->seg, pdev->bus, slot, func,
> PCI_HEADER_TYPE)
> +             & 0x7f )
> +    {
> +    case PCI_HEADER_TYPE_NORMAL:
> +        num_bars = PCI_HEADER_NORMAL_NR_BARS;
> +        rom_reg = PCI_ROM_ADDRESS;
> +        break;
> +
> +    case PCI_HEADER_TYPE_BRIDGE:
> +        num_bars = PCI_HEADER_BRIDGE_NR_BARS;
> +        rom_reg = PCI_ROM_ADDRESS1;
> +        break;
> +
> +    default:
> +        return -EOPNOTSUPP;
> +    }
> +
> +    /* Setup a handler for the command register. */
> +    rc = vpci_add_register(pdev->vpci, vpci_hw_read16, cmd_write,
> PCI_COMMAND,
> +                           2, header);
> +    if ( rc )
> +        return rc;
> +
> +    /* Disable memory decoding before sizing. */
> +    cmd = pci_conf_read16(pdev->seg, pdev->bus, slot, func,
> PCI_COMMAND);
> +    if ( cmd & PCI_COMMAND_MEMORY )
> +        pci_conf_write16(pdev->seg, pdev->bus, slot, func, PCI_COMMAND,
> +                         cmd & ~PCI_COMMAND_MEMORY);
> +
> +    for ( i = 0; i < num_bars; i++ )
> +    {
> +        uint8_t reg = PCI_BASE_ADDRESS_0 + i * 4;
> +        uint32_t val;
> +
> +        if ( i && bars[i - 1].type == VPCI_BAR_MEM64_LO )
> +        {
> +            bars[i].type = VPCI_BAR_MEM64_HI;
> +            rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, 
> reg,
> +                                   4, &bars[i]);
> +            if ( rc )
> +            {
> +                pci_conf_write16(pdev->seg, pdev->bus, slot, func,
> +                                 PCI_COMMAND, cmd);
> +                return rc;
> +            }
> +
> +            continue;
> +        }
> +
> +        val = pci_conf_read32(pdev->seg, pdev->bus, slot, func, reg);
> +        if ( (val & PCI_BASE_ADDRESS_SPACE) ==
> PCI_BASE_ADDRESS_SPACE_IO )
> +        {
> +            bars[i].type = VPCI_BAR_IO;
> +            continue;
> +        }
> +        if ( (val & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
> +             PCI_BASE_ADDRESS_MEM_TYPE_64 )
> +            bars[i].type = VPCI_BAR_MEM64_LO;
> +        else
> +            bars[i].type = VPCI_BAR_MEM32;
> +
> +        rc = pci_size_mem_bar(sbdf, reg, &addr, &size,
> +                              (i == num_bars - 1) ? PCI_BAR_LAST : 0);
> +        if ( rc < 0 )
> +        {
> +            pci_conf_write16(pdev->seg, pdev->bus, slot, func, PCI_COMMAND,
> +                             cmd);
> +            return rc;
> +        }
> +
> +        if ( size == 0 )
> +        {
> +            bars[i].type = VPCI_BAR_EMPTY;
> +            continue;
> +        }
> +
> +        bars[i].addr = addr;
> +        bars[i].size = size;
> +        bars[i].prefetchable = val & PCI_BASE_ADDRESS_MEM_PREFETCH;
> +
> +        rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg, 4,
> +                               &bars[i]);
> +        if ( rc )
> +        {
> +            pci_conf_write16(pdev->seg, pdev->bus, slot, func, PCI_COMMAND,
> +                             cmd);
> +            return rc;
> +        }
> +    }
> +
> +    /* Check expansion ROM. */
> +    rc = pci_size_mem_bar(sbdf, rom_reg, &addr, &size, PCI_BAR_ROM);
> +    if ( rc > 0 && size )
> +    {
> +        struct vpci_bar *rom = &header->bars[num_bars];
> +
> +        rom->type = VPCI_BAR_ROM;
> +        rom->size = size;
> +        rom->addr = addr;
> +        header->rom_enabled = pci_conf_read32(pdev->seg, pdev->bus, slot,
> func,
> +                                              rom_reg) & 
> PCI_ROM_ADDRESS_ENABLE;
> +
> +        rc = vpci_add_register(pdev->vpci, vpci_hw_read32, rom_write,
> rom_reg,
> +                               4, rom);
> +        if ( rc )
> +            rom->type = VPCI_BAR_EMPTY;
> +    }
> +
> +    return (cmd & PCI_COMMAND_MEMORY) ? modify_bars(pdev, true,
> false) : 0;
> +}
> +REGISTER_VPCI_INIT(init_bars);
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index 4740d02edf..e5b49b9d82 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -34,6 +34,23 @@ struct vpci_register {
>      struct list_head node;
>  };
> 
> +void vpci_remove_device(struct pci_dev *pdev)
> +{
> +    spin_lock(&pdev->vpci->lock);
> +    while ( !list_empty(&pdev->vpci->handlers) )
> +    {
> +        struct vpci_register *r = list_first_entry(&pdev->vpci->handlers,
> +                                                   struct vpci_register,
> +                                                   node);
> +
> +        list_del(&r->node);
> +        xfree(r);
> +    }
> +    spin_unlock(&pdev->vpci->lock);
> +    xfree(pdev->vpci);
> +    pdev->vpci = NULL;
> +}
> +
>  int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
>  {
>      unsigned int i;
> @@ -57,19 +74,7 @@ int __hwdom_init vpci_add_handlers(struct pci_dev
> *pdev)
>      }
> 
>      if ( rc )
> -    {
> -        while ( !list_empty(&pdev->vpci->handlers) )
> -        {
> -            struct vpci_register *r = list_first_entry(&pdev->vpci->handlers,
> -                                                       struct vpci_register,
> -                                                       node);
> -
> -            list_del(&r->node);
> -            xfree(r);
> -        }
> -        xfree(pdev->vpci);
> -        pdev->vpci = NULL;
> -    }
> +        vpci_remove_device(pdev);
> 
>      return rc;
>  }
> @@ -102,6 +107,20 @@ static void vpci_ignored_write(const struct pci_dev
> *pdev, unsigned int reg,
>  {
>  }
> 
> +uint32_t vpci_hw_read16(const struct pci_dev *pdev, unsigned int reg,
> +                        void *data)
> +{
> +    return pci_conf_read16(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> +                           PCI_FUNC(pdev->devfn), reg);
> +}
> +
> +uint32_t vpci_hw_read32(const struct pci_dev *pdev, unsigned int reg,
> +                        void *data)
> +{
> +    return pci_conf_read32(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> +                           PCI_FUNC(pdev->devfn), reg);
> +}
> +
>  int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>                        vpci_write_t *write_handler, unsigned int offset,
>                        unsigned int size, void *data)
> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index f89896e59b..57bb142c02 100644
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -20,6 +20,7 @@
>  #include <xen/smp.h>
>  #include <xen/perfc.h>
>  #include <asm/atomic.h>
> +#include <xen/vpci.h>
>  #include <xen/wait.h>
>  #include <public/xen.h>
>  #include <public/domctl.h>
> @@ -264,6 +265,9 @@ struct vcpu
> 
>      struct evtchn_fifo_vcpu *evtchn_fifo;
> 
> +    /* vPCI per-vCPU area, used to store data for long running operations. */
> +    struct vpci_vcpu vpci;
> +
>      struct arch_vcpu arch;
>  };
> 
> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
> index 9f2864fb0c..6bf8b22b4f 100644
> --- a/xen/include/xen/vpci.h
> +++ b/xen/include/xen/vpci.h
> @@ -1,6 +1,8 @@
>  #ifndef _XEN_VPCI_H_
>  #define _XEN_VPCI_H_
> 
> +#ifdef CONFIG_HAS_VPCI
> +
>  #include <xen/pci.h>
>  #include <xen/types.h>
>  #include <xen/list.h>
> @@ -20,6 +22,9 @@ typedef int vpci_register_init_t(struct pci_dev *dev);
>  /* Add vPCI handlers to device. */
>  int __must_check vpci_add_handlers(struct pci_dev *dev);
> 
> +/* Remove all handlers and free vpci related structures. */
> +void vpci_remove_device(struct pci_dev *pdev);
> +
>  /* Add/remove a register handler. */
>  int __must_check vpci_add_register(struct vpci *vpci,
>                                     vpci_read_t *read_handler,
> @@ -34,12 +39,68 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg,
> unsigned int size);
>  void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
>                  uint32_t data);
> 
> +/* Passthrough handlers. */
> +uint32_t vpci_hw_read16(const struct pci_dev *pdev, unsigned int reg,
> +                        void *data);
> +uint32_t vpci_hw_read32(const struct pci_dev *pdev, unsigned int reg,
> +                        void *data);
> +
> +/*
> + * Check for pending vPCI operations on this vcpu. Returns true if the vcpu
> + * should not run.
> + */
> +bool __must_check vpci_process_pending(struct vcpu *v);
> +
>  struct vpci {
>      /* List of vPCI handlers for a device. */
>      struct list_head handlers;
>      spinlock_t lock;
> +
> +#ifdef __XEN__
> +    /* Hide the rest of the vpci struct from the user-space test harness. */
> +    struct vpci_header {
> +        /* Information about the PCI BARs of this device. */
> +        struct vpci_bar {
> +            uint64_t addr;
> +            uint64_t size;
> +            enum {
> +                VPCI_BAR_EMPTY,
> +                VPCI_BAR_IO,
> +                VPCI_BAR_MEM32,
> +                VPCI_BAR_MEM64_LO,
> +                VPCI_BAR_MEM64_HI,
> +                VPCI_BAR_ROM,
> +            } type;
> +            bool prefetchable : 1;
> +            /* Store whether the BAR is mapped into guest p2m. */
> +            bool enabled      : 1;
> +#define PCI_HEADER_NORMAL_NR_BARS        6
> +#define PCI_HEADER_BRIDGE_NR_BARS        2
> +        } bars[PCI_HEADER_NORMAL_NR_BARS + 1];
> +        /* At most 6 BARS + 1 expansion ROM BAR. */
> +
> +        /*
> +         * Store whether the ROM enable bit is set (doesn't imply ROM BAR
> +         * is mapped into guest p2m) if there's a ROM BAR on the device.
> +         */
> +        bool rom_enabled      : 1;
> +        /* FIXME: currently there's no support for SR-IOV. */
> +    } header;
> +#endif
> +};
> +
> +struct vpci_vcpu {
> +    /* Per-vcpu structure to store state while {un}mapping of PCI BARs. */
> +    struct rangeset *mem;
> +    struct pci_dev *pdev;
> +    bool map      : 1;
> +    bool rom_only : 1;
>  };
> 
> +#else /* !CONFIG_HAS_VPCI */
> +struct vpci_vcpu {};
> +#endif
> +
>  #endif
> 
>  /*
> --
> 2.16.2

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.