Re: [Xen-devel] [PATCH v4 1/9] xen/vpci: introduce basic handlers to trap accesses to the PCI config space
> -----Original Message-----
> From: Roger Pau Monne [mailto:roger.pau@xxxxxxxxxx]
> Sent: 30 June 2017 16:01
> To: xen-devel@xxxxxxxxxxxxxxxxxxxx
> Cc: boris.ostrovsky@xxxxxxxxxx; julien.grall@xxxxxxx;
> konrad.wilk@xxxxxxxxxx; Roger Pau Monne <roger.pau@xxxxxxxxxx>; Ian
> Jackson <Ian.Jackson@xxxxxxxxxx>; Wei Liu <wei.liu2@xxxxxxxxxx>; Jan
> Beulich <jbeulich@xxxxxxxx>; Andrew Cooper
> <Andrew.Cooper3@xxxxxxxxxx>; Paul Durrant <Paul.Durrant@xxxxxxxxxx>
> Subject: [PATCH v4 1/9] xen/vpci: introduce basic handlers to trap accesses
> to the PCI config space
>
> This functionality is going to reside in vpci.c (and the corresponding
> vpci.h header), and should be arch-agnostic. The handlers introduced
> in this patch setup the basic functionality required in order to trap
> accesses to the PCI config space, and allow decoding the address and
> finding the corresponding handler that should handle the access
> (although no handlers are implemented).
>
> Note that the traps to the PCI IO ports registers (0xcf8/0xcfc) are
> setup inside of a x86 HVM file, since that's not shared with other
> arches.
>
> A new XEN_X86_EMU_VPCI x86 domain flag is added in order to signal Xen
> whether a domain should use the newly introduced vPCI handlers, this
> is only enabled for PVH Dom0 at the moment.
>
> A very simple user-space test is also provided, so that the basic
> functionality of the vPCI traps can be asserted. This has been proven
> quite helpful during development, since the logic to handle partial
> accesses or accesses that expand across multiple registers is not
> trivial.
>
> The handlers for the registers are added to a linked list that's keep
> sorted at all times. Both the read and write handlers support accesses
> that expand across multiple emulated registers and contain gaps not
> emulated.
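
As a rough illustration of the partial-access handling described above: an access that covers both emulated and non-emulated bytes ends up splicing the emulated bytes into the value read from hardware. The sketch below copies the merge_result helper from the vpci.c hunk further down into a standalone file. The wrapper example_merged_read and its sample values are invented for illustration and do not appear in the patch.

```c
#include <stdint.h>

/*
 * Same logic as merge_result() in the patch: replace 'size' bytes of
 * 'data', starting at byte 'offset', with the low 'size' bytes of 'new'.
 */
static uint32_t merge_result(uint32_t data, uint32_t new, unsigned int size,
                             unsigned int offset)
{
    /* The 64-bit intermediate keeps the shift well defined for size == 4. */
    uint32_t mask = ((uint64_t)1 << (size * 8)) - 1;

    return (data & ~(mask << (offset * 8))) | ((new & mask) << (offset * 8));
}

/*
 * Hypothetical scenario (values invented for illustration): a 4-byte read
 * at config offset 0 where the hardware returns 0x44332211 and a 2-byte
 * emulated register at offset 1 returns 0xbbaa.
 */
static uint32_t example_merged_read(void)
{
    uint32_t hw = 0x44332211;
    uint32_t emulated = 0xbbaa;

    return merge_result(hw, emulated, 2, 1);
}
```

Bytes 1 and 2 of the result come from the emulated register, while bytes 0 and 3 keep the hardware value, which is how a gap around an emulated register would be filled on reads.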
>
> Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> ---
> Cc: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
> Cc: Wei Liu <wei.liu2@xxxxxxxxxx>
> Cc: Jan Beulich <jbeulich@xxxxxxxx>
> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> Cc: Paul Durrant <paul.durrant@xxxxxxxxxx>
> ---
> Changes since v3:
>  * User-space test harness:
>   - Fix spaces in container_of macro.
>   - Implement a dummy locking functions.
>   - Remove 'current' macro make current a pointer to the statically
>     allocated vpcu.
>   - Remove unneeded parentheses in the pci_conf_readX macros.
>   - Fix the name of the write test macro.
>   - Remove the dummy EXPORT_SYMBOL macro (this was needed by the RB
>     code only).
>   - Import the max macro.
>   - Test all possible read/write size combinations with all possible
>     emulated register sizes.
>   - Introduce a test for register removal.
>  * Hypervisor code:
>   - Use a sorted list in order to store the config space handlers.
>   - Remove some unneeded 'else' branches.
>   - Make the IO port handlers always return X86EMUL_OKAY, and set the
>     data to all 1's in case of read failure (write are simply ignored).
>   - In hvm_select_ioreq_server reuse local variables when calling
>     XEN_DMOP_PCI_SBDF.
>   - Store the pointers to the initialization functions in the .rodata
>     section.
>   - Do not ignore the return value of xen_vpci_add_handlers in
>     setup_one_hwdom_device.
>   - Remove the vpci_init macro.
>   - Do not hide the pointers inside of the vpci_{read/write}_t
>     typedefs.
>   - Rename priv_data to private in vpci_register.
>   - Simplify checking for register overlap in vpci_register_cmp.
>   - Check that the offset and the length match before removing a
>     register in xen_vpci_remove_register.
>   - Make vpci_read_hw return a value rather than storing it in a
>     pointer passed by parameter.
>   - Handler dispatcher functions vpci_{read/write} no longer return an
>     error code, errors on reads/writes should be treated like hardware
>     (writes ignored, reads return all 1's or garbage).
>   - Make sure pcidevs is locked before calling pci_get_pdev_by_domain.
>   - Use a recursive spinlock for the vpci lock, so that spin_is_locked
>     checks that the current CPU is holding the lock.
>   - Make the code less error-chatty by removing some of the printk's.
>   - Pass the slot and the function as separate parameters to the
>     handler dispatchers (instead of passing devfn).
>   - Allow handlers to be registered with either a read or write
>     function only, the missing handler will be replaced by a dummy
>     handler (writes ignored, reads return 1's).
>   - Introduce PCI_CFG_SPACE_* defines from Linux.
>   - Simplify the handler dispatchers by removing the recursion, now the
>     dispatchers iterate over the list of sorted handlers and call them
>     in order.
>   - Remove the GENMASK_BYTES, SHIFT_RIGHT_BYTES and ADD_RESULT macros,
>     and instead provide a merge_result function in order to merge a
>     register output into a partial result.
>   - Rename the fields of the vpci_val union to u8/u16/u32.
>   - Remove the return values from the read/write handlers, errors
>     should be handled internally and signaled as would be done on
>     native hardware.
>   - Remove the usage of the GENMASK macro.
>
> Changes since v2:
>  - Generalize the PCI address decoding and use it for IOREQ code also.
>
> Changes since v1:
>  - Allow access to cross a word-boundary.
>  - Add locking.
>  - Add cleanup to xen_vpci_add_handlers in case of failure.
> ---

[snip]

> diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
> index 44bd3bf0ce..a24d8e913a 100644
> --- a/xen/arch/arm/xen.lds.S
> +++ b/xen/arch/arm/xen.lds.S
> @@ -41,6 +41,9 @@ SECTIONS
>
>    . = ALIGN(PAGE_SIZE);
>    .rodata : {
> +       __start_vpci_array = .;
> +       *(.rodata.vpci)
> +       __end_vpci_array = .;
>          _srodata = .;          /* Read-only data */
>          /* Bug frames table */
>         __start_bug_frames = .;
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index f7873da323..23e7df3838 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -376,11 +376,21 @@ static bool emulation_flags_ok(const struct domain *d, uint32_t emflags)
>      if ( is_hvm_domain(d) )
>      {
>          if ( is_hardware_domain(d) &&
> -             emflags != (XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC) )
> -            return false;
> -        if ( !is_hardware_domain(d) && emflags &&
> -             emflags != XEN_X86_EMU_ALL && emflags != XEN_X86_EMU_LAPIC )
> +             emflags != (XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC|
> +                         XEN_X86_EMU_VPCI) )
>              return false;
> +        if ( !is_hardware_domain(d) )
> +        {
> +            switch ( emflags )
> +            {
> +            case XEN_X86_EMU_ALL & ~XEN_X86_EMU_VPCI:
> +            case XEN_X86_EMU_LAPIC:
> +            case 0:
> +                break;
> +            default:
> +                return false;
> +            }
> +        }

Can't the if and the following switch be combined?
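
One possible reading of this suggestion, as a standalone sketch: let the hardware-domain test return directly, so the switch no longer needs its own enclosing if and each kind of HVM domain has a single exit path. The helper name hvm_emflags_ok, the bool parameter standing in for is_hardware_domain(d), and the EMU_* values are placeholders made up for illustration (only bit 10 for VPCI is taken from the patch), and the sketch assumes the HVM branch otherwise ends in "return true".

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder flag values; only VPCI (bit 10) is taken from the patch. */
#define EMU_LAPIC   (1U << 0)
#define EMU_IOAPIC  (1U << 4)
#define EMU_VPCI    (1U << 10)
/* Assumption: every bit from 0 to 10 is an emulation flag. */
#define EMU_ALL     ((1U << 11) - 1)

/* Hypothetical stand-in for the HVM branch of emulation_flags_ok(). */
static bool hvm_emflags_ok(bool is_hwdom, uint32_t emflags)
{
    /* The hardware domain accepts exactly one flag combination. */
    if ( is_hwdom )
        return emflags == (EMU_LAPIC | EMU_IOAPIC | EMU_VPCI);

    /* Other HVM domains accept one of three combinations. */
    switch ( emflags )
    {
    case EMU_ALL & ~EMU_VPCI:
    case EMU_LAPIC:
    case 0:
        return true;

    default:
        return false;
    }
}
```

With this shape the !is_hardware_domain(d) condition disappears, since the early return already excludes the hardware domain before the switch is reached.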

>      }
>      else if ( emflags != 0 && emflags != XEN_X86_EMU_PIT )
>      {
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 3ed6ec468d..c4176ee458 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -36,6 +36,7 @@
>  #include <xen/rangeset.h>
>  #include <xen/monitor.h>
>  #include <xen/warning.h>
> +#include <xen/vpci.h>
>  #include <asm/shadow.h>
>  #include <asm/hap.h>
>  #include <asm/current.h>
> @@ -630,6 +631,7 @@ int hvm_domain_initialise(struct domain *d, unsigned long domcr_flags,
>      d->arch.hvm_domain.io_bitmap = hvm_io_bitmap;
>
>      register_g2m_portio_handler(d);
> +    register_vpci_portio_handler(d);
>
>      hvm_ioreq_init(d);
>
> diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
> index 214ab307c4..4e91a485cd 100644
> --- a/xen/arch/x86/hvm/io.c
> +++ b/xen/arch/x86/hvm/io.c
> @@ -25,6 +25,7 @@
>  #include <xen/trace.h>
>  #include <xen/event.h>
>  #include <xen/hypercall.h>
> +#include <xen/vpci.h>
>  #include <asm/current.h>
>  #include <asm/cpufeature.h>
>  #include <asm/processor.h>
> @@ -256,6 +257,147 @@ void register_g2m_portio_handler(struct domain *d)
>      handler->ops = &g2m_portio_ops;
>  }
>
> +/* Do some sanity checks. */
> +static int vpci_access_check(unsigned int reg, unsigned int len)
> +{
> +    /* Check access size. */
> +    if ( len != 1 && len != 2 && len != 4 )
> +        return -EINVAL;
> +
> +    /* Check if access crosses a double-word boundary. */
> +    if ( (reg & 3) + len > 4 )
> +        return -EINVAL;
> +
> +    return 0;
> +}
> +
> +/* Helper to decode a PCI address. */
> +void hvm_pci_decode_addr(unsigned int cf8, unsigned int addr,
> +                         unsigned int *bus, unsigned int *slot,
> +                         unsigned int *func, unsigned int *reg)
> +{
> +    unsigned long bdf;
> +
> +    ASSERT(CF8_ENABLED(cf8));
> +
> +    bdf = CF8_BDF(cf8);
> +    *bus = PCI_BUS(bdf);
> +    *slot = PCI_SLOT(bdf);
> +    *func = PCI_FUNC(bdf);
> +    /*
> +     * NB: the lower 2 bits of the register address are fetched from the
> +     * offset into the 0xcfc register when reading/writing to it.
> +     */
> +    *reg = CF8_ADDR_LO(cf8) | (addr & 3);
> +}
> +
> +/* vPCI config space IO ports handlers (0xcf8/0xcfc). */
> +static bool vpci_portio_accept(const struct hvm_io_handler *handler,
> +                               const ioreq_t *p)
> +{
> +    return (p->addr == 0xcf8 && p->size == 4) || (p->addr & 0xfffc) == 0xcfc;
> +}
> +
> +static int vpci_portio_read(const struct hvm_io_handler *handler,
> +                            uint64_t addr, uint32_t size, uint64_t *data)
> +{
> +    struct domain *d = current->domain;
> +    unsigned int bus, slot, func, reg;
> +
> +    *data = ~(uint64_t)0;
> +
> +    vpci_lock(d);
> +    if ( addr == 0xcf8 )
> +    {
> +        ASSERT(size == 4);
> +        *data = d->arch.hvm_domain.pci_cf8;
> +        vpci_unlock(d);
> +        return X86EMUL_OKAY;
> +    }
> +    if ( !CF8_ENABLED(d->arch.hvm_domain.pci_cf8) )
> +    {
> +        vpci_unlock(d);
> +        return X86EMUL_OKAY;
> +    }
> +
> +    /* Decode the PCI address. */
> +    hvm_pci_decode_addr(d->arch.hvm_domain.pci_cf8, addr, &bus, &slot, &func,
> +                        &reg);
> +
> +    if ( vpci_access_check(reg, size) )
> +    {
> +        vpci_unlock(d);
> +        return X86EMUL_OKAY;
> +    }
> +
> +    pcidevs_lock();
> +    *data = vpci_read(0, bus, slot, func, reg, size);
> +    pcidevs_unlock();
> +    vpci_unlock(d);
> +
> +    return X86EMUL_OKAY;
> +}
> +
> +static int vpci_portio_write(const struct hvm_io_handler *handler,
> +                             uint64_t addr, uint32_t size, uint64_t data)
> +{
> +    struct domain *d = current->domain;
> +    unsigned int bus, slot, func, reg;
> +
> +    vpci_lock(d);
> +    if ( addr == 0xcf8 )
> +    {
> +        ASSERT(size == 4);
> +        d->arch.hvm_domain.pci_cf8 = data;
> +        vpci_unlock(d);
> +        return X86EMUL_OKAY;
> +    }
> +    if ( !CF8_ENABLED(d->arch.hvm_domain.pci_cf8) )
> +    {
> +        vpci_unlock(d);
> +        return X86EMUL_OKAY;
> +    }
> +
> +    /* Decode the PCI address. */
> +    hvm_pci_decode_addr(d->arch.hvm_domain.pci_cf8, addr, &bus, &slot, &func,
> +                        &reg);
> +
> +    if ( vpci_access_check(reg, size) )
> +    {
> +        vpci_unlock(d);
> +        return X86EMUL_OKAY;
> +    }
> +
> +    pcidevs_lock();
> +    vpci_write(0, bus, slot, func, reg, size, data);
> +    pcidevs_unlock();
> +    vpci_unlock(d);
> +
> +    return X86EMUL_OKAY;
> +}
> +
> +static const struct hvm_io_ops vpci_portio_ops = {
> +    .accept = vpci_portio_accept,
> +    .read = vpci_portio_read,
> +    .write = vpci_portio_write,
> +};
> +
> +void register_vpci_portio_handler(struct domain *d)
> +{
> +    struct hvm_io_handler *handler;
> +
> +    if ( !has_vpci(d) )
> +        return;
> +
> +    handler = hvm_next_io_handler(d);
> +    if ( !handler )
> +        return;
> +
> +    spin_lock_init(&d->arch.hvm_domain.vpci_lock);
> +    handler->type = IOREQ_TYPE_PIO;
> +    handler->ops = &vpci_portio_ops;
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
> index b2a8b0e986..726c5c0c36 100644
> --- a/xen/arch/x86/hvm/ioreq.c
> +++ b/xen/arch/x86/hvm/ioreq.c
> @@ -1178,18 +1178,16 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
>               CF8_ENABLED(cf8) )
>          {
>              uint32_t sbdf, x86_fam;
> +            unsigned int bus, slot, func, reg;
> +
> +            hvm_pci_decode_addr(cf8, p->addr, &bus, &slot, &func, &reg);
>
>              /* PCI config data cycle */
>
> -            sbdf = XEN_DMOP_PCI_SBDF(0,
> -                                     PCI_BUS(CF8_BDF(cf8)),
> -                                     PCI_SLOT(CF8_BDF(cf8)),
> -                                     PCI_FUNC(CF8_BDF(cf8)));
> +            sbdf = XEN_DMOP_PCI_SBDF(0, bus, slot, func);
>
>              type = XEN_DMOP_IO_RANGE_PCI;
> -            addr = ((uint64_t)sbdf << 32) |
> -                   CF8_ADDR_LO(cf8) |
> -                   (p->addr & 3);
> +            addr = ((uint64_t)sbdf << 32) | reg;
>              /* AMD extended configuration space access? */
>              if ( CF8_ADDR_HI(cf8) &&
>                   d->arch.cpuid->x86_vendor == X86_VENDOR_AMD &&
> diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
> index f7b927858c..4cf919f206 100644
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -1566,7 +1566,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>          domcr_flags |= DOMCRF_hvm |
>                         ((hvm_funcs.hap_supported && !opt_dom0_shadow) ?
>                           DOMCRF_hap : 0);
> -        config.emulation_flags = XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC;
> +        config.emulation_flags = XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC|
> +                                 XEN_X86_EMU_VPCI;
>      }
>
>      /* Create initial domain 0. */
> diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
> index 8289a1bf09..451e7970da 100644
> --- a/xen/arch/x86/xen.lds.S
> +++ b/xen/arch/x86/xen.lds.S
> @@ -76,6 +76,9 @@ SECTIONS
>
>    __2M_rodata_start = .;       /* Start of 2M superpages, mapped RO. */
>    .rodata : {
> +       __start_vpci_array = .;
> +       *(.rodata.vpci)
> +       __end_vpci_array = .;
>         _srodata = .;
>         /* Bug frames table */
>         __start_bug_frames = .;
> diff --git a/xen/drivers/Makefile b/xen/drivers/Makefile
> index 19391802a8..d51c766453 100644
> --- a/xen/drivers/Makefile
> +++ b/xen/drivers/Makefile
> @@ -1,6 +1,6 @@
>  subdir-y += char
>  subdir-$(CONFIG_HAS_CPUFREQ) += cpufreq
> -subdir-$(CONFIG_HAS_PCI) += pci
> +subdir-$(CONFIG_HAS_PCI) += pci vpci
>  subdir-$(CONFIG_HAS_PASSTHROUGH) += passthrough
>  subdir-$(CONFIG_ACPI) += acpi
>  subdir-$(CONFIG_VIDEO) += video
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index 6e7126b2e8..3208cd5d71 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -30,6 +30,7 @@
>  #include <xen/radix-tree.h>
>  #include <xen/softirq.h>
>  #include <xen/tasklet.h>
> +#include <xen/vpci.h>
>  #include <xsm/xsm.h>
>  #include <asm/msi.h>
>  #include "ats.h"
> @@ -1026,9 +1027,10 @@ static void setup_one_hwdom_device(const struct setup_hwdom *ctxt,
>                                     struct pci_dev *pdev)
>  {
>      u8 devfn = pdev->devfn;
> +    int err;
>
>      do {
> -        int err = ctxt->handler(devfn, pdev);
> +        err = ctxt->handler(devfn, pdev);
>
>          if ( err )
>          {
> @@ -1041,6 +1043,11 @@ static void setup_one_hwdom_device(const struct setup_hwdom *ctxt,
>          devfn += pdev->phantom_stride;
>      } while ( devfn != pdev->devfn &&
>                PCI_SLOT(devfn) == PCI_SLOT(pdev->devfn) );
> +
> +    err = vpci_add_handlers(pdev);
> +    if ( err )
> +        printk(XENLOG_ERR "setup of vPCI for d%d failed: %d\n",
> +               ctxt->d->domain_id, err);
>  }
>
>  static int __hwdom_init _setup_hwdom_pci_devices(struct pci_seg *pseg, void *arg)
> diff --git a/xen/drivers/vpci/Makefile b/xen/drivers/vpci/Makefile
> new file mode 100644
> index 0000000000..840a906470
> --- /dev/null
> +++ b/xen/drivers/vpci/Makefile
> @@ -0,0 +1 @@
> +obj-y += vpci.o
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> new file mode 100644
> index 0000000000..c54de83b82
> --- /dev/null
> +++ b/xen/drivers/vpci/vpci.c
> @@ -0,0 +1,405 @@
> +/*
> + * Generic functionality for handling accesses to the PCI configuration space
> + * from guests.
> + *
> + * Copyright (C) 2017 Citrix Systems R&D
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms and conditions of the GNU General Public
> + * License, version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public
> + * License along with this program; If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <xen/sched.h>
> +#include <xen/vpci.h>
> +
> +extern const vpci_register_init_t __start_vpci_array[], __end_vpci_array[];
> +#define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
> +
> +/* Internal struct to store the emulated PCI registers. */
> +struct vpci_register {
> +    vpci_read_t *read;
> +    vpci_write_t *write;
> +    unsigned int size;
> +    unsigned int offset;
> +    void *private;
> +    struct list_head node;
> +};
> +
> +int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
> +{
> +    unsigned int i;
> +    int rc = 0;
> +
> +    if ( !has_vpci(pdev->domain) )
> +        return 0;
> +
> +    pdev->vpci = xzalloc(struct vpci);
> +    if ( !pdev->vpci )
> +        return -ENOMEM;
> +
> +    INIT_LIST_HEAD(&pdev->vpci->handlers);
> +
> +    for ( i = 0; i < NUM_VPCI_INIT; i++ )
> +    {
> +        rc = __start_vpci_array[i](pdev);
> +        if ( rc )
> +            break;
> +    }
> +
> +    if ( rc )
> +    {
> +        while ( !list_empty(&pdev->vpci->handlers) )
> +        {
> +            struct vpci_register *r = list_first_entry(&pdev->vpci->handlers,
> +                                                       struct vpci_register,
> +                                                       node);
> +
> +            list_del(&r->node);
> +            xfree(r);
> +        }
> +        xfree(pdev->vpci);
> +    }
> +
> +    return rc;
> +}
> +
> +static int vpci_register_cmp(const struct vpci_register *r1,
> +                             const struct vpci_register *r2)
> +{
> +    /* Return 0 if registers overlap. */
> +    if ( r1->offset < r2->offset + r2->size &&
> +         r2->offset < r1->offset + r1->size )
> +        return 0;
> +    if ( r1->offset < r2->offset )
> +        return -1;
> +    if ( r1->offset > r2->offset )
> +        return 1;
> +
> +    ASSERT_UNREACHABLE();
> +    return 0;
> +}
> +
> +/* Dummy hooks, writes are ignored, reads return 1's */
> +static void vpci_ignored_read(struct pci_dev *pdev, unsigned int reg,
> +                              union vpci_val *val, void *data)
> +{
> +    val->u32 = ~(uint32_t)0;
> +}
> +
> +static void vpci_ignored_write(struct pci_dev *pdev, unsigned int reg,
> +                               union vpci_val val, void *data)
> +{
> +}
> +
> +int vpci_add_register(const struct pci_dev *pdev, vpci_read_t read_handler,
> +                      vpci_write_t write_handler, unsigned int offset,
> +                      unsigned int size, void *data)
> +{
> +    struct list_head *head;
> +    struct vpci_register *r;
> +
> +    /* Some sanity checks. */
> +    if ( (size != 1 && size != 2 && size != 4) ||
> +         offset >= PCI_CFG_SPACE_EXP_SIZE || offset & (size - 1) ||
> +         (read_handler == NULL && write_handler == NULL) )
> +        return -EINVAL;
> +
> +    r = xmalloc(struct vpci_register);
> +    if ( !r )
> +        return -ENOMEM;
> +
> +    r->read = read_handler ?: vpci_ignored_read;
> +    r->write = write_handler ?: vpci_ignored_write;
> +    r->size = size;
> +    r->offset = offset;
> +    r->private = data;
> +
> +    vpci_lock(pdev->domain);
> +
> +    /* The list of handlers must be keep sorted at all times. */
> +    list_for_each ( head, &pdev->vpci->handlers )
> +    {
> +        const struct vpci_register *this =
> +            list_entry(head, const struct vpci_register, node);
> +        int cmp = vpci_register_cmp(r, this);
> +
> +        if ( cmp < 0 )
> +            break;
> +        if ( cmp == 0 )
> +        {
> +            vpci_unlock(pdev->domain);
> +            xfree(r);
> +            return -EEXIST;
> +        }
> +    }
> +
> +    list_add_tail(&r->node, head);
> +    vpci_unlock(pdev->domain);
> +
> +    return 0;
> +}
> +
> +int vpci_remove_register(const struct pci_dev *pdev, unsigned int offset,
> +                         unsigned int size)
> +{
> +    const struct vpci_register r = { .offset = offset, .size = size };
> +    struct vpci_register *rm = NULL;
> +
> +    vpci_lock(pdev->domain);
> +
> +    list_for_each_entry ( rm, &pdev->vpci->handlers, node )
> +        if ( vpci_register_cmp(&r, rm) <= 0 )
> +            break;
> +
> +    if ( !rm || rm->offset != offset || rm->size != size )
> +    {
> +        vpci_unlock(pdev->domain);
> +        return -ENOENT;
> +    }
> +
> +    list_del(&rm->node);
> +    vpci_unlock(pdev->domain);
> +    xfree(rm);
> +
> +    return 0;
> +}
> +
> +/* Wrappers for performing reads/writes to the underlying hardware. */
> +static uint32_t vpci_read_hw(unsigned int seg, unsigned int bus,
> +                             unsigned int slot, unsigned int func,
> +                             unsigned int reg, uint32_t size)
> +{
> +    uint32_t data;
> +
> +    switch ( size )
> +    {
> +    case 4:
> +        data = pci_conf_read32(seg, bus, slot, func, reg);
> +        break;
> +    case 2:
> +        data = pci_conf_read16(seg, bus, slot, func, reg);
> +        break;
> +    case 1:
> +        data = pci_conf_read8(seg, bus, slot, func, reg);
> +        break;
> +    default:
> +        BUG();
> +    }
> +
> +    return data;
> +}
> +
> +static void vpci_write_hw(unsigned int seg, unsigned int bus,
> +                          unsigned int slot, unsigned int func,
> +                          unsigned int reg, uint32_t size, uint32_t data)
> +{
> +    switch ( size )
> +    {
> +    case 4:
> +        pci_conf_write32(seg, bus, slot, func, reg, data);
> +        break;
> +    case 3:
> +        /*
> +         * This is possible because a 4byte write can have 1byte trapped and
> +         * the rest passed-through.
> +         */
> +        if ( reg & 1 )
> +        {
> +            pci_conf_write8(seg, bus, slot, func, reg, data);
> +            pci_conf_write16(seg, bus, slot, func, reg + 1, data >> 8);
> +        }
> +        else
> +        {
> +            pci_conf_write16(seg, bus, slot, func, reg, data);
> +            pci_conf_write8(seg, bus, slot, func, reg + 2, data >> 16);
> +        }
> +        break;
> +    case 2:
> +        pci_conf_write16(seg, bus, slot, func, reg, data);
> +        break;
> +    case 1:
> +        pci_conf_write8(seg, bus, slot, func, reg, data);
> +        break;
> +    default:
> +        BUG();
> +    }
> +}
> +
> +/*
> + * Merge new data into a partial result.
> + *
> + * Zero the bytes of 'data' from [offset, offset + size), and
> + * merge the value found in 'new' from [0, offset) left shifted
> + * by 'offset'.
> + */
> +uint32_t merge_result(uint32_t data, uint32_t new, unsigned int size,
> +                      unsigned int offset)
> +{
> +    uint32_t mask = ((uint64_t)1 << (size * 8)) - 1;
> +
> +    return (data & ~(mask << (offset * 8))) | ((new & mask) << (offset * 8));
> +}
> +
> +uint32_t vpci_read(unsigned int seg, unsigned int bus, unsigned int slot,
> +                   unsigned int func, unsigned int reg, uint32_t size)
> +{
> +    struct domain *d = current->domain;
> +    struct pci_dev *pdev;
> +    const struct vpci_register *r;
> +    unsigned int data_offset = 0;
> +    uint32_t data;
> +
> +    ASSERT(pcidevs_locked());
> +    ASSERT(vpci_locked(d));
> +
> +    /*
> +     * Read the hardware value.
> +     * NB: at the moment vPCI passthroughs everything (ie: permissive).
> +     */
> +    data = vpci_read_hw(seg, bus, slot, func, reg, size);
> +
> +    /* Find the PCI dev matching the address. */
> +    pdev = pci_get_pdev_by_domain(d, seg, bus, PCI_DEVFN(slot, func));
> +    if ( !pdev )
> +        return data;
> +
> +    /* Replace any values reported by the emulated registers. */
> +    list_for_each_entry ( r, &pdev->vpci->handlers, node )
> +    {
> +        const struct vpci_register emu = {
> +            .offset = reg + data_offset,
> +            .size = size - data_offset
> +        };
> +        int cmp = vpci_register_cmp(&emu, r);
> +        union vpci_val val = { .u32 = ~0 };
> +        unsigned int merge_size;
> +
> +        if ( cmp < 0 )
> +            break;
> +        if ( cmp > 0 )
> +            continue;
> +
> +        r->read(pdev, r->offset, &val, r->private);
> +
> +        /* Check if the read is in the middle of a register. */
> +        if ( r->offset < emu.offset )
> +            val.u32 >>= (emu.offset - r->offset) * 8;
> +
> +        data_offset = max(emu.offset, r->offset) - reg;
> +        /* Find the intersection size between the two sets. */
> +        merge_size = min(emu.offset + emu.size, r->offset + r->size) -
> +                     max(emu.offset, r->offset);
> +        /* Merge the emulated data into the native read value. */
> +        data = merge_result(data, val.u32, merge_size, data_offset);
> +        data_offset += merge_size;
> +        if ( data_offset == size )
> +            break;
> +    }
> +
> +    return data;
> +}
> +
> +/*
> + * Perform a maybe partial write to a register.
> + *
> + * Note that this will only work for simple registers, if Xen needs to
> + * trap accesses to rw1c registers (like the status PCI header register)
> + * the logic in vpci_write will have to be expanded in order to correctly
> + * deal with them.
> + */
> +static void vpci_write_helper(struct pci_dev *pdev,
> +                              const struct vpci_register *r, unsigned int size,
> +                              unsigned int offset, uint32_t data)
> +{
> +    union vpci_val val = { .u32 = data };
> +
> +    ASSERT(size <= r->size);
> +    if ( size != r->size )
> +    {
> +        r->read(pdev, r->offset, &val, r->private);
> +        val.u32 = merge_result(val.u32, data, size, offset);
> +    }
> +
> +    r->write(pdev, r->offset, val, r->private);
> +}
> +
> +void vpci_write(unsigned int seg, unsigned int bus, unsigned int slot,
> +                unsigned int func, unsigned int reg, uint32_t size,
> +                uint32_t data)
> +{
> +    struct domain *d = current->domain;
> +    struct pci_dev *pdev;
> +    const struct vpci_register *r;
> +    unsigned int data_offset = 0;
> +
> +    ASSERT(pcidevs_locked());
> +    ASSERT(vpci_locked(d));
> +
> +    /*
> +     * Find the PCI dev matching the address.
> +     * Passthrough everything that's not trapped.
> +     * */
> +    pdev = pci_get_pdev_by_domain(d, seg, bus, PCI_DEVFN(slot, func));
> +    if ( !pdev )
> +    {
> +        vpci_write_hw(seg, bus, slot, func, reg, size, data);
> +        return;
> +    }
> +
> +    /* Write the value to the hardware or emulated registers. */
> +    list_for_each_entry ( r, &pdev->vpci->handlers, node )
> +    {
> +        const struct vpci_register emu = {
> +            .offset = reg + data_offset,
> +            .size = size - data_offset
> +        };
> +        int cmp = vpci_register_cmp(&emu, r);
> +        unsigned int write_size;
> +
> +        if ( cmp < 0 )
> +            break;
> +        if ( cmp > 0 )
> +            continue;
> +
> +        if ( emu.offset < r->offset )
> +        {
> +            /* Heading gap, write partial content to hardware. */
> +            vpci_write_hw(seg, bus, slot, func, emu.offset,
> +                          r->offset - emu.offset, data >> (data_offset * 8));
> +            data_offset += r->offset - emu.offset;
> +        }
> +
> +        /* Find the intersection size between the two sets. */
> +        write_size = min(emu.offset + emu.size, r->offset + r->size) -
> +                     max(emu.offset, r->offset);
> +        vpci_write_helper(pdev, r, write_size, reg + data_offset - r->offset,
> +                          data >> (data_offset * 8));
> +        data_offset += write_size;
> +        if ( data_offset == size )
> +            break;
> +    }
> +
> +    if ( data_offset < size )
> +        /* Tailing gap, write the remaining. */
> +        vpci_write_hw(seg, bus, slot, func, reg + data_offset,
> +                      size - data_offset, data >> (data_offset * 8));
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> +
> diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
> index 27d80eeff4..9be09df85d 100644
> --- a/xen/include/asm-x86/domain.h
> +++ b/xen/include/asm-x86/domain.h
> @@ -427,6 +427,7 @@ struct arch_domain
>  #define has_vpit(d)        (!!((d)->arch.emulation_flags & XEN_X86_EMU_PIT))
>  #define has_pirq(d)        (!!((d)->arch.emulation_flags & \
>                              XEN_X86_EMU_USE_PIRQ))
> +#define has_vpci(d)        (!!((d)->arch.emulation_flags & XEN_X86_EMU_VPCI))
>
>  #define has_arch_pdevs(d)    (!list_empty(&(d)->arch.pdev_list))
>
> diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
> index d2899c9bb2..cbf4170789 100644
> --- a/xen/include/asm-x86/hvm/domain.h
> +++ b/xen/include/asm-x86/hvm/domain.h
> @@ -184,6 +184,9 @@ struct hvm_domain {
>      /* List of guest to machine IO ports mapping. */
>      struct list_head g2m_ioport_list;
>
> +    /* Lock for the PCI emulation layer (vPCI). */
> +    spinlock_t vpci_lock;
> +
>      /* List of permanently write-mapped pages. */
>      struct {
>          spinlock_t lock;
> diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
> index 2484eb1c75..0af1ed14dc 100644
> --- a/xen/include/asm-x86/hvm/io.h
> +++ b/xen/include/asm-x86/hvm/io.h
> @@ -149,12 +149,20 @@ void stdvga_deinit(struct domain *d);
>
>  extern void hvm_dpci_msi_eoi(struct domain *d, int vector);
>
> +/* Decode a PCI port IO access into a bus/devfn/reg. */
> +void hvm_pci_decode_addr(unsigned int cf8, unsigned int addr,
> +                         unsigned int *bus, unsigned int *slot,
> +                         unsigned int *func, unsigned int *reg);
> +
>  /*
>   * HVM port IO handler that performs forwarding of guest IO ports into machine
>   * IO ports.
>   */
>  void register_g2m_portio_handler(struct domain *d);
>
> +/* HVM port IO handler for PCI accesses. */
> +void register_vpci_portio_handler(struct domain *d);
> +
>  #endif /* __ASM_X86_HVM_IO_H__ */
>
>
> diff --git a/xen/include/public/arch-x86/xen.h b/xen/include/public/arch-x86/xen.h
> index f21332e897..86a1a09a8d 100644
> --- a/xen/include/public/arch-x86/xen.h
> +++ b/xen/include/public/arch-x86/xen.h
> @@ -295,12 +295,15 @@ struct xen_arch_domainconfig {
>  #define XEN_X86_EMU_PIT             (1U<<_XEN_X86_EMU_PIT)
>  #define _XEN_X86_EMU_USE_PIRQ       9
>  #define XEN_X86_EMU_USE_PIRQ        (1U<<_XEN_X86_EMU_USE_PIRQ)
> +#define _XEN_X86_EMU_VPCI           10
> +#define XEN_X86_EMU_VPCI            (1U<<_XEN_X86_EMU_VPCI)
>
>  #define XEN_X86_EMU_ALL             (XEN_X86_EMU_LAPIC | XEN_X86_EMU_HPET | \
>                                       XEN_X86_EMU_PM | XEN_X86_EMU_RTC | \
>                                       XEN_X86_EMU_IOAPIC | XEN_X86_EMU_PIC | \
>                                       XEN_X86_EMU_VGA | XEN_X86_EMU_IOMMU | \
> -                                     XEN_X86_EMU_PIT | XEN_X86_EMU_USE_PIRQ)
> +                                     XEN_X86_EMU_PIT | XEN_X86_EMU_USE_PIRQ |\
> +                                     XEN_X86_EMU_VPCI)
>      uint32_t emulation_flags;
>  };
>
> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
> index 59b6e8a81c..a9b80e330b 100644
> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -88,6 +88,9 @@ struct pci_dev {
>  #define PT_FAULT_THRESHOLD 10
>      } fault;
>      u64 vf_rlen[6];
> +
> +    /* Data for vPCI. */
> +    struct vpci *vpci;
>  };
>
>  #define for_each_pdev(domain, pdev) \
> diff --git a/xen/include/xen/pci_regs.h b/xen/include/xen/pci_regs.h
> index ecd6124d91..cc4ee3b83e 100644
> --- a/xen/include/xen/pci_regs.h
> +++ b/xen/include/xen/pci_regs.h
> @@ -23,6 +23,14 @@
>  #define LINUX_PCI_REGS_H
>
>  /*
> + * Conventional PCI and PCI-X Mode 1 devices have 256 bytes of
> + * configuration space. PCI-X Mode 2 and PCIe devices have 4096 bytes of
> + * configuration space.
> + */
> +#define PCI_CFG_SPACE_SIZE 256
> +#define PCI_CFG_SPACE_EXP_SIZE 4096
> +
> +/*
>   * Under PCI, each device has 256 bytes of configuration address space,
>   * of which the first 64 bytes are standardized as follows:
>   */
> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
> new file mode 100644
> index 0000000000..5e1b0bb3da
> --- /dev/null
> +++ b/xen/include/xen/vpci.h
> @@ -0,0 +1,79 @@
> +#ifndef _VPCI_
> +#define _VPCI_
> +
> +#include <xen/pci.h>
> +#include <xen/types.h>
> +#include <xen/list.h>
> +
> +/*
> + * Helpers for locking/unlocking.
> + *
> + * NB: the recursive variants are used so that spin_is_locked
> + * returns whether the lock is hold by the current CPU (instead
> + * of just returning whether the lock is hold by any CPU).
> + */
> +#define vpci_lock(d) spin_lock_recursive(&(d)->arch.hvm_domain.vpci_lock)
> +#define vpci_unlock(d) spin_unlock_recursive(&(d)->arch.hvm_domain.vpci_lock)
> +#define vpci_locked(d) spin_is_locked(&(d)->arch.hvm_domain.vpci_lock)
> +
> +/* Value read or written by the handlers. */
> +union vpci_val {
> +    uint8_t u8;
> +    uint16_t u16;
> +    uint32_t u32;
> +};
> +
> +/*
> + * The vPCI handlers will never be called concurrently for the same domain, ii
> + * is guaranteed that the vpci domain lock will always be locked when calling
> + * any handler.
> + */
> +typedef void (vpci_read_t)(struct pci_dev *pdev, unsigned int reg,
> +                           union vpci_val *val, void *data);
> +
> +typedef void (vpci_write_t)(struct pci_dev *pdev, unsigned int reg,
> +                            union vpci_val val, void *data);
> +
> +typedef int (*vpci_register_init_t)(struct pci_dev *dev);
> +
> +#define REGISTER_VPCI_INIT(x) \
> +  static const vpci_register_init_t x##_entry \
> +               __used_section(".rodata.vpci") = x
> +
> +/* Add vPCI handlers to device. */
> +int __must_check vpci_add_handlers(struct pci_dev *dev);
> +
> +/* Add/remove a register handler. */
> +int __must_check vpci_add_register(const struct pci_dev *pdev,
> +                                   vpci_read_t read_handler,
> +                                   vpci_write_t write_handler,
> +                                   unsigned int offset,
> +                                   unsigned int size, void *data);
> +int __must_check vpci_remove_register(const struct pci_dev *pdev,
> +                                      unsigned int offset,
> +                                      unsigned int size);
> +
> +/* Generic read/write handlers for the PCI config space. */
> +uint32_t vpci_read(unsigned int seg, unsigned int bus, unsigned int slot,
> +                   unsigned int func, unsigned int reg, uint32_t size);
> +void vpci_write(unsigned int seg, unsigned int bus, unsigned int slot,
> +                unsigned int func, unsigned int reg, uint32_t size,
> +                uint32_t data);
> +
> +struct vpci {
> +    /* Root pointer for the tree of vPCI handlers. */
> +    struct list_head handlers;
> +};
> +
> +#endif
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> +

All the rest LGTM.

  Paul

> --
> 2.11.0 (Apple Git-81)

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel