[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v3 06/11] vpci/header: Handle p2m range sets per BAR
Hi, Roger! On 26.10.21 12:08, Roger Pau Monné wrote: > On Thu, Sep 30, 2021 at 10:52:18AM +0300, Oleksandr Andrushchenko wrote: >> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx> >> >> Instead of handling a single range set, that contains all the memory >> regions of all the BARs and ROM, have them per BAR. >> >> This is in preparation of making non-identity mappings in p2m for the >> MMIOs/ROM. >> >> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx> >> --- >> xen/drivers/vpci/header.c | 172 ++++++++++++++++++++++++++------------ >> xen/include/xen/vpci.h | 3 +- >> 2 files changed, 122 insertions(+), 53 deletions(-) >> >> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c >> index ec4d215f36ff..9c603d26d302 100644 >> --- a/xen/drivers/vpci/header.c >> +++ b/xen/drivers/vpci/header.c >> @@ -131,49 +131,75 @@ static void modify_decoding(const struct pci_dev >> *pdev, uint16_t cmd, >> >> bool vpci_process_pending(struct vcpu *v) >> { >> - if ( v->vpci.mem ) >> + if ( v->vpci.num_mem_ranges ) >> { >> struct map_data data = { >> .d = v->domain, >> .map = v->vpci.cmd & PCI_COMMAND_MEMORY, >> }; >> - int rc = rangeset_consume_ranges(v->vpci.mem, map_range, &data); >> + struct pci_dev *pdev = v->vpci.pdev; >> + struct vpci_header *header = &pdev->vpci->header; >> + unsigned int i; >> >> - if ( rc == -ERESTART ) >> - return true; >> + for ( i = 0; i < ARRAY_SIZE(header->bars); i++ ) >> + { >> + struct vpci_bar *bar = &header->bars[i]; >> + int rc; >> >> - spin_lock(&v->vpci.pdev->vpci->lock); >> - /* Disable memory decoding unconditionally on failure. */ >> - modify_decoding(v->vpci.pdev, >> - rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : >> v->vpci.cmd, >> - !rc && v->vpci.rom_only); >> - spin_unlock(&v->vpci.pdev->vpci->lock); >> + if ( !bar->mem ) >> + continue; >> >> - rangeset_destroy(v->vpci.mem); >> - v->vpci.mem = NULL; >> - if ( rc ) >> - /* >> - * FIXME: in case of failure remove the device from the domain. >> - * Note that there might still be leftover mappings. While this >> is >> - * safe for Dom0, for DomUs the domain will likely need to be >> - * killed in order to avoid leaking stale p2m mappings on >> - * failure. >> - */ >> - vpci_remove_device(v->vpci.pdev); >> + rc = rangeset_consume_ranges(bar->mem, map_range, &data); >> + >> + if ( rc == -ERESTART ) >> + return true; >> + >> + spin_lock(&pdev->vpci->lock); >> + /* Disable memory decoding unconditionally on failure. */ >> + modify_decoding(pdev, >> + rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : >> v->vpci.cmd, >> + !rc && v->vpci.rom_only); >> + spin_unlock(&pdev->vpci->lock); >> + >> + rangeset_destroy(bar->mem); > Now that the rangesets are per-BAR we might have to consider > allocating them at initialization time and not destroying them when > empty. We could replace the NULL checks with rangeset_is_empty > instead. Not that you have to do this on this patch, but I think it's > worth mentioning. Yes, this is a good idea. I will re-work the patch to create/destroy the rangesets once in add/remove > >> + bar->mem = NULL; >> + v->vpci.num_mem_ranges--; >> + if ( rc ) >> + /* >> + * FIXME: in case of failure remove the device from the >> domain. >> + * Note that there might still be leftover mappings. While >> this is >> + * safe for Dom0, for DomUs the domain will likely need to >> be >> + * killed in order to avoid leaking stale p2m mappings on >> + * failure. >> + */ >> + vpci_remove_device(pdev); >> + } >> } >> >> return false; >> } >> >> static int __init apply_map(struct domain *d, const struct pci_dev *pdev, >> - struct rangeset *mem, uint16_t cmd) >> + uint16_t cmd) >> { >> struct map_data data = { .d = d, .map = true }; >> - int rc; >> + struct vpci_header *header = &pdev->vpci->header; >> + int rc = 0; >> + unsigned int i; >> + >> + for ( i = 0; i < ARRAY_SIZE(header->bars); i++ ) >> + { >> + struct vpci_bar *bar = &header->bars[i]; >> >> - while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == >> -ERESTART ) >> - process_pending_softirqs(); >> - rangeset_destroy(mem); >> + if ( !bar->mem ) >> + continue; >> + >> + while ( (rc = rangeset_consume_ranges(bar->mem, map_range, >> + &data)) == -ERESTART ) >> + process_pending_softirqs(); >> + rangeset_destroy(bar->mem); >> + bar->mem = NULL; >> + } >> if ( !rc ) >> modify_decoding(pdev, cmd, false); >> >> @@ -181,7 +207,7 @@ static int __init apply_map(struct domain *d, const >> struct pci_dev *pdev, >> } >> >> static void defer_map(struct domain *d, struct pci_dev *pdev, >> - struct rangeset *mem, uint16_t cmd, bool rom_only) >> + uint16_t cmd, bool rom_only, uint8_t num_mem_ranges) > Like mentioned below, I don't think you need to pass the number of > BARs that need mapping changes. Iff that's strictly needed, it should > be an unsigned int. bool map_pending :1 works great > >> { >> struct vcpu *curr = current; >> >> @@ -192,9 +218,9 @@ static void defer_map(struct domain *d, struct pci_dev >> *pdev, >> * started for the same device if the domain is not well-behaved. >> */ >> curr->vpci.pdev = pdev; >> - curr->vpci.mem = mem; >> curr->vpci.cmd = cmd; >> curr->vpci.rom_only = rom_only; >> + curr->vpci.num_mem_ranges = num_mem_ranges; >> /* >> * Raise a scheduler softirq in order to prevent the guest from >> resuming >> * execution with pending mapping operations, to trigger the invocation >> @@ -206,42 +232,47 @@ static void defer_map(struct domain *d, struct pci_dev >> *pdev, >> static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool >> rom_only) >> { >> struct vpci_header *header = &pdev->vpci->header; >> - struct rangeset *mem = rangeset_new(NULL, NULL, 0); >> struct pci_dev *tmp, *dev = NULL; >> const struct vpci_msix *msix = pdev->vpci->msix; >> - unsigned int i; >> + unsigned int i, j; >> int rc; >> - >> - if ( !mem ) >> - return -ENOMEM; >> + uint8_t num_mem_ranges; >> >> /* >> - * Create a rangeset that represents the current device BARs memory >> region >> + * Create a rangeset per BAR that represents the current device memory >> region >> * and compare it against all the currently active BAR memory regions. >> If >> * an overlap is found, subtract it from the region to be >> mapped/unmapped. >> * >> - * First fill the rangeset with all the BARs of this device or with the >> ROM >> + * First fill the rangesets with all the BARs of this device or with >> the ROM >> * BAR only, depending on whether the guest is toggling the memory >> decode >> * bit of the command register, or the enable bit of the ROM BAR >> register. >> */ >> for ( i = 0; i < ARRAY_SIZE(header->bars); i++ ) >> { >> - const struct vpci_bar *bar = &header->bars[i]; >> + struct vpci_bar *bar = &header->bars[i]; >> unsigned long start = PFN_DOWN(bar->addr); >> unsigned long end = PFN_DOWN(bar->addr + bar->size - 1); >> >> + bar->mem = NULL; > Why do you need to set mem to NULL here? I think we should instead > assert that bar->mem == NULL here. I will put an ASSERT here > >> + >> if ( !MAPPABLE_BAR(bar) || >> (rom_only ? bar->type != VPCI_BAR_ROM >> : (bar->type == VPCI_BAR_ROM && >> !header->rom_enabled)) ) >> continue; >> >> - rc = rangeset_add_range(mem, start, end); >> + bar->mem = rangeset_new(NULL, NULL, 0); >> + if ( !bar->mem ) >> + { >> + rc = -ENOMEM; >> + goto fail; >> + } >> + >> + rc = rangeset_add_range(bar->mem, start, end); >> if ( rc ) >> { >> printk(XENLOG_G_WARNING "Failed to add [%lx, %lx]: %d\n", >> start, end, rc); >> - rangeset_destroy(mem); >> - return rc; >> + goto fail; >> } >> } >> >> @@ -252,14 +283,21 @@ static int modify_bars(const struct pci_dev *pdev, >> uint16_t cmd, bool rom_only) >> unsigned long end = PFN_DOWN(vmsix_table_addr(pdev->vpci, i) + >> vmsix_table_size(pdev->vpci, i) - 1); >> >> - rc = rangeset_remove_range(mem, start, end); >> - if ( rc ) >> + for ( j = 0; j < ARRAY_SIZE(header->bars); j++ ) >> { >> - printk(XENLOG_G_WARNING >> - "Failed to remove MSIX table [%lx, %lx]: %d\n", >> - start, end, rc); >> - rangeset_destroy(mem); >> - return rc; >> + const struct vpci_bar *bar = &header->bars[j]; >> + >> + if ( !bar->mem ) >> + continue; >> + >> + rc = rangeset_remove_range(bar->mem, start, end); >> + if ( rc ) >> + { >> + printk(XENLOG_G_WARNING >> + "Failed to remove MSIX table [%lx, %lx]: %d\n", >> + start, end, rc); >> + goto fail; >> + } >> } >> } >> >> @@ -291,7 +329,8 @@ static int modify_bars(const struct pci_dev *pdev, >> uint16_t cmd, bool rom_only) >> unsigned long start = PFN_DOWN(bar->addr); >> unsigned long end = PFN_DOWN(bar->addr + bar->size - 1); >> >> - if ( !bar->enabled || !rangeset_overlaps_range(mem, start, end) >> || >> + if ( !bar->enabled || >> + !rangeset_overlaps_range(bar->mem, start, end) || >> /* >> * If only the ROM enable bit is toggled check against >> other >> * BARs in the same device for overlaps, but not against >> the >> @@ -300,13 +339,12 @@ static int modify_bars(const struct pci_dev *pdev, >> uint16_t cmd, bool rom_only) >> (rom_only && tmp == pdev && bar->type == VPCI_BAR_ROM) ) >> continue; >> >> - rc = rangeset_remove_range(mem, start, end); >> + rc = rangeset_remove_range(bar->mem, start, end); >> if ( rc ) >> { >> printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: >> %d\n", >> start, end, rc); >> - rangeset_destroy(mem); >> - return rc; >> + goto fail; >> } >> } >> } >> @@ -324,12 +362,42 @@ static int modify_bars(const struct pci_dev *pdev, >> uint16_t cmd, bool rom_only) >> * will always be to establish mappings and process all the BARs. >> */ >> ASSERT((cmd & PCI_COMMAND_MEMORY) && !rom_only); >> - return apply_map(pdev->domain, pdev, mem, cmd); >> + return apply_map(pdev->domain, pdev, cmd); >> } >> >> - defer_map(dev->domain, dev, mem, cmd, rom_only); >> + /* Find out how many memory ranges has left after MSI and overlaps. */ >> + num_mem_ranges = 0; >> + for ( i = 0; i < ARRAY_SIZE(header->bars); i++ ) >> + { >> + struct vpci_bar *bar = &header->bars[i]; > There's no need to declare this local variable AFAICT, just use > header->bars[i].mem. Ok > In any case this is likely to go away if you > follow my recommendation below to just call defer_map unconditionally > like it's currently done. Please see below >> + >> + if ( !rangeset_is_empty(bar->mem) ) >> + num_mem_ranges++; >> + } >> + >> + /* >> + * There are cases when PCI device, root port for example, has neither >> + * memory space nor IO. In this case PCI command register write is >> + * missed resulting in the underlying PCI device not functional, so: >> + * - if there are no regions write the command register now >> + * - if there are regions then defer work and write later on >> + */ >> + if ( !num_mem_ranges ) >> + pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd); > I think this is wrong, as not calling defer_map will prevent the > rangesets (bar[i]->mem) from being destroyed, so we are effectively > leaking memory. Not really. As in case of num_mem_ranges == 0 there are no rangesets to free as none was allocated > > You need to take a path similar to the failure one in case there are > no mappings pending, or even better just call defer_map anyway and let > it do it's thing, it should be capable of handling empty rangesets > just fine. That's how it's currently done. So, I think this is still valid to break early and do not go with defer_map > >> + else >> + defer_map(dev->domain, dev, cmd, rom_only, num_mem_ranges); >> >> return 0; >> + >> +fail: > We usually ask labels to be indented with one space. Sure. I am confused a bit: there is no word for that in the coding style and the sources use labels with and without the space. > >> + for ( i = 0; i < ARRAY_SIZE(header->bars); i++ ) >> + { >> + struct vpci_bar *bar = &header->bars[i]; >> + >> + rangeset_destroy(bar->mem); >> + bar->mem = NULL; >> + } >> + return rc; >> } >> >> static void cmd_write(const struct pci_dev *pdev, unsigned int reg, >> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h >> index a0320b22cb36..352e02d0106d 100644 >> --- a/xen/include/xen/vpci.h >> +++ b/xen/include/xen/vpci.h >> @@ -80,6 +80,7 @@ struct vpci { >> /* Guest view of the BAR. */ >> uint64_t guest_addr; >> uint64_t size; >> + struct rangeset *mem; >> enum { >> VPCI_BAR_EMPTY, >> VPCI_BAR_IO, >> @@ -154,9 +155,9 @@ struct vpci { >> >> struct vpci_vcpu { >> /* Per-vcpu structure to store state while {un}mapping of PCI BARs. */ >> - struct rangeset *mem; >> struct pci_dev *pdev; >> uint16_t cmd; >> + uint8_t num_mem_ranges; > AFAICT This could be a simple bool: > > bool map_pending : 1; > > As there's no strict need to know how many BARs have pending mappings. This is true > > Thanks, Roger. Thank you, Oleksandr
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |