Re: [RFC PATCH] pci: introduce per-domain PCI rwlock
On 11.07.2023 02:46, Volodymyr Babchuk wrote:
> Add per-domain d->pci_lock that protects access to
> d->pdev_list. Purpose of this lock is to give guarantees to VPCI code
> that underlying pdev will not disappear under feet. Later it will also
> protect pdev->vpci structure and pdev access in modify_bars().
>
> Suggested-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> Suggested-by: Jan Beulich <jbeulich@xxxxxxxx>
> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@xxxxxxxx>
>
> ---
>
> This patch should be part of the VPCI series, but I am posting it as a
> single-patch RFC to discuss changes to x86 MM and IOMMU code.

To aid review / judgement, extending the commit message would help, to
outline around which function invocations (and for what reason) the
lock now needs to be held. Furthermore lock nesting rules want writing
down (perhaps next to the declaration of the lock). Therefore the
comments below are merely preliminary and likely incomplete.

> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -2381,12 +2381,14 @@ int hvm_set_cr0(unsigned long value, bool may_defer)
>          }
>      }
>
> +    read_lock(&d->pci_lock);
>      if ( ((value ^ old_value) & X86_CR0_CD) &&
>           is_iommu_enabled(d) && hvm_funcs.handle_cd &&
>           (!rangeset_is_empty(d->iomem_caps) ||
>            !rangeset_is_empty(d->arch.ioport_caps) ||
>            has_arch_pdevs(d)) )
>          alternative_vcall(hvm_funcs.handle_cd, v, value);
> +    read_unlock(&d->pci_lock);

handle_cd() is non-trivial - did you audit it for safety of holding a
lock around it?

> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -858,12 +858,15 @@ get_page_from_l1e(
>          return 0;
>      }
>
> +    read_lock(&l1e_owner->pci_lock);
>      if ( unlikely(l1f & l1_disallow_mask(l1e_owner)) )
>      {
>          gdprintk(XENLOG_WARNING, "Bad L1 flags %x\n",
>                   l1f & l1_disallow_mask(l1e_owner));
> +        read_unlock(&l1e_owner->pci_lock);

In cases like this one I think you want to avoid holding the lock
across the printk(). This can easily be arranged for by latching
l1_disallow_mask()'s return value into a new local variable.
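Something along these lines (only a sketch; the early-return value
isn't visible in the hunk above, so the -EINVAL below is an
assumption):

    unsigned int disallow;

    read_lock(&l1e_owner->pci_lock);
    disallow = l1_disallow_mask(l1e_owner); /* latch while holding the lock */
    if ( unlikely(l1f & disallow) )
    {
        read_unlock(&l1e_owner->pci_lock);
        /* Log with the lock already dropped, using the latched mask. */
        gdprintk(XENLOG_WARNING, "Bad L1 flags %x\n", l1f & disallow);
        return -EINVAL;
    }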
> --- a/xen/arch/x86/mm/p2m-pod.c
> +++ b/xen/arch/x86/mm/p2m-pod.c
> @@ -349,10 +349,12 @@ p2m_pod_set_mem_target(struct domain *d, unsigned long target)
>
>      ASSERT( pod_target >= p2m->pod.count );
>
> +    read_lock(&d->pci_lock);
>      if ( has_arch_pdevs(d) || cache_flush_permitted(d) )
>          ret = -ENOTEMPTY;
>      else
>          ret = p2m_pod_set_cache_target(p2m, pod_target, 1/*preemptible*/);
> +    read_unlock(&d->pci_lock);

Hmm, is it necessary to hold the lock across the function call?

> --- a/xen/arch/x86/mm/paging.c
> +++ b/xen/arch/x86/mm/paging.c
> @@ -205,21 +205,27 @@ static int paging_log_dirty_enable(struct domain *d)
>  {
>      int ret;
>
> +    read_lock(&d->pci_lock);
>      if ( has_arch_pdevs(d) )
>      {
>          /*
>           * Refuse to turn on global log-dirty mode
>           * if the domain is sharing the P2M with the IOMMU.
>           */
> +        read_unlock(&d->pci_lock);
>          return -EINVAL;
>      }
>
>      if ( paging_mode_log_dirty(d) )
> +    {
> +        read_unlock(&d->pci_lock);
>          return -EINVAL;
> +    }
>
>      domain_pause(d);
>      ret = d->arch.paging.log_dirty.ops->enable(d);
>      domain_unpause(d);
> +    read_unlock(&d->pci_lock);

This means a relatively long potential lock holding time. I wonder
whether lock release shouldn't be delegated to the ->enable() hook, as
it could do so immediately after setting the flag that would then
prevent assignment of devices.

> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -102,6 +102,8 @@ static bool any_pdev_behind_iommu(const struct domain *d,
>  {
>      const struct pci_dev *pdev;
>
> +    ASSERT(rw_is_locked(&d->pci_lock));
> +
>      for_each_pdev ( d, pdev )
>      {
>          if ( pdev == exclude )
> @@ -467,17 +469,24 @@ static int cf_check reassign_device(
>
>      if ( !QUARANTINE_SKIP(target, pdev) )
>      {
> +	read_lock(&target->pci_lock);
>          rc = amd_iommu_setup_domain_device(target, iommu, devfn, pdev);
>          if ( rc )
>              return rc;
> +	read_unlock(&target->pci_lock);

You need to drop the lock before the if(). Also nit: no hard tabs
here, please.

>      }
>      else
>          amd_iommu_disable_domain_device(source, iommu, devfn, pdev);

Related to my initial comment at the top: it wants clarifying, for
example, why "setup" needs the lock held, but "disable" doesn't.

>      if ( devfn == pdev->devfn && pdev->domain != target )
>      {
> -        list_move(&pdev->domain_list, &target->pdev_list);
> -        pdev->domain = target;
> +        write_lock(&pdev->domain->pci_lock);

Shorter as write_lock(&source->pci_lock)? (Also in the VT-d
counterpart then.)

> @@ -748,7 +750,9 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>      if ( !pdev->domain )
>      {
>          pdev->domain = hardware_domain;
> +        write_lock(&hardware_domain->pci_lock);
>          list_add(&pdev->domain_list, &hardware_domain->pdev_list);
> +        write_unlock(&hardware_domain->pci_lock);

What about the companion pci_remove_device()?

> @@ -887,26 +891,62 @@ static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
>
>  int pci_release_devices(struct domain *d)
>  {
> -    struct pci_dev *pdev, *tmp;
> -    u8 bus, devfn;
> -    int ret;
> +    int combined_ret;
> +    LIST_HEAD(failed_pdevs);
>
>      pcidevs_lock();
> -    ret = arch_pci_clean_pirqs(d);
> -    if ( ret )
> +    write_lock(&d->pci_lock);
> +    combined_ret = arch_pci_clean_pirqs(d);
> +    if ( combined_ret )
>      {
>          pcidevs_unlock();
> -        return ret;
> +        write_unlock(&d->pci_lock);
> +        return combined_ret;
>      }
> -    list_for_each_entry_safe ( pdev, tmp, &d->pdev_list, domain_list )
> +
> +    while ( !list_empty(&d->pdev_list) )
>      {
> -        bus = pdev->bus;
> -        devfn = pdev->devfn;
> -        ret = deassign_device(d, pdev->seg, bus, devfn) ?: ret;
> +        struct pci_dev *pdev = list_first_entry(&d->pdev_list,
> +                                                struct pci_dev,
> +                                                domain_list);
> +        uint16_t seg = pdev->seg;
> +        uint8_t bus = pdev->bus;
> +        uint8_t devfn = pdev->devfn;
> +        int ret;
> +
> +        write_unlock(&d->pci_lock);
> +        ret = deassign_device(d, seg, bus, devfn);
> +        write_lock(&d->pci_lock);
> +        if ( ret )
> +        {
> +            bool still_present = false;
> +            const struct pci_dev *tmp;
> +
> +            /*
> +             * We need to check if deassign_device() left our pdev in
> +             * domain's list. As we dropped the lock, we can't be sure
> +             * that list wasn't permutated in some random way, so we
> +             * need to traverse the whole list.
> +             */
> +            for_each_pdev ( d, tmp )
> +            {
> +                if ( tmp == pdev )
> +                {
> +                    still_present = true;
> +                    break;
> +                }
> +            }
> +            if ( still_present )
> +                list_move(&pdev->domain_list, &failed_pdevs);
> +            combined_ret = ret;

Elsewhere we aim at returning the first error that was encountered,
not the last one.
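I.e. something like (sketch):

            if ( still_present )
                list_move(&pdev->domain_list, &failed_pdevs);
            /* Keep the first error; don't let later ones overwrite it. */
            if ( !combined_ret )
                combined_ret = ret;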
> @@ -2765,6 +2767,7 @@ static int cf_check reassign_device_ownership(
>
>      if ( !QUARANTINE_SKIP(target, pdev->arch.vtd.pgd_maddr) )
>      {
> +        read_lock(&target->pci_lock);
>          if ( !has_arch_pdevs(target) )
>              vmx_pi_hooks_assign(target);

I'm afraid this and the unhook-side locking isn't sufficient to
guarantee no races. Things still depend on the domctl and/or pcidevs
lock being held around this. At which point acquiring the lock here
(and below) is of questionable value. In any event I think this
warrants code comments.

Possibly the same also applies to check_cleanup_domid_map() and
friends.

> @@ -2780,21 +2783,26 @@ static int cf_check reassign_device_ownership(
>  #endif
>
>      ret = domain_context_mapping(target, devfn, pdev);
> +    read_unlock(&target->pci_lock);

Other calls to domain_context_mapping() aren't wrapped like this. Same
for domain_context_unmap(), wrapped exactly once below.

>      if ( !ret && pdev->devfn == devfn &&
>           !QUARANTINE_SKIP(source, pdev->arch.vtd.pgd_maddr) )
>      {
>          const struct acpi_drhd_unit *drhd =
>              acpi_find_matched_drhd_unit(pdev);
>
> +        read_lock(&source->pci_lock);
>          if ( drhd )
>              check_cleanup_domid_map(source, pdev, drhd->iommu);
> +        read_unlock(&source->pci_lock);

Acquiring the lock inside the if() ought to suffice here.
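I.e. (sketch):

        if ( drhd )
        {
            read_lock(&source->pci_lock);
            check_cleanup_domid_map(source, pdev, drhd->iommu);
            read_unlock(&source->pci_lock);
        }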
Jan