Re: [PATCH v7 01/12] vpci: introduce per-domain lock to protect vpci structure
Hi Jan,

Jan Beulich <jbeulich@xxxxxxxx> writes:

> On 05.07.2023 10:59, Roger Pau Monné wrote:
>> On Wed, Jul 05, 2023 at 09:11:10AM +0200, Jan Beulich wrote:
>>> On 04.07.2023 23:03, Volodymyr Babchuk wrote:
>>>> I am currently implementing your proposal (along with Jan's
>>>> suggestions), but I am facing an ABBA deadlock with the IOMMU's
>>>> reassign_device() call, which has this piece of code:
>>>>
>>>>     list_move(&pdev->domain_list, &target->pdev_list);
>>>>
>>>> My immediate change was:
>>>>
>>>>     write_lock(&pdev->domain->pci_lock);
>>>>     list_del(&pdev->domain_list);
>>>>     write_unlock(&pdev->domain->pci_lock);
>>>>
>>>>     write_lock(&target->pci_lock);
>>>>     list_add(&pdev->domain_list, &target->pdev_list);
>>>>     write_unlock(&target->pci_lock);
>>>>
>>>> But this will not work, because reassign_device() is called from
>>>> pci_release_devices(), which iterates over d->pdev_list, so we need
>>>> to take d->pci_lock early.
>>>>
>>>> Any suggestions on how to fix this? My idea is to remove devices
>>>> from the list one at a time:
>>>>
>>>> int pci_release_devices(struct domain *d)
>>>> {
>>>>     struct pci_dev *pdev;
>>>>     u8 bus, devfn;
>>>>     int ret;
>>>>
>>>>     pcidevs_lock();
>>>>     write_lock(&d->pci_lock);
>>>>     ret = arch_pci_clean_pirqs(d);
>>>>     if ( ret )
>>>>     {
>>>>         pcidevs_unlock();
>>>>         write_unlock(&d->pci_lock);
>>>>         return ret;
>>>>     }
>>>>
>>>>     while ( !list_empty(&d->pdev_list) )
>>>>     {
>>>>         pdev = list_entry(&d->pdev_list, struct pci_dev, domain_list);
>>>>         bus = pdev->bus;
>>>>         devfn = pdev->devfn;
>>>>         list_del(&pdev->domain_list);
>>>>         write_unlock(&d->pci_lock);
>>>>         ret = deassign_device(d, pdev->seg, bus, devfn) ?: ret;
>>>>         write_lock(&d->pci_lock);
>>>
>>> I think it needs doing almost like this, but with two more tweaks and
>>> no list_del() right here (first and foremost to avoid needing to
>>> figure out whether removing early isn't going to subtly break
>>> anything; see below for an error case that would end up with changed
>>> behavior):
>>>
>>>     while ( !list_empty(&d->pdev_list) )
>>>     {
>>>         const struct pci_dev *pdev = list_first_entry(&d->pdev_list,
>>>                                                       struct pci_dev,
>>>                                                       domain_list);
>>>         uint16_t seg = pdev->seg;
>>>         uint8_t bus = pdev->bus;
>>>         uint8_t devfn = pdev->devfn;
>>>
>>>         write_unlock(&d->pci_lock);
>>
>> I think you need to remove the device from pdev_list before dropping
>> the lock, or else release could race with other operations.
>>
>> That's unlikely, but still, if the lock is dropped and the routine
>> needs to operate on the device, it is better to remove such a device
>> from the domain so other operations cannot get a reference to it.
>>
>> Otherwise you could modify the reassign_device() implementations so
>> they require the caller to hold source->pci_lock when calling the
>> routine, but that's ugly, because the lock would need to be dropped
>> in order to reassign the device from the source to the target domain.
>>
>> Another option would be to move the whole d->pdev_list to a local
>> variable (so that d->pdev_list would be empty) and then iterate over
>> it without d->pci_lock. On failure you would take the lock and add
>> the failing device back into d->pdev_list.
>
> Conceptually I like this last variant, but like the individual
> list_del() it requires auditing the code for no dependency on the
> device still being on that list. In fact deassign_device()'s use of
> pci_get_pdev() does. The function would then need changing to have a
> struct pci_dev * passed in. Yet who knows where else there are uses
> of pci_get_pdev() lurking.
Okay, so I changed the deassign_device() signature and reworked the loop in
pci_release_devices() in such a way:

    INIT_LIST_HEAD(&tmp_list);

    /* Move all entries to tmp_list, so we can drop d->pci_lock */
    list_splice_init(&d->pdev_list, &tmp_list);
    write_unlock(&d->pci_lock);

    list_for_each_entry_safe ( pdev, tmp, &tmp_list, domain_list )
    {
        rc = deassign_device(d, pdev);
        if ( rc )
        {
            /* Return the device back to the domain list */
            write_lock(&d->pci_lock);
            list_add(&pdev->domain_list, &d->pdev_list);
            write_unlock(&d->pci_lock);
            func_ret = rc;
        }
    }

Also, I checked all pci_get_pdev() calls and found that a struct domain (the
first parameter) is passed in only a handful of places:

*** xen/drivers/vpci/vpci.c:
vpci_read[504]                 pdev = pci_get_pdev(d, sbdf);
vpci_read[506]                 pdev = pci_get_pdev(dom_xen, sbdf);
vpci_write[621]                pdev = pci_get_pdev(d, sbdf);
vpci_write[623]                pdev = pci_get_pdev(dom_xen, sbdf);

*** xen/arch/x86/irq.c:
map_domain_pirq[2166]          pdev = pci_get_pdev(d, msi->sbdf);

*** xen/drivers/passthrough/pci.c:
XEN_GUEST_HANDLE_PARAM[1712]   pdev = pci_get_pdev(d, machine_sbdf);

The last one is due to my change to the deassign_device() signature.
==============================

d->pdev_list can be accessed there:

*** xen/drivers/passthrough/amd/pci_amd_iommu.c:
reassign_device[489]            list_add(&pdev->domain_list, &target->pdev_list);

*** xen/drivers/passthrough/pci.c:
_pci_hide_device[463]           list_add(&pdev->domain_list, &dom_xen->pdev_list);
pci_get_pdev[561]               list_for_each_entry ( pdev, &d->pdev_list, domain_list )
pci_add_device[759]             list_add(&pdev->domain_list, &hardware_domain->pdev_list);
pci_release_devices[917]        list_splice_init(&d->pdev_list, &tmp_list);
pci_release_devices[922]        pdev = list_entry(&d->pdev_list, struct pci_dev, domain_list);
pci_release_devices[928]        list_add(&pdev->domain_list, &d->pdev_list);
_setup_hwdom_pci_devices[1155]  list_add(&pdev->domain_list, &ctxt->d->pdev_list);

*** xen/drivers/passthrough/vtd/iommu.c:
reassign_device_ownership[2819] list_add(&pdev->domain_list, &target->pdev_list);

*** xen/include/xen/pci.h:
for_each_pdev[149]              list_for_each_entry(pdev, &(domain)->pdev_list, domain_list)
has_arch_pdevs[151]             #define has_arch_pdevs(d) (!list_empty(&(d)->pdev_list))

==============================

And has_arch_pdevs() is used there:

*** xen/arch/x86/hvm/hvm.c:
hvm_set_cr0[2388]               has_arch_pdevs(d)) )

*** xen/arch/x86/hvm/vmx/vmcs.c:
vmx_do_resume[1892]             if ( has_arch_pdevs(v->domain) && !iommu_snoop

*** xen/arch/x86/mm.c:
l1_disallow_mask[172]           !has_arch_pdevs(d) && \

*** xen/arch/x86/mm/p2m-pod.c:
p2m_pod_set_mem_target[352]     if ( has_arch_pdevs(d) || cache_flush_permitted(d) )
guest_physmap_mark_populate_on_demand[1404] if ( has_arch_pdevs(d) || cache_flush_permitted(d) )

*** xen/arch/x86/mm/paging.c:
paging_log_dirty_enable[208]    if ( has_arch_pdevs(d) )

*** xen/drivers/passthrough/vtd/iommu.c:
reassign_device_ownership[2773] if ( !has_arch_pdevs(target) )
reassign_device_ownership[2807] if ( !has_arch_pdevs(target) )
reassign_device_ownership[2825] if ( !has_arch_pdevs(source) )

has_arch_pdevs() bothers me most, actually, because it is not always obvious
how to add locking for the callers. I am planning to rework it in the
following way:

    #define has_arch_pdevs_unlocked(d) (!list_empty(&(d)->pdev_list))

    static inline bool has_arch_pdevs(struct domain *d)
    {
        bool ret;

        read_lock(&d->pci_lock);
        ret = has_arch_pdevs_unlocked(d);
        read_unlock(&d->pci_lock);

        return ret;
    }

And then use the appropriate macro/function depending on circumstances.

-- 
WBR, Volodymyr