Re: [RFC PATCH 01/10] xen: pci: add per-domain pci list lock
On 15.02.2023 00:38, Volodymyr Babchuk wrote:
> Stefano Stabellini <sstabellini@xxxxxxxxxx> writes:
>> On Wed, 31 Aug 2022, Volodymyr Babchuk wrote:
>>
>> 1) iommu.c:reassign_device_ownership and pci_amd_iommu.c:reassign_device
>> Both functions, without any pdevs_lock locking, do:
>>     list_move(&pdev->domain_list, &target->pdev_list);
>>
>> It seems to me they would need pdevs_lock. Maybe we need to change
>> list_move into list_del (protected by the pdevs_lock of the old domain)
>> and list_add (protected by the pdevs_lock of the new domain).
>
> Yes, I did as you suggested. But this leads to another potential
> issue, which I'll describe below, in the deassign_device() part.
>
> [...]
>
>>> +    spin_lock(&d->pdevs_lock);
>>>      list_for_each_entry_safe ( pdev, tmp, &d->pdev_list, domain_list )
>>>      {
>>>          bus = pdev->bus;
>>>          devfn = pdev->devfn;
>>>          ret = deassign_device(d, pdev->seg, bus, devfn) ?: ret;
>>
>> This causes pdevs_lock to be taken twice: deassign_device() also takes
>> the pdevs_lock. Probably we need to change all the
>> spin_lock(&d->pdevs_lock) into spin_lock_recursive.
>
> You are right, I missed that deassign_device() leads to an
> iommu*_reassign_device() call. But therein lies the issue: with
> recursive locks, reassign_device() will not be able to unlock
> source->pdevs_lock, yet will also try to take target->pdevs_lock. This
> can lead to deadlock if another call to reassign_device() moves some
> other pdev in the opposite direction at the same time. This is why I
> want to avoid recursive spinlocks if possible.
>
> So I am wondering: why does the IOMMU code move a pdev across domains
> in the first place? We are making the IOMMU driver responsible for
> managing a domain's pdevs, which does not look right, as this is the
> responsibility of pci.c.

The boundary between what is PCI and what is IOMMU is pretty fuzzy, I'm
afraid. After all, pass-through (with IOMMU) is all about PCI devices
(on x86 at least). Despite the filename being pci.c, much of what's
there is actually IOMMU code. Specifically, deassign_device() is the
vendor-independent IOMMU part of the operation; moving that ...

> I want to propose another approach: implement a deassign_device()
> function in the IOMMU drivers.

... into vendor-specific code would mean code duplication, which ought
to be avoided as much as possible.

> Then, instead of calling reassign_device(), we might do the following:
>
> 1. deassign_device()
> 2. remove pdev from source->pdev_list
> 3. add pdev to target->pdev_list
> 4. assign_device()

I'm not sure such an ordering would end up correct. It may need to be

1. remove pdev from source->pdev_list
2. deassign_device()
3. assign_device()
4. add pdev to target->pdev_list (or back to source upon failure)

which still may be troublesome: The present placement of the moving is,
in particular, ensuring that ownership (expressed by pdev->domain)
changes at the same time as list membership. With things properly
locked it _may_ be okay to separate (in time) these two steps, but that
will require quite a bit of care (read: justification that it is
correct/safe).

And of course you could then also ask why it is low-level driver code
that changes pdev->domain. I don't see how you would move that to
generic code, as the field shouldn't be left stale for an extended
period of time, nor can it sensibly be transiently set to e.g. NULL.

Additionally, deassign_device() is misnamed - it is really re-assigning
the device (as you can see from it invoking
hd->platform_ops->reassign_device()).
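For illustration, a minimal sketch of the non-recursive alternative
discussed earlier in the thread: split the list_move() into list_del()
plus list_add(), and take both domains' pdevs_lock in a fixed order so
that two concurrent moves in opposite directions cannot deadlock.
move_pdev() is a hypothetical helper, the per-domain pdevs_lock is the
field introduced by this RFC, and the rest follows current Xen idioms:

    /*
     * Hypothetical helper, assuming the per-domain pdevs_lock added by
     * this series.  Acquiring the two locks in a fixed order (lowest
     * domain_id first) avoids the ABBA deadlock that recursive locking
     * would risk when two pdevs move in opposite directions at once.
     */
    static void move_pdev(struct pci_dev *pdev, struct domain *source,
                          struct domain *target)
    {
        struct domain *first = source->domain_id < target->domain_id
                               ? source : target;
        struct domain *second = first == source ? target : source;

        ASSERT(source != target);

        spin_lock(&first->pdevs_lock);
        spin_lock(&second->pdevs_lock);

        /* Ownership and list membership change together, under both locks. */
        list_del(&pdev->domain_list);
        pdev->domain = target;
        list_add(&pdev->domain_list, &target->pdev_list);

        spin_unlock(&second->pdevs_lock);
        spin_unlock(&first->pdevs_lock);
    }

Note that this keeps pdev->domain and the list membership changing
together, which is the invariant discussed above.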
You cannot split de-assign from assign: the device should always be
assigned _somewhere_ (DOM_IO if nothing else; see XSA-400), except late
enough during hot-unplug. But that would be taken care of by the
alternative ordering above, which combines steps 2 and 3 into a single
step.

Jan
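To make the combined step concrete, a hedged sketch of the alternative
ordering with steps 2 and 3 folded into the existing reassign
operation, so the device is never left unassigned in between.
move_pdev_alt() is a hypothetical name, reassign_device() stands in for
the hd->platform_ops->reassign_device() hook (whose real signature also
takes a devfn), and error handling is reduced to the bare minimum:

    static int move_pdev_alt(struct pci_dev *pdev, struct domain *source,
                             struct domain *target)
    {
        struct domain *owner;
        int ret;

        /* 1. Remove pdev from source->pdev_list. */
        spin_lock(&source->pdevs_lock);
        list_del(&pdev->domain_list);
        spin_unlock(&source->pdevs_lock);

        /*
         * 2+3. De-assign and assign as one step: the IOMMU context moves
         * straight from source's to target's, so the device stays
         * assigned _somewhere_ at all times.
         */
        ret = reassign_device(source, target, pdev);

        /* 4. Add pdev to target->pdev_list, or back to source on failure. */
        owner = ret ? source : target;
        spin_lock(&owner->pdevs_lock);
        list_add(&pdev->domain_list, &owner->pdev_list);
        spin_unlock(&owner->pdevs_lock);

        return ret;
    }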