[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v7 01/12] vpci: introduce per-domain lock to protect vpci structure
Hi Roger, Roger Pau Monné <roger.pau@xxxxxxxxxx> writes: > On Thu, Jun 22, 2023 at 09:17:36PM +0000, Volodymyr Babchuk wrote: >> >> Hi Roger, >> >> Roger Pau Monné <roger.pau@xxxxxxxxxx> writes: >> >> > On Wed, Jun 21, 2023 at 10:07:20PM +0000, Volodymyr Babchuk wrote: >> >> >> >> Hi Roger, >> >> >> >> Roger Pau Monné <roger.pau@xxxxxxxxxx> writes: >> >> >> >> > On Tue, Jun 13, 2023 at 10:32:26AM +0000, Volodymyr Babchuk wrote: >> >> >> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx> >> >> >> >> >> >> Introduce a per-domain read/write lock to check whether vpci is >> >> >> present, >> >> >> so we are sure there are no accesses to the contents of the vpci struct >> >> >> if not. This lock can be used (and in a few cases is used right away) >> >> >> so that vpci removal can be performed while holding the lock in write >> >> >> mode. Previously such removal could race with vpci_read for example. >> >> >> >> >> >> 1. Per-domain's vpci_rwlock is used to protect pdev->vpci structure >> >> >> from being removed. >> >> >> >> >> >> 2. Writing the command register and ROM BAR register may trigger >> >> >> modify_bars to run, which in turn may access multiple pdevs while >> >> >> checking for the existing BAR's overlap. The overlapping check, if done >> >> >> under the read lock, requires vpci->lock to be acquired on both devices >> >> >> being compared, which may produce a deadlock. It is not possible to >> >> >> upgrade read lock to write lock in such a case. So, in order to prevent >> >> >> the deadlock, check which registers are going to be written and acquire >> >> >> the lock in the appropriate mode from the beginning. >> >> >> >> >> >> All other code, which doesn't lead to pdev->vpci destruction and does >> >> >> not >> >> >> access multiple pdevs at the same time, can still use a combination of >> >> >> the >> >> >> read lock and pdev->vpci->lock. >> >> >> >> >> >> 3. Optimize if ROM BAR write lock required detection by caching offset >> >> >> of the ROM BAR register in vpci->header->rom_reg which depends on >> >> >> header's type. >> >> >> >> >> >> 4. Reduce locked region in vpci_remove_device as it is now possible >> >> >> to set pdev->vpci to NULL early right after the write lock is acquired. >> >> >> >> >> >> 5. Reduce locked region in vpci_add_handlers as it is possible to >> >> >> initialize many more fields of the struct vpci before assigning it to >> >> >> pdev->vpci. >> >> >> >> >> >> 6. vpci_{add|remove}_register are required to be called with the write >> >> >> lock >> >> >> held, but it is not feasible to add an assert there as it requires >> >> >> struct domain to be passed for that. So, add a comment about this >> >> >> requirement >> >> >> to these and other functions with the equivalent constraints. >> >> >> >> >> >> 7. Drop const qualifier where the new rwlock is used and this is >> >> >> appropriate. >> >> >> >> >> >> 8. Do not call process_pending_softirqs with any locks held. For that >> >> >> unlock >> >> >> prior the call and re-acquire the locks after. After re-acquiring the >> >> >> lock there is no need to check if pdev->vpci exists: >> >> >> - in apply_map because of the context it is called (no race condition >> >> >> possible) >> >> >> - for MSI/MSI-X debug code because it is called at the end of >> >> >> pdev->vpci access and no further access to pdev->vpci is made >> >> >> >> >> >> 9. Check for !pdev->vpci in vpci_{read|write} after acquiring the lock >> >> >> and if so, allow reading or writing the hardware register directly. >> >> >> This is >> >> >> acceptable as we only deal with Dom0 as of now. Once DomU support is >> >> >> added the write will need to be ignored and read return all 0's for the >> >> >> guests, while Dom0 can still access the registers directly. >> >> >> >> >> >> 10. Introduce pcidevs_trylock, so there is a possibility to try locking >> >> >> the pcidev's lock. >> >> >> >> >> >> 11. Use pcidev's lock around for_each_pdev and pci_get_pdev_by_domain >> >> >> while accessing pdevs in vpci code. >> >> >> >> >> >> 12. This is based on the discussion at [1]. >> >> >> >> >> >> [1] >> >> >> https://urldefense.com/v3/__https://lore.kernel.org/all/20220204063459.680961-4-andr2000@xxxxxxxxx/__;!!GF_29dbcQIUBPA!zPy31CWFWlyC0xhEHiSj6rOPe7RDSjLranI9KZqhG4ssmChJMWvsPLJPQGTcVsnnowZpP8-LaKJkIWIzb8ue0DoYhg$ >> >> >> [lore[.]kernel[.]org] >> >> >> >> >> >> Suggested-by: Roger Pau Monné <roger.pau@xxxxxxxxxx> >> >> >> Suggested-by: Jan Beulich <jbeulich@xxxxxxxx> >> >> >> Signed-off-by: Oleksandr Andrushchenko >> >> >> <oleksandr_andrushchenko@xxxxxxxx> >> >> > >> >> > Thanks. >> >> > >> >> > I haven't looked in full detail, but I'm afraid there's an ABBA >> >> > deadlock with the per-domain vpci lock and the pcidevs lock in >> >> > modify_bars() vs vpci_add_handlers() and vpci_remove_device(). >> >> > >> >> > I've made some comments below. >> >> >> >> Thank you for the review. I believe that it is a good idea to have a >> >> per-domain pdev_list lock. See my answers below. >> > >> > I think it's important that the lock that protects domain->pdev_list >> > must be the same that also protects pdev->vpci, or else you might run >> > into similar ABBA deadlock situations. >> > >> > The problem then could be that in vpci_{read,write} you will take the >> > per-domain pdev lock in read mode in order to get the pdev, and for >> > writes to the command register or the ROM BAR you would have to >> > upgrade such lock to a write lock without dropping it, and we don't >> > have such functionality for rw locks ATM. >> > >> > Maybe just re-starting the function knowing that the lock must be >> > taken in write mode would be a good solution: writes to the command >> > register will already be slow since they are likely to involve >> > modifications to the p2m. >> >> Looks like modify_bars() is the only cause for this extended lock. I >> know that this was discussed earlier, but can we rework modify_bars to >> not iterate over all the pdevs? We can store copy of all enabled BARs in >> a domain structure, protected by domain->vpci_lock. Something akin to >> >> struct vpci_bar { >> list_head list; >> struct vpci *vpci; >> unsigned long start; >> unsigned long end; >> bool is_rom; >> }; > > This IMO makes the logic more complicated, as each time a BAR is > updated we would have to change the cached address and size in two > different places. It's also duplicated data that takes up memory, and > there are system with a non-trivial amount of PCI devices and thus > BARs to track. > > I think it's easier to just make the newly introduced per-domain > rwlock also protect the domain's pdev_list (unless I'm missing > something). AFAICT it would also simplify locking, as such rwlock > protecting the domain->pdev_list will avoid you from having to take > the pcidevs lock in vpci_{read,write} in order to find the device the > access belongs to. I am currently implementing your proposal (along with Jan's suggestions), but I am facing ABBA deadlock with IOMMU's reassign_device() call, which has this piece of code: list_move(&pdev->domain_list, &target->pdev_list); My immediate change was: write_lock(&pdev->domain->pci_lock); list_del(&pdev->domain_list); write_unlock(&pdev->domain->pci_lock); write_lock(&target->pci_lock); list_add(&pdev->domain_list, &target->pdev_list); write_unlock(&target->pci_lock); But this will not work because reassign_device is called from pci_release_devices() which iterates over d->pdev_list, so we need to take a d->pci_lock early. Any suggestions on how to fix this? My idea is to remove a device from a list one at time: int pci_release_devices(struct domain *d) { struct pci_dev *pdev; u8 bus, devfn; int ret; pcidevs_lock(); write_lock(&d->pci_lock); ret = arch_pci_clean_pirqs(d); if ( ret ) { pcidevs_unlock(); write_unlock(&d->pci_lock); return ret; } while ( !list_empty(&d->pdev_list) ) { pdev = list_entry(&d->pdev_list, struct pci_dev, domain_list); bus = pdev->bus; devfn = pdev->devfn; list_del(&pdev->domain_list); write_unlock(&d->pci_lock); ret = deassign_device(d, pdev->seg, bus, devfn) ?: ret; write_lock(&d->pci_lock); } write_unlock(&d->pci_lock); pcidevs_unlock(); return ret; } But it is ugly somewhat. -- WBR, Volodymyr
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |