Re: [PATCH v13.3 01/14] vpci: use per-domain PCI lock to protect vpci structure
On 21.02.2024 03:45, Stewart Hildebrand wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
>
> Use the per-domain PCI read/write lock to protect the presence of the
> pci device vpci field. This lock can be used (and in a few cases is used
> right away) so that vpci removal can be performed while holding the lock
> in write mode. Previously such removal could race with vpci_read for
> example.
>
> When taking both d->pci_lock and pdev->vpci->lock, they should be
> taken in this exact order: d->pci_lock then pdev->vpci->lock to avoid
> possible deadlock situations.
>
> 1. Per-domain's pci_lock is used to protect pdev->vpci structure
> from being removed.
>
> 2. Writing the command register and ROM BAR register may trigger
> modify_bars to run, which in turn may access multiple pdevs while
> checking for the existing BAR's overlap. The overlapping check, if
> done under the read lock, requires vpci->lock to be acquired on both
> devices being compared, which may produce a deadlock. It is not
> possible to upgrade read lock to write lock in such a case. So, in
> order to prevent the deadlock, use d->pci_lock in write mode instead.
>
> All other code, which doesn't lead to pdev->vpci destruction and does
> not access multiple pdevs at the same time, can still use a
> combination of the read lock and pdev->vpci->lock.
>
> 3. Drop const qualifier where the new rwlock is used and this is
> appropriate.
>
> 4. Do not call process_pending_softirqs with any locks held. For that
> unlock prior the call and re-acquire the locks after. After
> re-acquiring the lock there is no need to check if pdev->vpci exists:
> - in apply_map because of the context it is called (no race condition
> possible)
> - for MSI/MSI-X debug code because it is called at the end of
> pdev->vpci access and no further access to pdev->vpci is made
>
> 5. Use d->pci_lock around for_each_pdev and pci_get_pdev()
> while accessing pdevs in vpci code.
>
> 6. Switch vPCI functions to use per-domain pci_lock for ensuring pdevs
> do not go away. The vPCI functions call several MSI-related functions
> which already have existing non-vPCI callers. Change those MSI-related
> functions to allow using either pcidevs_lock() or d->pci_lock for
> ensuring pdevs do not go away. Holding d->pci_lock in read mode is
> sufficient. Note that this pdev protection mechanism does not protect
> other state or critical sections. These MSI-related functions already
> have other race condition and state protection mechanisms (e.g.
> d->event_lock and msixtbl RCU), so we deduce that the use of the global
> pcidevs_lock() is to ensure that pdevs do not go away.
>
> 7. Introduce wrapper construct, pdev_list_is_read_locked(), for checking
> that pdevs do not go away. The purpose of this wrapper is to aid
> readability and document the intent of the pdev protection mechanism.
>
> 8. When possible, the existing non-vPCI callers of these MSI-related
> functions haven't been switched to use the newly introduced per-domain
> pci_lock, and will continue to use the global pcidevs_lock(). This is
> done to reduce the risk of the new locking scheme introducing
> regressions. Those users will be adjusted in due time. One exception
> is where the pcidevs_lock() in allocate_and_map_msi_pirq() is moved to
> the caller, physdev_map_pirq(): this instance is switched to
> read_lock(&d->pci_lock) right away.
>
> Suggested-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> Suggested-by: Jan Beulich <jbeulich@xxxxxxxx>
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@xxxxxxxx>
> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@xxxxxxx>

Acked-by: Jan Beulich <jbeulich@xxxxxxxx>
with two small remaining remarks (below) and on the assumption that an R-b
from Roger in particular for the vPCI code is going to turn up eventually.

> @@ -895,6 +891,15 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
>  {
>      unsigned int i;
>
> +    /*
> +     * Assert that pdev_list doesn't change. ASSERT_PDEV_LIST_IS_READ_LOCKED
> +     * is not suitable here because it may allow either pcidevs_lock() or
> +     * pci_lock to be held, but here we rely on pci_lock being held, not
> +     * pcidevs_lock().
> +     */
> +    ASSERT(rw_is_locked(&msix->pdev->domain->pci_lock));
> +    ASSERT(spin_is_locked(&msix->pdev->vpci->lock));

As to the comment, I think it's not really "may". I also think referral to ...

> @@ -913,13 +918,23 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
>              struct pci_dev *pdev = msix->pdev;
>
>              spin_unlock(&msix->pdev->vpci->lock);
> +            read_unlock(&pdev->domain->pci_lock);
>              process_pending_softirqs();
> +
> +            if ( !read_trylock(&pdev->domain->pci_lock) )
> +                return -EBUSY;
> +
>              /* NB: we assume that pdev cannot go away for an alive domain. */
>              if ( !pdev->vpci || !spin_trylock(&pdev->vpci->lock) )
> +            {
> +                read_unlock(&pdev->domain->pci_lock);
>                  return -EBUSY;
> +            }
> +
>              if ( pdev->vpci->msix != msix )
>              {
>                  spin_unlock(&pdev->vpci->lock);
> +                read_unlock(&pdev->domain->pci_lock);
>                  return -EAGAIN;
>              }
>          }

... this machinery would be quite helpful (and iirc you even had such in an
earlier version).

> @@ -313,17 +316,31 @@ void vpci_dump_msi(void)
>                  {
>                      /*
>                       * On error vpci_msix_arch_print will always return without
> -                     * holding the lock.
> +                     * holding the locks.
>                       */
>                      printk("unable to print all MSI-X entries: %d\n", rc);
> -                    process_pending_softirqs();
> -                    continue;
> +                    goto pdev_done;
>                  }
>              }
>
> +            /*
> +             * Unlock locks to process pending softirqs. This is
> +             * potentially unsafe, as d->pdev_list can be changed in
> +             * meantime.
> +             */
>              spin_unlock(&pdev->vpci->lock);
> +            read_unlock(&d->pci_lock);
> + pdev_done:
>              process_pending_softirqs();
> +            if ( !read_trylock(&d->pci_lock) )
> +            {
> +                printk("unable to access other devices for the domain\n");
> +                goto domain_done;
> +            }
>          }
> +        read_unlock(&d->pci_lock);
> + domain_done:
> +        ;

I think a blank line ahead of this label and perhaps also ahead of
"pdev_done" would be quite nice. I guess respective adjustments could be
done while committing, provided there's not going to be any other reason
for yet another revision.

Jan