[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v13.1 01/14] vpci: use per-domain PCI lock to protect vpci structure
On 2/16/24 06:44, Roger Pau Monné wrote: > On Thu, Feb 15, 2024 at 03:30:00PM -0500, Stewart Hildebrand wrote: >> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx> >> >> Use the per-domain PCI read/write lock to protect the presence of the >> pci device vpci field. This lock can be used (and in a few cases is used >> right away) so that vpci removal can be performed while holding the lock >> in write mode. Previously such removal could race with vpci_read for >> example. >> >> When taking both d->pci_lock and pdev->vpci->lock, they should be >> taken in this exact order: d->pci_lock then pdev->vpci->lock to avoid >> possible deadlock situations. >> >> 1. Per-domain's pci_lock is used to protect pdev->vpci structure >> from being removed. >> >> 2. Writing the command register and ROM BAR register may trigger >> modify_bars to run, which in turn may access multiple pdevs while >> checking for the existing BAR's overlap. The overlapping check, if >> done under the read lock, requires vpci->lock to be acquired on both >> devices being compared, which may produce a deadlock. It is not >> possible to upgrade read lock to write lock in such a case. So, in >> order to prevent the deadlock, use d->pci_lock in write mode instead. >> >> All other code, which doesn't lead to pdev->vpci destruction and does >> not access multiple pdevs at the same time, can still use a >> combination of the read lock and pdev->vpci->lock. >> >> 3. Drop const qualifier where the new rwlock is used and this is >> appropriate. >> >> 4. Do not call process_pending_softirqs with any locks held. For that >> unlock prior the call and re-acquire the locks after. After >> re-acquiring the lock there is no need to check if pdev->vpci exists: >> - in apply_map because of the context it is called (no race condition >> possible) >> - for MSI/MSI-X debug code because it is called at the end of >> pdev->vpci access and no further access to pdev->vpci is made >> >> 5. Use d->pci_lock around for_each_pdev and pci_get_pdev() >> while accessing pdevs in vpci code. >> >> 6. Switch vPCI functions to use per-domain pci_lock for ensuring pdevs >> do not go away. The vPCI functions call several MSI-related functions >> which already have existing non-vPCI callers. Change those MSI-related >> functions to allow using either pcidevs_lock() or d->pci_lock for >> ensuring pdevs do not go away. Holding d->pci_lock in read mode is >> sufficient. Note that this pdev protection mechanism does not protect >> other state or critical sections. These MSI-related functions already >> have other race condition and state protection mechanims (e.g. >> d->event_lock and msixtbl RCU), so we deduce that the use of the global >> pcidevs_lock() is to ensure that pdevs do not go away. >> >> 7. Introduce wrapper construct, pdev_list_is_read_locked(), for checking >> that pdevs do not go away. The purpose of this wrapper is to aid >> readability and document the intent of the pdev protection mechanism. >> >> 8. When possible, the existing non-vPCI callers of these MSI-related >> functions haven't been switched to use the newly introduced per-domain >> pci_lock, and will continue to use the global pcidevs_lock(). This is >> done to reduce the risk of the new locking scheme introducing >> regressions. Those users will be adjusted in due time. One exception >> is where the pcidevs_lock() in allocate_and_map_msi_pirq() is moved to >> the caller, physdev_map_pirq(): this instance is switched to >> read_lock(&d->pci_lock) right away. >> >> Suggested-by: Roger Pau Monné <roger.pau@xxxxxxxxxx> >> Suggested-by: Jan Beulich <jbeulich@xxxxxxxx> >> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx> >> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@xxxxxxxx> >> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@xxxxxxx> > > A couple of questions and the pdev_list_is_read_locked() needs a small > adjustment. > >> @@ -895,6 +891,14 @@ int vpci_msix_arch_print(const struct vpci_msix *msix) >> { >> unsigned int i; >> >> + /* >> + * Assert that d->pdev_list doesn't change. pdev_list_is_read_locked() >> is >> + * not suitable here because we may read_unlock(&pdev->domain->pci_lock) >> + * before returning. > > I'm confused by this comment, as I don't see why it matters that the > lock might be lock before returning. We need to ensure the lock is > taken at the time of the assert, and hence pdev_list_is_read_locked() > can be used. pdev_list_is_read_locked() currently would allow either pcidevs_lock() or d->pci_lock. If vpci_msix_arch_print() is entered with only pcidevs_lock() held, we may end up unlocking d->pci_lock when it is not locked, which would be wrong. Perhaps the comment could be clarified as: /* * Assert that d->pdev_list doesn't change. ASSERT_PDEV_LIST_IS_READ_LOCKED * is not suitable here because it may allow either pcidevs_lock() or * d->pci_lock to be held, but here we rely on d->pci_lock being held, not * pcidevs_lock(). */ > >> + */ >> + ASSERT(rw_is_locked(&msix->pdev->domain->pci_lock)); >> + ASSERT(spin_is_locked(&msix->pdev->vpci->lock)); >> + >> for ( i = 0; i < msix->max_entries; i++ ) >> { >> const struct vpci_msix_entry *entry = &msix->entries[i]; >> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h >> index aabc5465a7d3..9f31cb84c9f3 100644 >> --- a/xen/include/xen/pci.h >> +++ b/xen/include/xen/pci.h >> @@ -171,6 +171,20 @@ void pcidevs_lock(void); >> void pcidevs_unlock(void); >> bool __must_check pcidevs_locked(void); >> >> +#ifndef NDEBUG >> +/* >> + * Check for use in ASSERTs to ensure there will be no changes to the >> entries >> + * in d->pdev_list (but not the contents of each entry). >> + * This check is not suitable for protecting other state or critical >> regions. >> + */ >> +#define pdev_list_is_read_locked(d) ({ \ >> + /* NB: d may be evaluated multiple times, or not at all */ \ >> + pcidevs_locked() || (d && rw_is_locked(&d->pci_lock)); \ > > 'd' is missing parentheses here, should be (d). Thanks for spotting. I'll fix in v13.2. > >> + }) >> +#else >> +bool pdev_list_is_read_locked(const struct domain *d); >> +#endif > > FWIW, if this is only intended to be used with ASSERT, it might as > well be an ASSERT itself: > > ASSERT_PDEV_LIST_IS_READ_LOCKED(d) ... > > Don't have a strong opinion, so I'm fine with how it's used, just > noting it might be clearer if it was an ASSERT_ right away. This would also have the benefit of not relying on dead code elimination in the #else case. I'll be sending v13.2 anyway, I may as well make the change.
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |