[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v13.1 01/14] vpci: use per-domain PCI lock to protect vpci structure



On Fri, Feb 16, 2024 at 09:41:19AM -0500, Stewart Hildebrand wrote:
> On 2/16/24 06:44, Roger Pau Monné wrote:
> > On Thu, Feb 15, 2024 at 03:30:00PM -0500, Stewart Hildebrand wrote:
> >> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
> >>
> >> Use the per-domain PCI read/write lock to protect the presence of the
> >> pci device vpci field. This lock can be used (and in a few cases is used
> >> right away) so that vpci removal can be performed while holding the lock
> >> in write mode. Previously such removal could race with vpci_read for
> >> example.
> >>
> >> When taking both d->pci_lock and pdev->vpci->lock, they should be
> >> taken in this exact order: d->pci_lock then pdev->vpci->lock to avoid
> >> possible deadlock situations.
> >>
> >> 1. Per-domain's pci_lock is used to protect pdev->vpci structure
> >> from being removed.
> >>
> >> 2. Writing the command register and ROM BAR register may trigger
> >> modify_bars to run, which in turn may access multiple pdevs while
> >> checking for the existing BAR's overlap. The overlapping check, if
> >> done under the read lock, requires vpci->lock to be acquired on both
> >> devices being compared, which may produce a deadlock. It is not
> >> possible to upgrade read lock to write lock in such a case. So, in
> >> order to prevent the deadlock, use d->pci_lock in write mode instead.
> >>
> >> All other code, which doesn't lead to pdev->vpci destruction and does
> >> not access multiple pdevs at the same time, can still use a
> >> combination of the read lock and pdev->vpci->lock.
> >>
> >> 3. Drop const qualifier where the new rwlock is used and this is
> >> appropriate.
> >>
> >> 4. Do not call process_pending_softirqs with any locks held. For that
> >> unlock prior the call and re-acquire the locks after. After
> >> re-acquiring the lock there is no need to check if pdev->vpci exists:
> >>  - in apply_map because of the context it is called (no race condition
> >>    possible)
> >>  - for MSI/MSI-X debug code because it is called at the end of
> >>    pdev->vpci access and no further access to pdev->vpci is made
> >>
> >> 5. Use d->pci_lock around for_each_pdev and pci_get_pdev()
> >> while accessing pdevs in vpci code.
> >>
> >> 6. Switch vPCI functions to use per-domain pci_lock for ensuring pdevs
> >> do not go away. The vPCI functions call several MSI-related functions
> >> which already have existing non-vPCI callers. Change those MSI-related
> >> functions to allow using either pcidevs_lock() or d->pci_lock for
> >> ensuring pdevs do not go away. Holding d->pci_lock in read mode is
> >> sufficient. Note that this pdev protection mechanism does not protect
> >> other state or critical sections. These MSI-related functions already
> >> have other race condition and state protection mechanims (e.g.
> >> d->event_lock and msixtbl RCU), so we deduce that the use of the global
> >> pcidevs_lock() is to ensure that pdevs do not go away.
> >>
> >> 7. Introduce wrapper construct, pdev_list_is_read_locked(), for checking
> >> that pdevs do not go away. The purpose of this wrapper is to aid
> >> readability and document the intent of the pdev protection mechanism.
> >>
> >> 8. When possible, the existing non-vPCI callers of these MSI-related
> >> functions haven't been switched to use the newly introduced per-domain
> >> pci_lock, and will continue to use the global pcidevs_lock(). This is
> >> done to reduce the risk of the new locking scheme introducing
> >> regressions. Those users will be adjusted in due time. One exception
> >> is where the pcidevs_lock() in allocate_and_map_msi_pirq() is moved to
> >> the caller, physdev_map_pirq(): this instance is switched to
> >> read_lock(&d->pci_lock) right away.
> >>
> >> Suggested-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> >> Suggested-by: Jan Beulich <jbeulich@xxxxxxxx>
> >> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
> >> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@xxxxxxxx>
> >> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@xxxxxxx>
> > 
> > A couple of questions and the pdev_list_is_read_locked() needs a small
> > adjustment.
> > 
> >> @@ -895,6 +891,14 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
> >>  {
> >>      unsigned int i;
> >>  
> >> +    /*
> >> +     * Assert that d->pdev_list doesn't change. 
> >> pdev_list_is_read_locked() is
> >> +     * not suitable here because we may 
> >> read_unlock(&pdev->domain->pci_lock)
> >> +     * before returning.
> > 
> > I'm confused by this comment, as I don't see why it matters that the
> > lock might be lock before returning.  We need to ensure the lock is
> > taken at the time of the assert, and hence pdev_list_is_read_locked()
> > can be used.
> 
> pdev_list_is_read_locked() currently would allow either pcidevs_lock()
> or d->pci_lock. If vpci_msix_arch_print() is entered with only
> pcidevs_lock() held, we may end up unlocking d->pci_lock when it is
> not locked, which would be wrong.
> 
> Perhaps the comment could be clarified as:
> 
>     /*
>      * Assert that d->pdev_list doesn't change. 
> ASSERT_PDEV_LIST_IS_READ_LOCKED
>      * is not suitable here because it may allow either pcidevs_lock() or
>      * d->pci_lock to be held, but here we rely on d->pci_lock being held, not
>      * pcidevs_lock().
>      */

Yes, this is indeed better.

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.