[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v13.1 01/14] vpci: use per-domain PCI lock to protect vpci structure


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Stewart Hildebrand <stewart.hildebrand@xxxxxxx>
  • Date: Fri, 16 Feb 2024 09:41:19 -0500
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=citrix.com smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none (0)
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=zVpZ5CiPG75/pzA0IJfoQQfFkGBTRxHvZPoM3ewu2mo=; b=NbeFmJCAjBpFHOWduPtL00iO1ptykzeMCGom34quX1e/pAQ7f3HREsKBFWUcbOZzvwOZSHwjwIhepkCllgCLElbYV/GV77GGf070EA0OzUnYJCDAsdPFOdWrAepgysfNYNCtkFQOaL1eZPweUlmYH7vN9SZhIxSROZGu7C2lfrI5Ml7PEnGmLmtG4yhx65Bx0OzrIctUewv/ARajsM620ehFh+PmbvYrrPEXSzsEn6ENwP8Jo+qvoMa8/YyZBfgFYfcRvry0p+XFl6qFRKV2oTq6hu3OZSPeHcg3JdzyxT1+K20Z9wFdfc/LYO2pOIv33lyAPDvVRFTT1btxaIdbnA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=OQg5BSsMemhu6tbcT2LSqb5IouiXToJerTbdsDAuDa1QAy//5qG30r5aHsUFX6vC0HGW2ao7MrWDOmsZ/wsttDG4IdOLvgFqLuzp9xlFjhIGCy0LIDFDvJ20PfQ+p0Ca4qIyIWcIkYAWOdCKskJT0+S6/ZTgwQONzBHzbR2/D/AUUCSUm5TiXzRTM4i7lDfHni0mqsn1Sh+iNBU/WRHp3RNK05GnV/v4qioW5NsTBTfjRjurf0xswxxhlT2i2PEbSRI/fMZoaPKxVu09XmFQxXtyShe5RzltXJbi4x2fm8kp4N/ZRaw0iXWsWfELu1nkZNfcE6Ba0+Hy5PKpm+/u7g==
  • Cc: <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, "Andrew Cooper" <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Jun Nakajima <jun.nakajima@xxxxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>, Paul Durrant <paul@xxxxxxx>, Volodymyr Babchuk <volodymyr_babchuk@xxxxxxxx>
  • Delivery-date: Fri, 16 Feb 2024 14:41:42 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 2/16/24 06:44, Roger Pau Monné wrote:
> On Thu, Feb 15, 2024 at 03:30:00PM -0500, Stewart Hildebrand wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
>>
>> Use the per-domain PCI read/write lock to protect the presence of the
>> pci device vpci field. This lock can be used (and in a few cases is used
>> right away) so that vpci removal can be performed while holding the lock
>> in write mode. Previously such removal could race with vpci_read for
>> example.
>>
>> When taking both d->pci_lock and pdev->vpci->lock, they should be
>> taken in this exact order: d->pci_lock then pdev->vpci->lock to avoid
>> possible deadlock situations.
>>
>> 1. Per-domain's pci_lock is used to protect pdev->vpci structure
>> from being removed.
>>
>> 2. Writing the command register and ROM BAR register may trigger
>> modify_bars to run, which in turn may access multiple pdevs while
>> checking for the existing BAR's overlap. The overlapping check, if
>> done under the read lock, requires vpci->lock to be acquired on both
>> devices being compared, which may produce a deadlock. It is not
>> possible to upgrade read lock to write lock in such a case. So, in
>> order to prevent the deadlock, use d->pci_lock in write mode instead.
>>
>> All other code, which doesn't lead to pdev->vpci destruction and does
>> not access multiple pdevs at the same time, can still use a
>> combination of the read lock and pdev->vpci->lock.
>>
>> 3. Drop const qualifier where the new rwlock is used and this is
>> appropriate.
>>
>> 4. Do not call process_pending_softirqs with any locks held. For that
>> unlock prior the call and re-acquire the locks after. After
>> re-acquiring the lock there is no need to check if pdev->vpci exists:
>>  - in apply_map because of the context it is called (no race condition
>>    possible)
>>  - for MSI/MSI-X debug code because it is called at the end of
>>    pdev->vpci access and no further access to pdev->vpci is made
>>
>> 5. Use d->pci_lock around for_each_pdev and pci_get_pdev()
>> while accessing pdevs in vpci code.
>>
>> 6. Switch vPCI functions to use per-domain pci_lock for ensuring pdevs
>> do not go away. The vPCI functions call several MSI-related functions
>> which already have existing non-vPCI callers. Change those MSI-related
>> functions to allow using either pcidevs_lock() or d->pci_lock for
>> ensuring pdevs do not go away. Holding d->pci_lock in read mode is
>> sufficient. Note that this pdev protection mechanism does not protect
>> other state or critical sections. These MSI-related functions already
>> have other race condition and state protection mechanims (e.g.
>> d->event_lock and msixtbl RCU), so we deduce that the use of the global
>> pcidevs_lock() is to ensure that pdevs do not go away.
>>
>> 7. Introduce wrapper construct, pdev_list_is_read_locked(), for checking
>> that pdevs do not go away. The purpose of this wrapper is to aid
>> readability and document the intent of the pdev protection mechanism.
>>
>> 8. When possible, the existing non-vPCI callers of these MSI-related
>> functions haven't been switched to use the newly introduced per-domain
>> pci_lock, and will continue to use the global pcidevs_lock(). This is
>> done to reduce the risk of the new locking scheme introducing
>> regressions. Those users will be adjusted in due time. One exception
>> is where the pcidevs_lock() in allocate_and_map_msi_pirq() is moved to
>> the caller, physdev_map_pirq(): this instance is switched to
>> read_lock(&d->pci_lock) right away.
>>
>> Suggested-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
>> Suggested-by: Jan Beulich <jbeulich@xxxxxxxx>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
>> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@xxxxxxxx>
>> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@xxxxxxx>
> 
> A couple of questions and the pdev_list_is_read_locked() needs a small
> adjustment.
> 
>> @@ -895,6 +891,14 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
>>  {
>>      unsigned int i;
>>  
>> +    /*
>> +     * Assert that d->pdev_list doesn't change. pdev_list_is_read_locked() 
>> is
>> +     * not suitable here because we may read_unlock(&pdev->domain->pci_lock)
>> +     * before returning.
> 
> I'm confused by this comment, as I don't see why it matters that the
> lock might be lock before returning.  We need to ensure the lock is
> taken at the time of the assert, and hence pdev_list_is_read_locked()
> can be used.

pdev_list_is_read_locked() currently would allow either pcidevs_lock()
or d->pci_lock. If vpci_msix_arch_print() is entered with only
pcidevs_lock() held, we may end up unlocking d->pci_lock when it is
not locked, which would be wrong.

Perhaps the comment could be clarified as:

    /*
     * Assert that d->pdev_list doesn't change. ASSERT_PDEV_LIST_IS_READ_LOCKED
     * is not suitable here because it may allow either pcidevs_lock() or
     * d->pci_lock to be held, but here we rely on d->pci_lock being held, not
     * pcidevs_lock().
     */

> 
>> +     */
>> +    ASSERT(rw_is_locked(&msix->pdev->domain->pci_lock));
>> +    ASSERT(spin_is_locked(&msix->pdev->vpci->lock));
>> +
>>      for ( i = 0; i < msix->max_entries; i++ )
>>      {
>>          const struct vpci_msix_entry *entry = &msix->entries[i];
>> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
>> index aabc5465a7d3..9f31cb84c9f3 100644
>> --- a/xen/include/xen/pci.h
>> +++ b/xen/include/xen/pci.h
>> @@ -171,6 +171,20 @@ void pcidevs_lock(void);
>>  void pcidevs_unlock(void);
>>  bool __must_check pcidevs_locked(void);
>>  
>> +#ifndef NDEBUG
>> +/*
>> + * Check for use in ASSERTs to ensure there will be no changes to the 
>> entries
>> + * in d->pdev_list (but not the contents of each entry).
>> + * This check is not suitable for protecting other state or critical 
>> regions.
>> + */
>> +#define pdev_list_is_read_locked(d) ({                             \
>> +        /* NB: d may be evaluated multiple times, or not at all */ \
>> +        pcidevs_locked() || (d && rw_is_locked(&d->pci_lock));     \
> 
> 'd' is missing parentheses here, should be (d).

Thanks for spotting. I'll fix in v13.2.

> 
>> +    })
>> +#else
>> +bool pdev_list_is_read_locked(const struct domain *d);
>> +#endif
> 
> FWIW, if this is only intended to be used with ASSERT, it might as
> well be an ASSERT itself:
> 
> ASSERT_PDEV_LIST_IS_READ_LOCKED(d) ...
> 
> Don't have a strong opinion, so I'm fine with how it's used, just
> noting it might be clearer if it was an ASSERT_ right away.

This would also have the benefit of not relying on dead code elimination
in the #else case. I'll be sending v13.2 anyway, I may as well make the
change.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.