[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v13 01/14] vpci: use per-domain PCI lock to protect vpci structure


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Stewart Hildebrand <stewart.hildebrand@xxxxxxx>
  • Date: Tue, 13 Feb 2024 11:57:16 -0500
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=citrix.com smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none (0)
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=tZFUcy/BkWZSpm4tijbRMhFxw02Dw2CSWZd7dmO1ADQ=; b=OpEVc2WYKG7wVhX7YuxT8hPA2r9NBir/Gcf5hJiHwT+20Zl/X+EEyHXx3eRog9Sn7wtHx55KGEGZrdpMTVNEE6pZD3Gtbxe6ykLkaj05FfJ9y1GXJaLY3/EOFxprnZQD/yLsMSnfPF80aIEZQ4+aYDNyRF2Doize49jdZekDVGB55HLBacele6k0cNIMC7TmRZDHBmGpAgyTQYdW7fkXmBmzbYo9ZNQ9sZ2mNx9wGN7He/7sC1KIA/4tK4qp9tGZ1TJS2lvwoPkxneuuzRi0n0IC2/k1osAsC+zKpz4axALvJxkPYt4Aqc/Gia6f5f/1NIFNS63vH4xmB561sKNH+A==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=EjikUuXkg8CEtizKiAHEG97sG4p+pQ/F5wIhzcFINQzn9t5PqQfM/Zkwi5xwN+cLcTLfHXpgEGm7n9FKpsxi99vd/1oDQ+ENg0atURzppSezVcFg0MrZrBtm1fZFiO2sFf6U4ZPcVshVEzKVoYTzFeBxVMvFHfNdoT9yYSim/nhVD7oPlXPYyTbhldjYv0zlJnZbCHlW3Y8l3xJZo3aMHgdNLAG4hfv60OUIupOfMKiCfLjx+MiI2aAJLMt/yAqOriZwcZCX8eXNlDkUaqJ+ROdVnF/uTRAxf1CIp5Miz6W3ogjQA/FH+muk7k8UDJRe74SBVJUQnxsrNzJYbQ3hDQ==
  • Cc: <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, "Andrew Cooper" <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Jun Nakajima <jun.nakajima@xxxxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>, Paul Durrant <paul@xxxxxxx>, Volodymyr Babchuk <volodymyr_babchuk@xxxxxxxx>
  • Delivery-date: Tue, 13 Feb 2024 16:57:36 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 2/13/24 03:35, Roger Pau Monné wrote:
> On Fri, Feb 02, 2024 at 04:33:05PM -0500, Stewart Hildebrand wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
>>
>> Use the per-domain PCI read/write lock to protect the presence of the
>> pci device vpci field. This lock can be used (and in a few cases is used
>> right away) so that vpci removal can be performed while holding the lock
>> in write mode. Previously such removal could race with vpci_read for
>> example.
>>
>> When taking both d->pci_lock and pdev->vpci->lock, they should be
>> taken in this exact order: d->pci_lock then pdev->vpci->lock to avoid
>> possible deadlock situations.
>>
>> 1. Per-domain's pci_lock is used to protect pdev->vpci structure
>> from being removed.
>>
>> 2. Writing the command register and ROM BAR register may trigger
>> modify_bars to run, which in turn may access multiple pdevs while
>> checking for the existing BAR's overlap. The overlapping check, if
>> done under the read lock, requires vpci->lock to be acquired on both
>> devices being compared, which may produce a deadlock. It is not
>> possible to upgrade read lock to write lock in such a case. So, in
>> order to prevent the deadlock, use d->pci_lock in write mode instead.
>>
>> All other code, which doesn't lead to pdev->vpci destruction and does
>> not access multiple pdevs at the same time, can still use a
>> combination of the read lock and pdev->vpci->lock.
>>
>> 3. Drop const qualifier where the new rwlock is used and this is
>> appropriate.
>>
>> 4. Do not call process_pending_softirqs with any locks held. For that
>> unlock prior the call and re-acquire the locks after. After
>> re-acquiring the lock there is no need to check if pdev->vpci exists:
>>  - in apply_map because of the context it is called (no race condition
>>    possible)
>>  - for MSI/MSI-X debug code because it is called at the end of
>>    pdev->vpci access and no further access to pdev->vpci is made
>>
>> 5. Use d->pci_lock around for_each_pdev and pci_get_pdev()
>> while accessing pdevs in vpci code.
>>
>> 6. Switch vPCI functions to use per-domain pci_lock for ensuring pdevs
>> do not go away. The vPCI functions call several MSI-related functions
>> which already have existing non-vPCI callers. Change those MSI-related
>> functions to allow using either pcidevs_lock() or d->pci_lock for
>> ensuring pdevs do not go away. Holding d->pci_lock in read mode is
>> sufficient. Note that this pdev protection mechanism does not protect
>> other state or critical sections. These MSI-related functions already
>> have other race condition and state protection mechanims (e.g.
>> d->event_lock and msixtbl RCU), so we deduce that the use of the global
>> pcidevs_lock() is to ensure that pdevs do not go away. Existing non-vPCI
>> callers of these MSI-related functions will remain (ab)using the global
>> pcidevs_lock() to ensure pdevs do not go away so as to minimize changes
>> to existing non-vPCI call paths.
>>
>> 7. Introduce wrapper construct, pdev_list_is_read_locked(), for checking
>> that pdevs do not go away. The purpose of this wrapper is to aid
>> readability and document the intent of the pdev protection mechanism.
> 
> I would add that when possible, the existing callers haven't been
> switched to use the newly introduced per-domain pci_lock, and will
> continue to use the global pcidevs lock.  This is done to reduce the
> risk of the new locking scheme introducing regressions.  Those users
> will be adjusted in due time.

I'll use this wording, thanks

> 
> IIRC Jan had concerns about why some existing use-cases are not
> switched straight to use the new per-domain pci_lock in this patch.

I hope the clarified commit description addresses this

> 
>>
>> Suggested-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
>> Suggested-by: Jan Beulich <jbeulich@xxxxxxxx>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
>> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@xxxxxxxx>
>> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@xxxxxxx>
>> ---
>> Changes in v13:
>>  - hold off adding Roger's R-b tag even though it was provided on v12.2
>>  - use a wrapper construct to ease readability of odd-looking ASSERTs
>>  - new placement of ASSERT in __pci_enable_msix(), __pci_enable_msi(),
>>    and pci_enable_msi(). Rearrange/add pdev NULL check.
>>  - expand commit description with details about using either
>>    pcidevs_lock() or d->pci_lock
>>
>> Changes in v12.2:
>>  - drop Roger's R-b
>>  - drop both locks on error paths in vpci_msix_arch_print()
>>  - add another ASSERT in vpci_msix_arch_print(), to enforce the
>>    expectation both locks are held before calling vpci_msix_arch_print()
>>  - move pdev_done label in vpci_dump_msi()
>>  - update comments in vpci_dump_msi() to say locks (plural)
>>
>> Changes in v12.1:
>>  - use read_trylock() in vpci_msix_arch_print()
>>  - fixup in-code comments (revert double space, use DomXEN) in
>>    vpci_{read,write}()
>>  - minor updates in commit message
>>  - add Roger's R-b
>>
>> Changes in v12:
>>  - s/pci_rwlock/pci_lock/ in commit message
>>  - expand comment about scope of pci_lock in sched.h
>>  - in vpci_{read,write}, if hwdom is trying to access a device assigned
>>    to dom_xen, holding hwdom->pci_lock is sufficient (no need to hold
>>    dom_xen->pci_lock)
>>  - reintroduce ASSERT in vmx_pi_update_irte()
>>  - reintroduce ASSERT in __pci_enable_msi{x}()
>>  - delete note 6. in commit message about removing ASSERTs since we have
>>    reintroduced them
>>
>> Changes in v11:
>>  - Fixed commit message regarding possible spinlocks
>>  - Removed parameter from allocate_and_map_msi_pirq(), which was added
>>  in the prev version. Now we are taking pcidevs_lock in
>>  physdev_map_pirq()
>>  - Returned ASSERT to pci_enable_msi
>>  - Fixed case when we took read lock instead of write one
>>  - Fixed label indentation
>>
>> Changes in v10:
>>  - Moved printk pas locked area
>>  - Returned back ASSERTs
>>  - Added new parameter to allocate_and_map_msi_pirq() so it knows if
>>  it should take the global pci lock
>>  - Added comment about possible improvement in vpci_write
>>  - Changed ASSERT(rw_is_locked()) to rw_is_write_locked() in
>>    appropriate places
>>  - Renamed release_domain_locks() to release_domain_write_locks()
>>  - moved domain_done label in vpci_dump_msi() to correct place
>> Changes in v9:
>>  - extended locked region to protect vpci_remove_device and
>>    vpci_add_handlers() calls
>>  - vpci_write() takes lock in the write mode to protect
>>    potential call to modify_bars()
>>  - renamed lock releasing function
>>  - removed ASSERT()s from msi code
>>  - added trylock in vpci_dump_msi
>>
>> Changes in v8:
>>  - changed d->vpci_lock to d->pci_lock
>>  - introducing d->pci_lock in a separate patch
>>  - extended locked region in vpci_process_pending
>>  - removed pcidevs_lockis vpci_dump_msi()
>>  - removed some changes as they are not needed with
>>    the new locking scheme
>>  - added handling for hwdom && dom_xen case
>> ---
>>  xen/arch/x86/hvm/vmsi.c       | 31 +++++++++++++--------
>>  xen/arch/x86/hvm/vmx/vmx.c    |  2 +-
>>  xen/arch/x86/irq.c            |  8 +++---
>>  xen/arch/x86/msi.c            | 20 +++++++++-----
>>  xen/arch/x86/physdev.c        |  2 ++
>>  xen/drivers/passthrough/pci.c |  9 +++---
>>  xen/drivers/vpci/header.c     | 18 ++++++++++++
>>  xen/drivers/vpci/msi.c        | 30 +++++++++++++++++---
>>  xen/drivers/vpci/msix.c       | 52 ++++++++++++++++++++++++++++++-----
>>  xen/drivers/vpci/vpci.c       | 24 ++++++++++++++--
>>  xen/include/xen/sched.h       | 15 +++++++++-
>>  11 files changed, 170 insertions(+), 41 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
>> index 128f23636279..f29089178a59 100644
>> --- a/xen/arch/x86/hvm/vmsi.c
>> +++ b/xen/arch/x86/hvm/vmsi.c
>> @@ -468,7 +468,7 @@ int msixtbl_pt_register(struct domain *d, struct pirq 
>> *pirq, uint64_t gtable)
>>      struct msixtbl_entry *entry, *new_entry;
>>      int r = -EINVAL;
>>  
>> -    ASSERT(pcidevs_locked());
>> +    ASSERT(pdev_list_is_read_locked(d));
>>      ASSERT(rw_is_write_locked(&d->event_lock));
>>  
>>      if ( !msixtbl_initialised(d) )
>> @@ -538,7 +538,7 @@ void msixtbl_pt_unregister(struct domain *d, struct pirq 
>> *pirq)
>>      struct pci_dev *pdev;
>>      struct msixtbl_entry *entry;
>>  
>> -    ASSERT(pcidevs_locked());
>> +    ASSERT(pdev_list_is_read_locked(d));
>>      ASSERT(rw_is_write_locked(&d->event_lock));
>>  
>>      if ( !msixtbl_initialised(d) )
>> @@ -684,7 +684,7 @@ static int vpci_msi_update(const struct pci_dev *pdev, 
>> uint32_t data,
>>  {
>>      unsigned int i;
>>  
>> -    ASSERT(pcidevs_locked());
>> +    ASSERT(rw_is_locked(&pdev->domain->pci_lock));
> 
> Any reason to not use the newly introduced helper here?  I know the
> pcidevs will never be locked here given the new lock usage, but still
> it would be less confusing if the new helper was used consistently.

Nope. Agreed, it would help readability, I'll switch to the helper. In
addition to vpci_msi_update(), I assume this comment also applies to the
remaining occurrences:

xen/arch/x86/hvm/vmsi.c:vpci_msi_arch_update()
xen/arch/x86/hvm/vmsi.c:vpci_msi_arch_enable()
xen/arch/x86/hvm/vmsi.c:vpci_msi_disable()
xen/arch/x86/hvm/vmsi.c:vpci_msix_arch_enable_entry()
xen/drivers/vpci/msix.c:msix_find()

But not:

xen/arch/x86/hvm/vmsi.c:vpci_msix_arch_print()

I'll add a suitable comment to this one exception.

> 
> Otherwise we need a comment here as to why the helper can't be used,
> in order to avoid confusion in the future.
> 
>>> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
>> index 9da91e0e6244..c3adec1aca3c 100644
>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -462,7 +462,8 @@ struct domain
>>  #ifdef CONFIG_HAS_PCI
>>      struct list_head pdev_list;
>>      /*
>> -     * pci_lock protects access to pdev_list.
>> +     * pci_lock protects access to pdev_list. pci_lock also protects 
>> pdev->vpci
>> +     * structure from being removed.
>>       *
>>       * Any user *reading* from pdev_list, or from devices stored in 
>> pdev_list,
>>       * should hold either pcidevs_lock() or pci_lock in read mode. 
>> Optionally,
>> @@ -628,6 +629,18 @@ struct domain
>>      unsigned int cdf;
>>  };
>>  
>> +/*
>> + * Check for use in ASSERTs to ensure that:
>> + *   1. we can *read* d->pdev_list
>> + *   2. pdevs (belonging to this domain) do not go away
>> + *   3. pdevs (belonging to this domain) do not get assigned to other 
>> domains
> 
> I think you can just state that this check ensures there will be no
> changes to the entries in d->pdev_list, but not the contents of each
> entry.  No changes to d->pdev_list already ensures not devices can be
> deassigned or removed from the system, and obviously makes the list
> safe to iterate against.

OK, I'll simplify the comment

> 
> I would also drop the explicitly mention this is intended for ASSERT
> usage: there's nothing specific in the code that prevents it from
> being used in other places (albeit I think that's unlikely).
> 
>> + * This check is not suitable for protecting other state or critical 
>> regions.
>> + */
>> +#define pdev_list_is_read_locked(d) ({                           \
> 
> I would be tempted to drop at least the '_read_' part from the name,
> the name is getting a bit too long for my taste.
> 
>> +        struct domain *d_ = (d);                                 \
> 
> Why do you need this local domain variable?  Can't you use the d
> parameter directly?
> 
> Such assign will prevent using a const 'd' parameter, and 'd_' itself
> should be const IMO (iff we really need this).
> 
> Also sched.h is not the best place, can't you just place it in
> pci.h?

Yes, I'll move it to xen/include/xen/pci.h



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.