
Re: [PATCH v2] vpci: introduce per-domain lock to protect vpci structure


  • To: Oleksandr Andrushchenko <andr2000@xxxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Tue, 15 Feb 2022 11:48:13 +0100
  • Cc: <xen-devel@xxxxxxxxxxxxxxxxxxxx>, <jbeulich@xxxxxxxx>, <julien@xxxxxxx>, <sstabellini@xxxxxxxxxx>, <oleksandr_tyshchenko@xxxxxxxx>, <volodymyr_babchuk@xxxxxxxx>, <artem_mygaiev@xxxxxxxx>, <bertrand.marquis@xxxxxxx>, <rahul.singh@xxxxxxx>, Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
  • Delivery-date: Tue, 15 Feb 2022 10:48:32 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Tue, Feb 15, 2022 at 10:11:35AM +0200, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
> 
> Introduce a per-domain read/write lock to check whether vpci is present,
> so we are sure there are no accesses to the contents of the vpci struct
> if not. This lock can be used (and in a few cases is used right away)
> so that vpci removal can be performed while holding the lock in write
> mode. Previously such removal could race with vpci_read for example.
> 
> 1. The per-domain vpci_rwlock is used to protect the pdev->vpci structure
> from being removed.
> 
> 2. Writing the command register and ROM BAR register may trigger
> modify_bars to run, which in turn may access multiple pdevs while
> checking for overlaps with the existing BARs. If done under the read
> lock, the overlap check requires vpci->lock to be acquired on both
> devices being compared, which may produce a deadlock, and it is not
> possible to upgrade a read lock to a write lock in such a case. So, in
> order to prevent the deadlock, check which registers are going to be
> written and acquire the lock in the appropriate mode from the beginning.
> 
> All other code, which doesn't lead to pdev->vpci destruction and does not
> access multiple pdevs at the same time, can still use a combination of the
> read lock and pdev->vpci->lock.
> 
> 3. Optimize the detection of whether a ROM BAR write requires the write
> lock by caching the offset of the ROM BAR register in
> vpci->header->rom_reg, which depends on the header's type.
> 
> 4. Reduce locked region in vpci_remove_device as it is now possible
> to set pdev->vpci to NULL early right after the write lock is acquired.
> 
> 5. Reduce locked region in vpci_add_handlers as it is possible to
> initialize many more fields of the struct vpci before assigning it to
> pdev->vpci.
> 
> 6. vpci_{add|remove}_register are required to be called with the write lock
> held, but it is not feasible to add an assert there as that would require
> passing struct domain to them. So, add a comment about this requirement
> to these and other functions with equivalent constraints.
> 
> 7. Drop const qualifier where the new rwlock is used and this is appropriate.
> 
> 8. Do not call process_pending_softirqs with any locks held. For that,
> unlock prior to the call and re-acquire the locks afterwards. After
> re-acquiring the lock there is no need to check if pdev->vpci exists:
>  - in apply_map because of the context in which it is called (no race
>    condition possible)
>  - for MSI/MSI-X debug code because it is called at the end of
>    pdev->vpci access and no further access to pdev->vpci is made
> 
> 9. Check for !pdev->vpci in vpci_{read|write} after acquiring the lock
> and, if it is NULL, allow reading or writing the hardware register
> directly. This is acceptable as we only deal with Dom0 as of now. Once
> DomU support is added, writes will need to be ignored and reads will
> need to return all 0's for the guests, while Dom0 can still access the
> registers directly.
> 
> 10. Introduce pcidevs_trylock, so there is a way to attempt taking the
> pcidevs lock without blocking.
> 
> 11. Use the pcidevs lock around for_each_pdev and pci_get_pdev_by_domain
> while accessing pdevs in vpci code.
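
Regarding 10. above: I assume pcidevs_trylock can be a thin wrapper around
the existing recursive lock, along the lines of the sketch below (the pci.c
hunk is not quoted here, so this is only a guess at its shape):

    /* xen/drivers/passthrough/pci.c, next to pcidevs_{lock,unlock}(). */
    bool pcidevs_trylock(void)
    {
        /* Non-blocking attempt to take the recursive pcidevs lock. */
        return !!spin_trylock_recursive(&_pcidevs_lock);
    }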

So if you use the pcidevs_lock then it's impossible for the pdev or
pdev->vpci to be removed or recreated, as the pcidevs lock protects
any device operations (add, remove, assign, deassign).

It's however not OK to use the pcidevs lock in vpci_{read,write}
as-is, as the introduced contention is IMO not acceptable.

The only viable option I see here is to:

 1. Make the pcidevs lock a rwlock: switch current callers to take the
    lock in write mode, detect and fixup any issues that could arise
    from the lock not being recursive anymore.
 2. Take the lock in read mode around vpci_{read,write} sections that
    rely on pdev (including the handlers).
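
With that, the entry points could look roughly like the sketch below (note
that pcidevs_read_{lock,unlock} are made-up names for whatever helpers the
converted rwlock ends up providing):

    uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
    {
        const struct domain *d = current->domain;
        const struct pci_dev *pdev;
        uint32_t data = ~(uint32_t)0;

        /*
         * Holding the converted pcidevs lock in read mode guarantees that
         * neither pdev nor pdev->vpci can be removed or recreated while
         * the handlers run.
         */
        pcidevs_read_lock();
        pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
        if ( !pdev )
        {
            pcidevs_read_unlock();
            return vpci_read_hw(sbdf, reg, size);
        }

        /* ... existing handler dispatch under pdev->vpci->lock fills data ... */

        pcidevs_read_unlock();

        return data;
    }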

These items should be at least two separate patches. Let's not mix the
conversion of pcidevs locks with the addition of vPCI support.

I think with that we could get away without requiring a per-domain
rwlock? We would just need to sort out the lock ordering in modify_bars
between tmp->vpci->lock and pdev->vpci->lock. Neither pdev nor vpci can
go away while the pcidevs lock is held.
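
For instance (a sketch only: vpci_lock_pair is a made-up helper, and
ordering by lock address is just one possible rule):

    /*
     * Acquire the vpci locks of two (possibly identical) devices in a
     * fixed global order, so that concurrent modify_bars callers cannot
     * deadlock on each other.
     */
    static void vpci_lock_pair(struct vpci *a, struct vpci *b)
    {
        if ( a == b )
            spin_lock(&a->lock);
        else if ( a < b )
        {
            spin_lock(&a->lock);
            spin_lock(&b->lock);
        }
        else
        {
            spin_lock(&b->lock);
            spin_lock(&a->lock);
        }
    }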

Sorting the situation in modify_bars should also be done as a separate
patch on top of 1. and 2.

> 
> 12. This is based on the discussion at [1].
> 
> [1] https://lore.kernel.org/all/20220204063459.680961-4-andr2000@xxxxxxxxx/
> 
> Suggested-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> Suggested-by: Jan Beulich <jbeulich@xxxxxxxx>
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>

I've made some small comments below, but given my proposal above I
think the code would change a great deal if we decide to use pcidevs
lock.

> 
> ---
> This was checked on x86: with and without PVH Dom0.
> 
> Since v1:
> - s/ASSERT(!!/ASSERT(
> - move vpci_header_write_lock to vpci.c and rename to
>   vpci_header_need_write_lock
> - use a simple static overlap function instead of vpci_offset_cmp
> - signal no ROM BAR with rom_reg == 0
> - msix_accept: new line before return
> - do not run process_pending_softirqs with locks held
> - in-code comments update
> - move rom_reg before rom_enabled in struct vpci. Roger, it is not
>   possible to move it after 'type' as in this case it becomes per BAR
>   and we need it per vpci
> - add !pdev->vpci checks to vpci_{read|write}
> - move ASSERT(pdev->vpci) in add_handlers under the write lock
> - introduce pcidevs_trylock
> - protect for_each_pdev with pcidevs lock
> ---
>  xen/arch/x86/hvm/vmsi.c       |   7 +++
>  xen/common/domain.c           |   3 +
>  xen/drivers/passthrough/pci.c |   5 ++
>  xen/drivers/vpci/header.c     |  56 +++++++++++++++++++
>  xen/drivers/vpci/msi.c        |  25 ++++++++-
>  xen/drivers/vpci/msix.c       |  41 ++++++++++++--
>  xen/drivers/vpci/vpci.c       | 100 ++++++++++++++++++++++++++--------
>  xen/include/xen/pci.h         |   1 +
>  xen/include/xen/sched.h       |   3 +
>  xen/include/xen/vpci.h        |   6 ++
>  10 files changed, 215 insertions(+), 32 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
> index 13e2a190b439..2a13c6581345 100644
> --- a/xen/arch/x86/hvm/vmsi.c
> +++ b/xen/arch/x86/hvm/vmsi.c
> @@ -893,6 +893,9 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
>  {
>      unsigned int i;
>  
> +    ASSERT(rw_is_locked(&msix->pdev->domain->vpci_rwlock));
> +    ASSERT(pcidevs_locked());
> +
>      for ( i = 0; i < msix->max_entries; i++ )
>      {
>          const struct vpci_msix_entry *entry = &msix->entries[i];
> @@ -911,7 +914,11 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
>              struct pci_dev *pdev = msix->pdev;
>  
>              spin_unlock(&msix->pdev->vpci->lock);
> +            pcidevs_unlock();
> +            read_unlock(&pdev->domain->vpci_rwlock);
>              process_pending_softirqs();
> +            read_lock(&pdev->domain->vpci_rwlock);
> +            pcidevs_lock();

This is again an ABBA situation: vpci_add_handlers will get called
with the pcidevs lock held, and it will try to acquire the per-domain
vpci lock (so pcidevs -> vpci_rwlock), while here and in other places
in the patch you have the inverse locking order (vpci_rwlock ->
pcidevs).
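
Schematically, with one CPU in vpci_add_handlers and another in the code
above:

    CPU#1 (vpci_add_handlers):         CPU#2 (vpci_msix_arch_print):
        pcidevs_lock();                    read_lock(&d->vpci_rwlock);
        write_lock(&d->vpci_rwlock);       pcidevs_lock();

Each CPU then waits for the lock the other one already holds.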

>              /* NB: we assume that pdev cannot go away for an alive domain. */
>              if ( !pdev->vpci || !spin_trylock(&pdev->vpci->lock) )
>                  return -EBUSY;
> @@ -323,10 +334,18 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>      }
>  
>      /* Find the PCI dev matching the address. */
> +    pcidevs_lock();
>      pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
> +    pcidevs_unlock();
>      if ( !pdev )
>          return vpci_read_hw(sbdf, reg, size);

There's a window here (between dropping the pcidevs lock and acquiring
the vpci_rwlock) where either the pdev or pdev->vpci could be removed
or recreated.
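
I.e. with the hunk above the sequence in vpci_read is, schematically:

    pcidevs_lock();
    pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
    pcidevs_unlock();
    /* Nothing prevents pdev or pdev->vpci from going away here. */
    read_lock(&d->vpci_rwlock);    /* only taken further down */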

Thanks, Roger.
