[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v9 01/16] pci: introduce per-domain PCI rwlock


  • To: Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Tue, 19 Sep 2023 16:09:39 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=9Wlu/cL8a5sbGwcXL6S4ZPGCSKgPPB297NyQjClZR8w=; b=Nrh2bYzwO0VUgdse+Kqwq2mbAcwHs7N+iONrmDSTLEfwvoP3zo4pZk8zJ8G0iXvLPDzzXI/P9ZDyN2g/pgVup7zk6d8wpWXuAxxtdybDRVEIfDgta0c9xr/BzoYr5UMAgGPAZIz3WGUvHgmq+/wxw/xrncewuuDenD2WQ240n3tcIETCGiwAb57+ecIQd/RpgvjEihCpbSrbycPZFLeWNfo2BUmWV3t2YJiyptbSFVDSEIJ280UXOQmRmuLC3MR16uky6ApbMVg7axIZqg1DV8Af/lcME5OZHLeDDCEFqy3zFxOfBwM3dq0j0jaUvYw6PMuLHoJL7f86niUv+ZznKA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=go30rtAz/ix7M7ub3tNtg8/miAVuC7JTjMEuXz3wTinYfjxtUHwirXf2xkYG7JyMBfxBqrwwpruXzS7kSxQxJ4yTUGPFb30JYhIIXQMKcPPDUl4fiz6U7X+md/gJwQI+xPyUNwpC3qGNFAj+y9O0JpRFCHi79fpl6HwXHuGQoXsYeVUGmpRKDNe/31TT4TL2EdUFcE5vRqZJfTfE5VRIFAFSk3CxGqhp6He8jwSsqMwMuRLT7eCfyk4Ku7d1n5o3dAssjD2RJcuHa3NnNa6so3HtFRf4++u4Gxj/BLcd6l+KGu7aebD0NGWKdILfm5+vdijJoq2lBc+enBg29A/ZqA==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Stewart Hildebrand <stewart.hildebrand@xxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Paul Durrant <paul@xxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>
  • Delivery-date: Tue, 19 Sep 2023 14:10:09 +0000
  • Ironport-data: A9a23:ddcs5q+d53T6leAdxqMFDrUDIn+TJUtcMsCJ2f8bNWPcYEJGY0x3y 2EdXTvSM/nbNmOmeNB2Oduy8R5S7ZWDytNlTlBkrCA8E34SpcT7XtnIdU2Y0wF+jCHgZBk+s 5hBMImowOQcFCK0SsKFa+C5xZVE/fjVAOK6UKidYnwZqTZMEE8JkQhkl/MynrlmiN24BxLlk d7pqojUNUTNNwRcawr40Ird7ks11BjOkGlA5AdmNKka5AW2e0Q9V/rzG4ngdxMUfaEMdgKKb 76r5K20+Grf4yAsBruN+losWhRXKlJ6FVHmZkt+A8BOsDAbzsAB+v9T2M4nQVVWk120c+VZk 72hg3ASpTABZcUgkMxFO/VR/roX0aduoNcrKlDn2SCfItGvn9IBDJyCAWlvVbD09NqbDkl2+ 90hGgBKRyyYmuG1wZylQcdzj+MseZyD0IM34hmMzBn/JNN/GNXoZPyP4tVVmjAtmspJAPDSI dIDbiZiZwjBZBsJPUoLDJU5n6GjgXyXnz9w8QrJ4/ZopTWNilUujNABM/KMEjCObd9SkUuC4 HrP4kzyAw0ANczZwj2Amp6prraVwHOgAtpNT9VU8NZ42kWC6UJCOCYkbnnqnaKTo1OhXfNAf hl8Fi0G6PJaGFaQZuf6Wxq0sXuVpCk2UtBbE/A5wAyVw6+S6AGcbkAUQzgEZNE4ucseQT0xy kTPj97vHSZosrCeVTSa7Lj8hSiuNDccN3NEZS4AQQYP+dDlrKk6ix6JRdFmeIa3hNDoHTD7w xiRsTMzwb4UiKYj1bi//F3BqyKhoN7OVAFdzh7MQmuv4wd9ZYikT4+l817W6bBHNonxZkaFl GgJnY6Z9u9mJYmEiSilUOgLWraz6J6tMzDCgFgpA5go8Rys/WKuecZb5zQWGatyGsMNeDusa 0iKvwpUvcZXJCHzMvMxZJ+tAcM3y6SmDc7iSv3fcttJZN52aROD+yZtI0WX2ggBjXQRrE32A r/DGe7EMJrQIf4PIOaeLwvF7YIW+w==
  • Ironport-hdrordr: A9a23:tEfX5qpQTZDhJCPMGidtphcaV5tQL9V00zEX/kB9WHVpmwKj5q STdZUgpGvJYVMqMx8dcL+7WJVoPkmsiqKdjbNxAV7AZniVhILXFvAB0WKK+VSJcREWndQttp uIHZIObeEZBjBB/LjHCGHTKbodKLLsys+VbSi19RpQZDAvUoUlyzpQTj+cFEgefngyOXL6fq Dsl/auY1CbCAcqhgHQPAh0YwG5naytqLvWJSQeAgIh6k2nlCrA0s+DLzGomi0GVi9Jw/MI7W jBnmXCl9memsD+8AbYy2jQq7NfnNeJ8KokOOW8zvINLynqiEKPeoNsQNS5zUkIidDq0k8ujN 7P5y0BEq1ImgjsV1DwmwLpxw7jlAwj8GDv0niRhXeLm72CeBsKT/BZgJ5fcF/n51E7vNd6uZ g7ol6kiw==
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Tue, Aug 29, 2023 at 11:19:42PM +0000, Volodymyr Babchuk wrote:
> Add per-domain d->pci_lock that protects access to
> d->pdev_list. Purpose of this lock is to give guarantees to VPCI code
> that underlying pdev will not disappear under feet. This is a rw-lock,
> but this patch adds only write_lock()s. There will be read_lock()
> users in the next patches.
> 
> This lock should be taken in write mode every time d->pdev_list is
> altered. This covers both accesses to d->pdev_list and accesses to
> pdev->domain_list fields.

Why do you mention pdev->domain_list here?  I don't think the lock
covers accesses to pdev->domain_list, unless that domain_list field
happens to be part of the linked list in d->pdev_list.  I find it kind
of odd to mention here.

> All write accesses also should be protected
> by pcidevs_lock() as well. Idea is that any user that wants read
> access to the list or to the devices stored in the list should use
> either this new d->pci_lock or old pcidevs_lock(). Usage of any of
> this two locks will ensure only that pdev of interest will not
> disappear from under feet and that the pdev still will be assigned to
> the same domain. Of course, any new users should use pcidevs_lock()
> when it is appropriate (e.g. when accessing any other state that is
> protected by the said lock). In case both the newly introduced
> per-domain rwlock and the pcidevs lock is taken, the later must be
> acquired first.
> 
> Any write access to pdev->domain_list should be protected by both
> pcidevs_lock() and d->pci_lock in the write mode.
> 
> Suggested-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> Suggested-by: Jan Beulich <jbeulich@xxxxxxxx>
> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@xxxxxxxx>
> 
> ---
> 
> Changes in v9:
>  - returned back "pdev->domain = target;" in AMD IOMMU code
>  - used "source" instead of pdev->domain in IOMMU functions
>  - added comment about lock ordering in the commit message
>  - reduced locked regions
>  - minor changes non-functional changes in various places
> 
> Changes in v8:
>  - New patch
> 
> Changes in v8 vs RFC:
>  - Removed all read_locks after discussion with Roger in #xendevel
>  - pci_release_devices() now returns the first error code
>  - extended commit message
>  - added missing lock in pci_remove_device()
>  - extended locked region in pci_add_device() to protect list_del() calls
> ---
>  xen/common/domain.c                         |  1 +
>  xen/drivers/passthrough/amd/pci_amd_iommu.c |  9 ++-
>  xen/drivers/passthrough/pci.c               | 71 +++++++++++++++++----
>  xen/drivers/passthrough/vtd/iommu.c         |  9 ++-
>  xen/include/xen/sched.h                     |  1 +
>  5 files changed, 78 insertions(+), 13 deletions(-)
> 
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 304aa04fa6..9b04a20160 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -651,6 +651,7 @@ struct domain *domain_create(domid_t domid,
>  
>  #ifdef CONFIG_HAS_PCI
>      INIT_LIST_HEAD(&d->pdev_list);
> +    rwlock_init(&d->pci_lock);
>  #endif
>  
>      /* All error paths can depend on the above setup. */
> diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c 
> b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> index bea70db4b7..d219bd9453 100644
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -476,7 +476,14 @@ static int cf_check reassign_device(
>  
>      if ( devfn == pdev->devfn && pdev->domain != target )
>      {
> -        list_move(&pdev->domain_list, &target->pdev_list);
> +        write_lock(&source->pci_lock);
> +        list_del(&pdev->domain_list);
> +        write_unlock(&source->pci_lock);
> +
> +        write_lock(&target->pci_lock);
> +        list_add(&pdev->domain_list, &target->pdev_list);
> +        write_unlock(&target->pci_lock);
> +
>          pdev->domain = target;

While I don't think this is strictly an issue right now, it would be
better to set pdev->domain before the device is added to domain_list.
A pattern like:

read_lock(d->pci_lock);
for_each_pdev(d, pdev)
    foo(pdev->domain);
read_unlock(d->pci_lock);

Wouldn't work currently if the pdev is added to domain_list before the
pdev->domain field is updated to reflect the new owner.

>      }
>  
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index 33452791a8..79ca928672 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -454,7 +454,9 @@ static void __init _pci_hide_device(struct pci_dev *pdev)
>      if ( pdev->domain )
>          return;
>      pdev->domain = dom_xen;
> +    write_lock(&dom_xen->pci_lock);
>      list_add(&pdev->domain_list, &dom_xen->pdev_list);
> +    write_unlock(&dom_xen->pci_lock);
>  }
>  
>  int __init pci_hide_device(unsigned int seg, unsigned int bus,
> @@ -748,7 +750,9 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>      if ( !pdev->domain )
>      {
>          pdev->domain = hardware_domain;
> +        write_lock(&hardware_domain->pci_lock);
>          list_add(&pdev->domain_list, &hardware_domain->pdev_list);
> +        write_unlock(&hardware_domain->pci_lock);
>  
>          /*
>           * For devices not discovered by Xen during boot, add vPCI handlers
> @@ -758,7 +762,9 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>          if ( ret )
>          {
>              printk(XENLOG_ERR "Setup of vPCI failed: %d\n", ret);
> +            write_lock(&hardware_domain->pci_lock);
>              list_del(&pdev->domain_list);
> +            write_unlock(&hardware_domain->pci_lock);
>              pdev->domain = NULL;
>              goto out;
>          }
> @@ -766,7 +772,9 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>          if ( ret )
>          {
>              vpci_remove_device(pdev);
> +            write_lock(&hardware_domain->pci_lock);
>              list_del(&pdev->domain_list);
> +            write_unlock(&hardware_domain->pci_lock);
>              pdev->domain = NULL;
>              goto out;
>          }
> @@ -816,7 +824,11 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
>              pci_cleanup_msi(pdev);
>              ret = iommu_remove_device(pdev);
>              if ( pdev->domain )
> +            {
> +                write_lock(&pdev->domain->pci_lock);
>                  list_del(&pdev->domain_list);
> +                write_unlock(&pdev->domain->pci_lock);
> +            }
>              printk(XENLOG_DEBUG "PCI remove device %pp\n", &pdev->sbdf);
>              free_pdev(pseg, pdev);
>              break;
> @@ -887,26 +899,61 @@ static int deassign_device(struct domain *d, uint16_t 
> seg, uint8_t bus,
>  
>  int pci_release_devices(struct domain *d)
>  {
> -    struct pci_dev *pdev, *tmp;
> -    u8 bus, devfn;
> -    int ret;
> +    int combined_ret;
> +    LIST_HEAD(failed_pdevs);
>  
>      pcidevs_lock();
> -    ret = arch_pci_clean_pirqs(d);
> -    if ( ret )
> +
> +    combined_ret = arch_pci_clean_pirqs(d);
> +    if ( combined_ret )
>      {
>          pcidevs_unlock();
> -        return ret;
> +        return combined_ret;
>      }
> -    list_for_each_entry_safe ( pdev, tmp, &d->pdev_list, domain_list )
> +
> +    write_lock(&d->pci_lock);
> +
> +    while ( !list_empty(&d->pdev_list) )
>      {
> -        bus = pdev->bus;
> -        devfn = pdev->devfn;
> -        ret = deassign_device(d, pdev->seg, bus, devfn) ?: ret;
> +        struct pci_dev *pdev = list_first_entry(&d->pdev_list,
> +                                                struct pci_dev,
> +                                                domain_list);
> +        uint16_t seg = pdev->seg;
> +        uint8_t bus = pdev->bus;
> +        uint8_t devfn = pdev->devfn;
> +        int ret;
> +
> +        write_unlock(&d->pci_lock);
> +        ret = deassign_device(d, seg, bus, devfn);
> +        write_lock(&d->pci_lock);
> +        if ( ret )
> +        {
> +            const struct pci_dev *tmp;
> +
> +            /*
> +             * We need to check if deassign_device() left our pdev in
> +             * domain's list. As we dropped the lock, we can't be sure
> +             * that list wasn't permutated in some random way, so we
> +             * need to traverse the whole list.
> +             */
> +            for_each_pdev ( d, tmp )
> +            {
> +                if ( tmp == pdev )
> +                {
> +                    list_move_tail(&pdev->domain_list, &failed_pdevs);
> +                    break;
> +                }
> +            }
> +
> +            combined_ret = combined_ret ?: ret;
> +        }
>      }
> +
> +    list_splice(&failed_pdevs, &d->pdev_list);
> +    write_unlock(&d->pci_lock);
>      pcidevs_unlock();
>  
> -    return ret;
> +    return combined_ret;
>  }
>  
>  #define PCI_CLASS_BRIDGE_HOST    0x0600
> @@ -1125,7 +1172,9 @@ static int __hwdom_init cf_check 
> _setup_hwdom_pci_devices(
>              if ( !pdev->domain )
>              {
>                  pdev->domain = ctxt->d;
> +                write_lock(&ctxt->d->pci_lock);
>                  list_add(&pdev->domain_list, &ctxt->d->pdev_list);
> +                write_unlock(&ctxt->d->pci_lock);
>                  setup_one_hwdom_device(ctxt, pdev);
>              }
>              else if ( pdev->domain == dom_xen )
> diff --git a/xen/drivers/passthrough/vtd/iommu.c 
> b/xen/drivers/passthrough/vtd/iommu.c
> index 0e3062c820..3228900c97 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -2806,7 +2806,14 @@ static int cf_check reassign_device_ownership(
>  
>      if ( devfn == pdev->devfn && pdev->domain != target )
>      {
> -        list_move(&pdev->domain_list, &target->pdev_list);
> +        write_lock(&source->pci_lock);
> +        list_del(&pdev->domain_list);
> +        write_unlock(&source->pci_lock);
> +
> +        write_lock(&target->pci_lock);
> +        list_add(&pdev->domain_list, &target->pdev_list);
> +        write_unlock(&target->pci_lock);
> +
>          pdev->domain = target;

Same comment as in reassign_device() above.

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.