
Re: [PATCH v8 01/13] pci: introduce per-domain PCI rwlock


  • To: Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Thu, 20 Jul 2023 11:45:19 +0200
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>
  • Delivery-date: Thu, 20 Jul 2023 09:45:48 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Thu, Jul 20, 2023 at 12:32:31AM +0000, Volodymyr Babchuk wrote:
> Add a per-domain d->pci_lock that protects access to d->pdev_list.
> The purpose of this lock is to guarantee to the VPCI code that the
> underlying pdev will not disappear from under its feet. This is a
> rw-lock, but this patch adds only write_lock()s; read_lock() users
> will be added in the next patches.
> 
> This lock should be taken in write mode every time d->pdev_list is
> altered. This covers both accesses to d->pdev_list and accesses to
> pdev->domain_list fields. All write accesses should additionally be
> protected by pcidevs_lock(). The idea is that any user that wants
> read access to the list or to the devices stored in the list should
> use either this new d->pci_lock or the old pcidevs_lock(). Holding
> either of these two locks only ensures that the pdev of interest
> will not disappear from under its feet and that the pdev will still
> be assigned to the same domain. Of course, any new users should use
> pcidevs_lock() when appropriate (e.g. when accessing any other state
> that is protected by the said lock).

I think this needs a note about the ordering:

"In case both the newly introduced per-domain rwlock and the pcidevs
lock are taken, the latter must be acquired first."
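
I.e. users needing to take both locks would follow a pattern along
these lines (just a sketch for illustration):

    pcidevs_lock();
    write_lock(&d->pci_lock);

    /* ... alter d->pdev_list / pdev->domain_list ... */

    write_unlock(&d->pci_lock);
    pcidevs_unlock();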

> 
> Any write access to pdev->domain_list should be protected by both
> pcidevs_lock() and d->pci_lock taken in write mode.

You also protect calls to vpci_remove_device() with the per-domain
pci_lock it seems, and that will need some explanation as it's not
obvious.

> 
> Suggested-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> Suggested-by: Jan Beulich <jbeulich@xxxxxxxx>
> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@xxxxxxxx>
> 
> ---
> 
> Changes in v8:
>  - New patch
> 
> Changes in v8 vs RFC:
>  - Removed all read_locks after discussion with Roger in #xendevel
>  - pci_release_devices() now returns the first error code
>  - extended commit message
>  - added missing lock in pci_remove_device()
>  - extended locked region in pci_add_device() to protect list_del() calls
> ---
>  xen/common/domain.c                         |  1 +
>  xen/drivers/passthrough/amd/pci_amd_iommu.c |  9 ++-
>  xen/drivers/passthrough/pci.c               | 68 +++++++++++++++++----
>  xen/drivers/passthrough/vtd/iommu.c         |  9 ++-
>  xen/include/xen/sched.h                     |  1 +
>  5 files changed, 74 insertions(+), 14 deletions(-)
> 
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index caaa402637..5d8a8836da 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -645,6 +645,7 @@ struct domain *domain_create(domid_t domid,
>  
>  #ifdef CONFIG_HAS_PCI
>      INIT_LIST_HEAD(&d->pdev_list);
> +    rwlock_init(&d->pci_lock);
>  #endif
>  
>      /* All error paths can depend on the above setup. */
> diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> index 94e3775506..e2f2e2e950 100644
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -476,8 +476,13 @@ static int cf_check reassign_device(
>  
>      if ( devfn == pdev->devfn && pdev->domain != target )
>      {
> -        list_move(&pdev->domain_list, &target->pdev_list);
> -        pdev->domain = target;

You seem to have inadvertently dropped the above line? (and so devices
would keep the previous pdev->domain value)
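
I would expect something along these lines (just a sketch; where
exactly the assignment lives is up to you):

        write_lock(&pdev->domain->pci_lock);
        list_del(&pdev->domain_list);
        write_unlock(&pdev->domain->pci_lock);

        pdev->domain = target;

        write_lock(&target->pci_lock);
        list_add(&pdev->domain_list, &target->pdev_list);
        write_unlock(&target->pci_lock);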

> +        write_lock(&pdev->domain->pci_lock);
> +        list_del(&pdev->domain_list);
> +        write_unlock(&pdev->domain->pci_lock);
> +
> +        write_lock(&target->pci_lock);
> +        list_add(&pdev->domain_list, &target->pdev_list);
> +        write_unlock(&target->pci_lock);
>      }
>  
>      /*
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index 95846e84f2..5b4632ead2 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -454,7 +454,9 @@ static void __init _pci_hide_device(struct pci_dev *pdev)
>      if ( pdev->domain )
>          return;
>      pdev->domain = dom_xen;
> +    write_lock(&dom_xen->pci_lock);
>      list_add(&pdev->domain_list, &dom_xen->pdev_list);
> +    write_unlock(&dom_xen->pci_lock);
>  }
>  
>  int __init pci_hide_device(unsigned int seg, unsigned int bus,
> @@ -747,6 +749,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>      ret = 0;
>      if ( !pdev->domain )
>      {
> +        write_lock(&hardware_domain->pci_lock);
>          pdev->domain = hardware_domain;
>          list_add(&pdev->domain_list, &hardware_domain->pdev_list);
>  
> @@ -760,6 +763,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>              printk(XENLOG_ERR "Setup of vPCI failed: %d\n", ret);
>              list_del(&pdev->domain_list);
>              pdev->domain = NULL;
> +            write_unlock(&hardware_domain->pci_lock);

Strictly speaking, this could move one line earlier, as accesses to
pdev->domain are not protected by the d->pci_lock?  Same in other
instances (above and below), as you seem to introduce a pattern to
perform accesses to pdev->domain with the rwlock taken.
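
E.g. for this hunk (just a sketch of what I mean):

            printk(XENLOG_ERR "Setup of vPCI failed: %d\n", ret);
            list_del(&pdev->domain_list);
            write_unlock(&hardware_domain->pci_lock);
            pdev->domain = NULL;
            goto out;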

>              goto out;
>          }
>          ret = iommu_add_device(pdev);
> @@ -768,8 +772,10 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>              vpci_remove_device(pdev);
>              list_del(&pdev->domain_list);
>              pdev->domain = NULL;
> +            write_unlock(&hardware_domain->pci_lock);
>              goto out;
>          }
> +        write_unlock(&hardware_domain->pci_lock);
>      }
>      else
>          iommu_enable_device(pdev);
> @@ -812,11 +818,13 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
>      list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
>          if ( pdev->bus == bus && pdev->devfn == devfn )
>          {
> +            write_lock(&pdev->domain->pci_lock);
>              vpci_remove_device(pdev);
>              pci_cleanup_msi(pdev);
>              ret = iommu_remove_device(pdev);
>              if ( pdev->domain )
>                  list_del(&pdev->domain_list);
> +            write_unlock(&pdev->domain->pci_lock);

Here you seem to protect more than strictly required; I would think
only the list_del() needs to be done while holding the rwlock?
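
I.e. something like this (sketch only, assuming neither
vpci_remove_device() nor the MSI/IOMMU cleanup needs the new lock):

            vpci_remove_device(pdev);
            pci_cleanup_msi(pdev);
            ret = iommu_remove_device(pdev);
            if ( pdev->domain )
            {
                write_lock(&pdev->domain->pci_lock);
                list_del(&pdev->domain_list);
                write_unlock(&pdev->domain->pci_lock);
            }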

>              printk(XENLOG_DEBUG "PCI remove device %pp\n", &pdev->sbdf);
>              free_pdev(pseg, pdev);
>              break;
> @@ -887,26 +895,62 @@ static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
>  
>  int pci_release_devices(struct domain *d)
>  {
> -    struct pci_dev *pdev, *tmp;
> -    u8 bus, devfn;
> -    int ret;
> +    int combined_ret;
> +    LIST_HEAD(failed_pdevs);
>  
>      pcidevs_lock();
> -    ret = arch_pci_clean_pirqs(d);
> -    if ( ret )
> +    write_lock(&d->pci_lock);
> +    combined_ret = arch_pci_clean_pirqs(d);

Why do you need the per-domain rwlock for arch_pci_clean_pirqs()?
That function doesn't modify the per-domain pdev list.
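
I.e. the rwlock could be taken after the call (sketch):

    pcidevs_lock();
    combined_ret = arch_pci_clean_pirqs(d);
    if ( combined_ret )
    {
        pcidevs_unlock();
        return combined_ret;
    }

    write_lock(&d->pci_lock);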

> +    if ( combined_ret )
>      {
>          pcidevs_unlock();
> -        return ret;
> +        write_unlock(&d->pci_lock);
> +        return combined_ret;

Ideally we would like to unlock in the reverse order of locking, so
the rwlock should be released before the pcidevs lock (unless there's
a reason not to).
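
I.e. (sketch, taking the code as-is):

        write_unlock(&d->pci_lock);
        pcidevs_unlock();
        return combined_ret;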

>      }
> -    list_for_each_entry_safe ( pdev, tmp, &d->pdev_list, domain_list )
> +
> +    while ( !list_empty(&d->pdev_list) )
>      {
> -        bus = pdev->bus;
> -        devfn = pdev->devfn;
> -        ret = deassign_device(d, pdev->seg, bus, devfn) ?: ret;
> +        struct pci_dev *pdev = list_first_entry(&d->pdev_list,
> +                                                struct pci_dev,
> +                                                domain_list);
> +        uint16_t seg = pdev->seg;
> +        uint8_t bus = pdev->bus;
> +        uint8_t devfn = pdev->devfn;
> +        int ret;
> +
> +        write_unlock(&d->pci_lock);
> +        ret = deassign_device(d, seg, bus, devfn);
> +        write_lock(&d->pci_lock);
> +        if ( ret )
> +        {
> +            bool still_present = false;
> +            const struct pci_dev *tmp;
> +
> +            /*
> +             * We need to check if deassign_device() left our pdev in
> +             * domain's list. As we dropped the lock, we can't be sure
> +             * that list wasn't permutated in some random way, so we
> +             * need to traverse the whole list.
> +             */
> +            for_each_pdev ( d, tmp )
> +            {
> +                if ( tmp == pdev )
> +                {
> +                    still_present = true;
> +                    break;
> +                }
> +            }
> +            if ( still_present )
> +                list_move(&pdev->domain_list, &failed_pdevs);

You can get rid of the still_present variable, and just do:

for_each_pdev ( d, tmp )
    if ( tmp == pdev )
    {
        list_move(&pdev->domain_list, &failed_pdevs);
        break;
    }


> +            combined_ret = combined_ret?:ret;

Nit: missing spaces around the ternary operator.

Thanks, Roger.
