
Re: [PATCH v8 01/13] pci: introduce per-domain PCI rwlock


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>
  • Date: Thu, 20 Jul 2023 22:57:10 +0000
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>
  • Delivery-date: Thu, 20 Jul 2023 22:57:38 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: [PATCH v8 01/13] pci: introduce per-domain PCI rwlock

Hi Roger,

Roger Pau Monné <roger.pau@xxxxxxxxxx> writes:

> On Thu, Jul 20, 2023 at 12:32:31AM +0000, Volodymyr Babchuk wrote:
>> Add a per-domain d->pci_lock that protects access to
>> d->pdev_list. The purpose of this lock is to guarantee to the VPCI
>> code that the underlying pdev will not disappear from under its feet.
>> This is a rw-lock, but this patch adds only write_lock()s. There will
>> be read_lock() users in the next patches.
>> 
>> This lock should be taken in write mode every time d->pdev_list is
>> altered. This covers both accesses to d->pdev_list and accesses to
>> pdev->domain_list fields. All write accesses should be protected by
>> pcidevs_lock() as well. The idea is that any user that wants read
>> access to the list or to the devices stored in the list should use
>> either this new d->pci_lock or the old pcidevs_lock(). Use of either
>> of these two locks ensures only that the pdev of interest will not
>> disappear from under our feet and that it will still be assigned to
>> the same domain. Of course, any new users should use pcidevs_lock()
>> when it is appropriate (e.g. when accessing any other state that is
>> protected by the said lock).
>
> I think this needs a note about the ordering:
>
> "In case both the newly introduced per-domain rwlock and the pcidevs
> lock is taken, the later must be acquired first."

Thanks. Added.
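
For reference, a minimal sketch of the nesting that note implies
(illustrative only, not part of the patch itself):

    pcidevs_lock();              /* the global pcidevs lock first */
    write_lock(&d->pci_lock);    /* then the per-domain rwlock */
    /* ... alter d->pdev_list ... */
    write_unlock(&d->pci_lock);  /* release in reverse order */
    pcidevs_unlock();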

>> 
>> Any write access to pdev->domain_list should be protected by both
>> pcidevs_lock() and d->pci_lock in the write mode.
>
> You also protect calls to vpci_remove_device() with the per-domain
> pci_lock it seems, and that will need some explanation as it's not
> obvious.

Well, strictly speaking, it is not required in this patch, but it is
needed in the next one. I can lock only "list_del(&pdev->domain_list);"
here and then extend the locked area in the next patch. On the other
hand, this patch already protects the vpci_add_handlers() call in
pci_add_device() due to the code layout, so it may be natural to
protect vpci_remove_device() as well. What is your opinion?

>> 
>> Suggested-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
>> Suggested-by: Jan Beulich <jbeulich@xxxxxxxx>
>> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@xxxxxxxx>
>> 
>> ---
>> 
>> Changes in v8:
>>  - New patch
>> 
>> Changes in v8 vs RFC:
>>  - Removed all read_locks after discussion with Roger in #xendevel
>>  - pci_release_devices() now returns the first error code
>>  - extended commit message
>>  - added missing lock in pci_remove_device()
>>  - extended locked region in pci_add_device() to protect list_del() calls
>> ---
>>  xen/common/domain.c                         |  1 +
>>  xen/drivers/passthrough/amd/pci_amd_iommu.c |  9 ++-
>>  xen/drivers/passthrough/pci.c               | 68 +++++++++++++++++----
>>  xen/drivers/passthrough/vtd/iommu.c         |  9 ++-
>>  xen/include/xen/sched.h                     |  1 +
>>  5 files changed, 74 insertions(+), 14 deletions(-)
>> 
>> diff --git a/xen/common/domain.c b/xen/common/domain.c
>> index caaa402637..5d8a8836da 100644
>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -645,6 +645,7 @@ struct domain *domain_create(domid_t domid,
>>  
>>  #ifdef CONFIG_HAS_PCI
>>      INIT_LIST_HEAD(&d->pdev_list);
>> +    rwlock_init(&d->pci_lock);
>>  #endif
>>  
>>      /* All error paths can depend on the above setup. */
>> diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> index 94e3775506..e2f2e2e950 100644
>> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> @@ -476,8 +476,13 @@ static int cf_check reassign_device(
>>  
>>      if ( devfn == pdev->devfn && pdev->domain != target )
>>      {
>> -        list_move(&pdev->domain_list, &target->pdev_list);
>> -        pdev->domain = target;
>
> You seem to have inadvertently dropped the above line? (and so devices
> would keep the previous pdev->domain value)
>

Oops, yes. Thank you. I was testing these patches on an Intel machine,
so the AMD side was left unverified.
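
Presumably the fixed hunk should look along these lines (a sketch only,
restoring the dropped assignment between the two list operations):

    write_lock(&pdev->domain->pci_lock);
    list_del(&pdev->domain_list);
    write_unlock(&pdev->domain->pci_lock);

    pdev->domain = target;       /* the assignment that was dropped */

    write_lock(&target->pci_lock);
    list_add(&pdev->domain_list, &target->pdev_list);
    write_unlock(&target->pci_lock);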

>> +        write_lock(&pdev->domain->pci_lock);
>> +        list_del(&pdev->domain_list);
>> +        write_unlock(&pdev->domain->pci_lock);
>> +
>> +        write_lock(&target->pci_lock);
>> +        list_add(&pdev->domain_list, &target->pdev_list);
>> +        write_unlock(&target->pci_lock);
>>      }
>>  
>>      /*
>> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
>> index 95846e84f2..5b4632ead2 100644
>> --- a/xen/drivers/passthrough/pci.c
>> +++ b/xen/drivers/passthrough/pci.c
>> @@ -454,7 +454,9 @@ static void __init _pci_hide_device(struct pci_dev *pdev)
>>      if ( pdev->domain )
>>          return;
>>      pdev->domain = dom_xen;
>> +    write_lock(&dom_xen->pci_lock);
>>      list_add(&pdev->domain_list, &dom_xen->pdev_list);
>> +    write_unlock(&dom_xen->pci_lock);
>>  }
>>  
>>  int __init pci_hide_device(unsigned int seg, unsigned int bus,
>> @@ -747,6 +749,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>>      ret = 0;
>>      if ( !pdev->domain )
>>      {
>> +        write_lock(&hardware_domain->pci_lock);
>>          pdev->domain = hardware_domain;
>>          list_add(&pdev->domain_list, &hardware_domain->pdev_list);
>>  
>> @@ -760,6 +763,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>>              printk(XENLOG_ERR "Setup of vPCI failed: %d\n", ret);
>>              list_del(&pdev->domain_list);
>>              pdev->domain = NULL;
>> +            write_unlock(&hardware_domain->pci_lock);
>
> Strictly speaking, this could move one line earlier, as accesses to
> pdev->domain are not protected by the d->pci_lock?  Same in other
> instances (above and below), as you seem to introduce a pattern to
> perform accesses to pdev->domain with the rwlock taken.
>

Yes, you are right. I'll move the unlock() call.
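
I.e. something like this (a sketch of the reordered error path):

    list_del(&pdev->domain_list);
    write_unlock(&hardware_domain->pci_lock);
    pdev->domain = NULL;    /* not protected by d->pci_lock */
    goto out;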

>>              goto out;
>>          }
>>          ret = iommu_add_device(pdev);
>> @@ -768,8 +772,10 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>>              vpci_remove_device(pdev);
>>              list_del(&pdev->domain_list);
>>              pdev->domain = NULL;
>> +            write_unlock(&hardware_domain->pci_lock);
>>              goto out;
>>          }
>> +        write_unlock(&hardware_domain->pci_lock);
>>      }
>>      else
>>          iommu_enable_device(pdev);
>> @@ -812,11 +818,13 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
>>      list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
>>          if ( pdev->bus == bus && pdev->devfn == devfn )
>>          {
>> +            write_lock(&pdev->domain->pci_lock);
>>              vpci_remove_device(pdev);
>>              pci_cleanup_msi(pdev);
>>              ret = iommu_remove_device(pdev);
>>              if ( pdev->domain )
>>                  list_del(&pdev->domain_list);
>> +            write_unlock(&pdev->domain->pci_lock);
>
> Here you seem to protect more than strictly required, I would think
> only the list_del() would need to be done holding the rwlock?
>

Yes, I believe this is spill-over from the next patch. Initially all
those changes were introduced in "vpci: use per-domain PCI lock to
protect vpci structure", but then I decided to split the changes into
two patches.

>>              printk(XENLOG_DEBUG "PCI remove device %pp\n", &pdev->sbdf);
>>              free_pdev(pseg, pdev);
>>              break;
>> @@ -887,26 +895,62 @@ static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
>>  
>>  int pci_release_devices(struct domain *d)
>>  {
>> -    struct pci_dev *pdev, *tmp;
>> -    u8 bus, devfn;
>> -    int ret;
>> +    int combined_ret;
>> +    LIST_HEAD(failed_pdevs);
>>  
>>      pcidevs_lock();
>> -    ret = arch_pci_clean_pirqs(d);
>> -    if ( ret )
>> +    write_lock(&d->pci_lock);
>> +    combined_ret = arch_pci_clean_pirqs(d);
>
> Why do you need the per-domain rwlock for arch_pci_clean_pirqs()?
> That function doesn't modify the per-domain pdev list.

You are right, I will correct this in the next version.

>
>> +    if ( combined_ret )
>>      {
>>          pcidevs_unlock();
>> -        return ret;
>> +        write_unlock(&d->pci_lock);
>> +        return combined_ret;
>
> Ideally we would like to keep the same order on unlock, so the rwlock
> should be released before the pcidevs lock (unless there's a reason
> not to).

I'll move write_lock() further below, so this will be fixed automatically.
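
I.e. roughly (a sketch of what I have in mind):

    pcidevs_lock();
    combined_ret = arch_pci_clean_pirqs(d);
    if ( combined_ret )
    {
        pcidevs_unlock();
        return combined_ret;
    }

    write_lock(&d->pci_lock);    /* taken only after the early return */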

>
>>      }
>> -    list_for_each_entry_safe ( pdev, tmp, &d->pdev_list, domain_list )
>> +
>> +    while ( !list_empty(&d->pdev_list) )
>>      {
>> -        bus = pdev->bus;
>> -        devfn = pdev->devfn;
>> -        ret = deassign_device(d, pdev->seg, bus, devfn) ?: ret;
>> +        struct pci_dev *pdev = list_first_entry(&d->pdev_list,
>> +                                                struct pci_dev,
>> +                                                domain_list);
>> +        uint16_t seg = pdev->seg;
>> +        uint8_t bus = pdev->bus;
>> +        uint8_t devfn = pdev->devfn;
>> +        int ret;
>> +
>> +        write_unlock(&d->pci_lock);
>> +        ret = deassign_device(d, seg, bus, devfn);
>> +        write_lock(&d->pci_lock);
>> +        if ( ret )
>> +        {
>> +            bool still_present = false;
>> +            const struct pci_dev *tmp;
>> +
>> +            /*
>> +             * We need to check if deassign_device() left our pdev in
>> +             * domain's list. As we dropped the lock, we can't be sure
>> +             * that the list wasn't permuted in some random way, so we
>> +             * need to traverse the whole list.
>> +             */
>> +            for_each_pdev ( d, tmp )
>> +            {
>> +                if ( tmp == pdev )
>> +                {
>> +                    still_present = true;
>> +                    break;
>> +                }
>> +            }
>> +            if ( still_present )
>> +                list_move(&pdev->domain_list, &failed_pdevs);
>
> You can get rid of the still_present variable, and just do:
>
> for_each_pdev ( d, tmp )
>     if ( tmp == pdev )
>     {
>         list_move(&pdev->domain_list, &failed_pdevs);
>         break;
>     }
>
>

Yep, thanks.


-- 
WBR, Volodymyr

 

