[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v7 01/12] vpci: introduce per-domain lock to protect vpci structure


  • To: Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 10 Jul 2023 08:20:08 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=iPH02SSkTDOpBwnH7L7MImm6UfRFH1sdvzqPO8HquIk=; b=H9H17r/+nmDpjNixwG671ZON6HHlag7Jle9i/m33NZJNJew9+Qp1CF6PazjNPHuZ05c67tzWHIeKp6TF32WsCn9623jGlYJ5LDzrFc/s0PPGcdwH0lRggTYclFMuhwnMyzROVeSnUoDkuy44zli9QJZPbtHntmLEKBSUeLfIBkiUazmU5cRtdwLdzFyeSSThbslrr8vecjIGgM2aXweGomd7ufP5Y4DnH8z4b/iloJmBnqaUiIT0cAp1wP9IfO/OriiM+SL9w7v9j7iR0Bf5g9E+HgsUo6R077r5qT+K+nr8XkClWcD5ZwNLVU67CPmeTKQwVbmC9rNzQzE0OLZBfA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=ObGjFPmS1yPE28LRMECfuqOHnX2J4Ousxx1q0Ya+pljHGE9au2QKvh+MBh4yquOc0qEzdKeHYAxquYsuwOaAfexwNXFSLjGDQBBRbhpjx9PIEbjVGrguGWJVOg1Zc3ij/84vKrPzo6K314l4d331X76YAV5kUExrzzJ5n0XpYgFc8+fm6TNu1S6VaNRhKY33cdq/Ynccq5SF6+osuDUeTrLsl6z7IEd2wc1Yilua5Pc54cOuS+OnQUavqrrm0R1lyUzdoj5jVepOrlKX9i2lLu2DPGFGAy0UshZ/9cLN0pDPgqIPhLEMqTWOu34urNt8NGu5GHAlEPBAO7RJok4F0w==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Delivery-date: Mon, 10 Jul 2023 06:20:24 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 10.07.2023 00:41, Volodymyr Babchuk wrote:
> 
> Hi Jan,
> 
> Jan Beulich <jbeulich@xxxxxxxx> writes:
> 
>> On 04.07.2023 23:03, Volodymyr Babchuk wrote:
>>> I am currently implementing your proposal (along with Jan's
>>> suggestions), but I am facing ABBA deadlock with IOMMU's
>>> reassign_device() call, which has this piece of code:
>>>
>>>         list_move(&pdev->domain_list, &target->pdev_list);
>>>
>>> My immediate change was:
>>>
>>>         write_lock(&pdev->domain->pci_lock);
>>>         list_del(&pdev->domain_list);
>>>         write_unlock(&pdev->domain->pci_lock);
>>>
>>>         write_lock(&target->pci_lock);
>>>         list_add(&pdev->domain_list, &target->pdev_list);
>>>         write_unlock(&target->pci_lock);
>>>
>>> But this will not work because reassign_device is called from
>>> pci_release_devices() which iterates over d->pdev_list, so we need to
>>> take a d->pci_lock early.
>>>
>>> Any suggestions on how to fix this? My idea is to remove a device from a
>>> list one at time:
>>>
>>> int pci_release_devices(struct domain *d)
>>> {
>>>     struct pci_dev *pdev;
>>>     u8 bus, devfn;
>>>     int ret;
>>>
>>>     pcidevs_lock();
>>>     write_lock(&d->pci_lock);
>>>     ret = arch_pci_clean_pirqs(d);
>>>     if ( ret )
>>>     {
>>>         pcidevs_unlock();
>>>         write_unlock(&d->pci_lock);
>>>         return ret;
>>>     }
>>>
>>>     while ( !list_empty(&d->pdev_list) )
>>>     {
>>>         pdev = list_entry(&d->pdev_list, struct pci_dev, domain_list);
>>>         bus = pdev->bus;
>>>         devfn = pdev->devfn;
>>>         list_del(&pdev->domain_list);
>>>         write_unlock(&d->pci_lock);
>>>         ret = deassign_device(d, pdev->seg, bus, devfn) ?: ret;
>>>         write_lock(&d->pci_lock);
>>
>> I think it needs doing almost like this, but with two more tweaks and
>> no list_del() right here (first and foremost to avoid needing to
>> figure whether removing early isn't going to subtly break anything;
>> see below for an error case that would end up with changed behavior):
>>
>>     while ( !list_empty(&d->pdev_list) )
>>     {
>>         const struct pci_dev *pdev = list_first_entry(&d->pdev_list, struct 
>> pci_dev, domain_list);
>>         uint16_t seg = pdev->seg;
>>         uint8_t bus = pdev->bus;
>>         uint8_t devfn = pdev->devfn;
>>
>>         write_unlock(&d->pci_lock);
>>         ret = deassign_device(d, seg, bus, devfn) ?: ret;
>>         write_lock(&d->pci_lock);
>>     }
>>
>> One caveat though: The original list_for_each_entry_safe() guarantees
>> the loop to complete; your use of list_del() would guarantee that too,
>> but then the device wouldn't be on the list anymore if deassign_device()
>> failed. Therefore I guess you will need another local list where you
>> (temporarily) put all the devices which deassign_device() left on the
>> list, and which you would then move back to d->pdev_list after the loop
>> has finished.
> 
> Okay, taking into the account all that you wrote, I came with this
> implementation:

Looks plausible at the first glance, but will of course need looking at
in more detail in the context of the full patch. Just one (largely)
cosmetic remark right away:

> int pci_release_devices(struct domain *d)
> {
>     int combined_ret;
>     LIST_HEAD(failed_pdevs);
> 
>     pcidevs_lock();
>     write_lock(&d->pci_lock);
>     combined_ret = arch_pci_clean_pirqs(d);
>     if ( combined_ret )
>     {
>         pcidevs_unlock();
>         write_unlock(&d->pci_lock);
>         return combined_ret;
>     }
> 
>     while ( !list_empty(&d->pdev_list) )
>     {
>         struct pci_dev *pdev = list_first_entry(&d->pdev_list,
>                                                 struct pci_dev,
>                                                 domain_list);
>         uint16_t seg = pdev->seg;
>         uint8_t bus = pdev->bus;
>         uint8_t devfn = pdev->devfn;
>         int ret;
> 
>         write_unlock(&d->pci_lock);
>         ret = deassign_device(d, seg, bus, devfn);
>         write_lock(&d->pci_lock);
>         if ( ret )
>         {
>             bool still_present = false;
>             const struct pci_dev *tmp;
> 
>             for_each_pdev ( d, tmp )
>             {
>                 if ( tmp == (const struct pci_dev*)pdev )

As I'm sure I expressed earlier, casts should be avoided whenever
possible. I see no reason for one here: Comparing pointer-to-
const-<type> with pointer-to-<type> is permitted by the language.

>                 {
>                     still_present = true;
>                     break;
>                 }
>             }
>             if ( still_present )
>                 list_move(&pdev->domain_list, &failed_pdevs);
>             combined_ret = ret;
>         }
>     }
> 
>     list_splice(&failed_pdevs, &d->pdev_list);
>     write_unlock(&d->pci_lock);
>     pcidevs_unlock();
> 
>     return combined_ret;
> }
> 
> 
>> (Whether it is sufficient to inspect the list head to
>> determine whether the pdev is still on the list needs careful checking.)
> 
> I am not sure that this is enough. We dropped d->pci_lock so we can
> expect that the list was permutated in a random way.

Right, if that cannot be excluded for sure, then better stay on the safe
side for now. A code comment would be nice though ahead of the
for_each_pdev().

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.