[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v7 01/12] vpci: introduce per-domain lock to protect vpci structure


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>
  • Date: Fri, 7 Jul 2023 02:02:52 +0000
  • Accept-language: en-US
  • Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>
  • Delivery-date: Fri, 07 Jul 2023 02:03:16 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: [PATCH v7 01/12] vpci: introduce per-domain lock to protect vpci structure

Hi Jan,

Jan Beulich <jbeulich@xxxxxxxx> writes:

> On 05.07.2023 10:59, Roger Pau Monné wrote:
>> On Wed, Jul 05, 2023 at 09:11:10AM +0200, Jan Beulich wrote:
>>> On 04.07.2023 23:03, Volodymyr Babchuk wrote:
>>>> I am currently implementing your proposal (along with Jan's
>>>> suggestions), but I am facing ABBA deadlock with IOMMU's
>>>> reassign_device() call, which has this piece of code:
>>>>
>>>>         list_move(&pdev->domain_list, &target->pdev_list);
>>>>
>>>> My immediate change was:
>>>>
>>>>         write_lock(&pdev->domain->pci_lock);
>>>>         list_del(&pdev->domain_list);
>>>>         write_unlock(&pdev->domain->pci_lock);
>>>>
>>>>         write_lock(&target->pci_lock);
>>>>         list_add(&pdev->domain_list, &target->pdev_list);
>>>>         write_unlock(&target->pci_lock);
>>>>
>>>> But this will not work because reassign_device is called from
>>>> pci_release_devices() which iterates over d->pdev_list, so we need to
>>>> take a d->pci_lock early.
>>>>
>>>> Any suggestions on how to fix this? My idea is to remove devices from
>>>> the list one at a time:
>>>>
>>>> int pci_release_devices(struct domain *d)
>>>> {
>>>>     struct pci_dev *pdev;
>>>>     u8 bus, devfn;
>>>>     int ret;
>>>>
>>>>     pcidevs_lock();
>>>>     write_lock(&d->pci_lock);
>>>>     ret = arch_pci_clean_pirqs(d);
>>>>     if ( ret )
>>>>     {
>>>>         pcidevs_unlock();
>>>>         write_unlock(&d->pci_lock);
>>>>         return ret;
>>>>     }
>>>>
>>>>     while ( !list_empty(&d->pdev_list) )
>>>>     {
>>>>         pdev = list_entry(&d->pdev_list, struct pci_dev, domain_list);
>>>>         bus = pdev->bus;
>>>>         devfn = pdev->devfn;
>>>>         list_del(&pdev->domain_list);
>>>>         write_unlock(&d->pci_lock);
>>>>         ret = deassign_device(d, pdev->seg, bus, devfn) ?: ret;
>>>>         write_lock(&d->pci_lock);
>>>
>>> I think it needs doing almost like this, but with two more tweaks and
>>> no list_del() right here (first and foremost to avoid needing to
>>> figure whether removing early isn't going to subtly break anything;
>>> see below for an error case that would end up with changed behavior):
>>>
>>>     while ( !list_empty(&d->pdev_list) )
>>>     {
>>>         const struct pci_dev *pdev = list_first_entry(&d->pdev_list, struct pci_dev, domain_list);
>>>         uint16_t seg = pdev->seg;
>>>         uint8_t bus = pdev->bus;
>>>         uint8_t devfn = pdev->devfn;
>>>
>>>         write_unlock(&d->pci_lock);
>> 
>> I think you need to remove the device from the pdev_list before
>> dropping the lock, or else release could race with other operations.
>> 
>> That's unlikely, but still if the lock is dropped and the routine
>> needs to operate on the device it is better remove such device from
>> the domain so other operations cannot get a reference to it.
>> 
>> Otherwise you could modify reassign_device() implementations so they
>> require the caller to hold the source->pci_lock when calling the
>> routine, but that's ugly because the lock would need to be dropped in
>> order to reassign the device from source to target domains.
>> 
>> Another option would be to move the whole d->pdev_list to a local
>> variable (so that d->pdev_list would be empty) and then iterate over
>> it without the d->pci_lock.  On failure you would take the lock and
>> add the failing device back into d->pdev_list.
>
> Conceptually I like this last variant, but like the individual list_del()
> it requires auditing code for no dependency on the device still being on
> that list. In fact deassign_device()'s use of pci_get_pdev() does. The
> function would then need changing to have struct pci_dev * passed in.
> Yet who knows where else there are uses of pci_get_pdev() lurking.

Okay, so I changed the deassign_device() signature and reworked the loop
in pci_release_devices() as follows:

    INIT_LIST_HEAD(&tmp_list);
    /* Move all entries to tmp_list, so we can drop d->pci_lock */
    list_splice_init(&d->pdev_list, &tmp_list);
    write_unlock(&d->pci_lock);

    list_for_each_entry_safe ( pdev, tmp, &tmp_list, domain_list )
    {
        rc = deassign_device(d, pdev);
        if ( rc )
        {
            /* Return device back to the domain list */
            write_lock(&d->pci_lock);
            list_add(&pdev->domain_list, &d->pdev_list);
            write_unlock(&d->pci_lock);
            func_ret = rc;
        }
    }


Also, I checked all pci_get_pdev() calls and found that a struct
domain (the first parameter) is passed in only a handful of places:

*** xen/drivers/vpci/vpci.c:
vpci_read[504]                 pdev = pci_get_pdev(d, sbdf);
vpci_read[506]                 pdev = pci_get_pdev(dom_xen, sbdf);
vpci_write[621]                pdev = pci_get_pdev(d, sbdf);
vpci_write[623]                pdev = pci_get_pdev(dom_xen, sbdf);

*** xen/arch/x86/irq.c:
map_domain_pirq[2166]          pdev = pci_get_pdev(d, msi->sbdf);

*** xen/drivers/passthrough/pci.c:
XEN_GUEST_HANDLE_PARAM[1712]   pdev = pci_get_pdev(d, machine_sbdf);

The last one is due to my change to the deassign_device() signature.

==============================

d->pdev_list is accessed in these places:

*** xen/drivers/passthrough/amd/pci_amd_iommu.c:
reassign_device[489]           list_add(&pdev->domain_list, &target->pdev_list);

*** xen/drivers/passthrough/pci.c:
_pci_hide_device[463]          list_add(&pdev->domain_list, &dom_xen->pdev_list);
pci_get_pdev[561]              list_for_each_entry ( pdev, &d->pdev_list, domain_list )
pci_add_device[759]            list_add(&pdev->domain_list, &hardware_domain->pdev_list);
pci_release_devices[917]       list_splice_init(&d->pdev_list, &tmp_list);
pci_release_devices[922]       pdev = list_entry(&d->pdev_list, struct pci_dev, domain_list);
pci_release_devices[928]       list_add(&pdev->domain_list, &d->pdev_list);
_setup_hwdom_pci_devices[1155] list_add(&pdev->domain_list, &ctxt->d->pdev_list);

*** xen/drivers/passthrough/vtd/iommu.c:
reassign_device_ownership[2819] list_add(&pdev->domain_list, &target->pdev_list);

*** xen/include/xen/pci.h:
for_each_pdev[149]             list_for_each_entry(pdev, &(domain)->pdev_list, domain_list)
has_arch_pdevs[151]            #define has_arch_pdevs(d) (!list_empty(&(d)->pdev_list))

==============================

And has_arch_pdevs() is used in these places:

*** xen/arch/x86/hvm/hvm.c:
hvm_set_cr0[2388]              has_arch_pdevs(d)) )

*** xen/arch/x86/hvm/vmx/vmcs.c:
vmx_do_resume[1892]            if ( has_arch_pdevs(v->domain) && !iommu_snoop

*** xen/arch/x86/mm.c:
l1_disallow_mask[172]          !has_arch_pdevs(d) && \

*** xen/arch/x86/mm/p2m-pod.c:
p2m_pod_set_mem_target[352]    if ( has_arch_pdevs(d) || cache_flush_permitted(d) )
guest_physmap_mark_populate_on_demand[1404] if ( has_arch_pdevs(d) || cache_flush_permitted(d) )

*** xen/arch/x86/mm/paging.c:
paging_log_dirty_enable[208]   if ( has_arch_pdevs(d) )

*** xen/drivers/passthrough/vtd/iommu.c:
reassign_device_ownership[2773] if ( !has_arch_pdevs(target) )
reassign_device_ownership[2807] if ( !has_arch_pdevs(target) )
reassign_device_ownership[2825] if ( !has_arch_pdevs(source) )


has_arch_pdevs() actually bothers me the most, because it is not always
obvious how to add locking in its callers. I am planning to rework it
in the following way:

#define has_arch_pdevs_unlocked(d) (!list_empty(&(d)->pdev_list))

static inline bool has_arch_pdevs(struct domain *d)
{
    bool ret;

    read_lock(&d->pci_lock);
    ret = has_arch_pdevs_unlocked(d);
    read_unlock(&d->pci_lock);

    return ret;
}

And then use the appropriate macro/function depending on the circumstances.


-- 
WBR, Volodymyr

 

