
Re: [PATCH v7 01/12] vpci: introduce per-domain lock to protect vpci structure


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>
  • Date: Tue, 4 Jul 2023 21:03:12 +0000
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>
  • Delivery-date: Tue, 04 Jul 2023 21:03:42 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Hi Roger,

Roger Pau Monné <roger.pau@xxxxxxxxxx> writes:

> On Thu, Jun 22, 2023 at 09:17:36PM +0000, Volodymyr Babchuk wrote:
>> 
>> Hi Roger,
>> 
>> Roger Pau Monné <roger.pau@xxxxxxxxxx> writes:
>> 
>> > On Wed, Jun 21, 2023 at 10:07:20PM +0000, Volodymyr Babchuk wrote:
>> >> 
>> >> Hi Roger,
>> >> 
>> >> Roger Pau Monné <roger.pau@xxxxxxxxxx> writes:
>> >> 
>> >> > On Tue, Jun 13, 2023 at 10:32:26AM +0000, Volodymyr Babchuk wrote:
>> >> >> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
>> >> >> 
>> >> >> Introduce a per-domain read/write lock to check whether vpci is present,
>> >> >> so we are sure there are no accesses to the contents of the vpci struct
>> >> >> if not. This lock can be used (and in a few cases is used right away)
>> >> >> so that vpci removal can be performed while holding the lock in write
>> >> >> mode. Previously such removal could race with vpci_read for example.
>> >> >> 
>> >> >> 1. The per-domain vpci_rwlock is used to protect the pdev->vpci structure
>> >> >> from being removed.
>> >> >> 
>> >> >> 2. Writing the command register and ROM BAR register may trigger
>> >> >> modify_bars to run, which in turn may access multiple pdevs while
>> >> >> checking for the existing BAR's overlap. The overlapping check, if done
>> >> >> under the read lock, requires vpci->lock to be acquired on both devices
>> >> >> being compared, which may produce a deadlock. It is not possible to
>> >> >> upgrade read lock to write lock in such a case. So, in order to prevent
>> >> >> the deadlock, check which registers are going to be written and acquire
>> >> >> the lock in the appropriate mode from the beginning.
>> >> >> 
>> >> >> All other code, which doesn't lead to pdev->vpci destruction and does not
>> >> >> access multiple pdevs at the same time, can still use a combination of the
>> >> >> read lock and pdev->vpci->lock.
>> >> >> 
>> >> >> 3. Optimize the detection of whether a ROM BAR write requires the write
>> >> >> lock by caching the offset of the ROM BAR register in
>> >> >> vpci->header->rom_reg, which depends on the header's type.
>> >> >> 
>> >> >> 4. Reduce locked region in vpci_remove_device as it is now possible
>> >> >> to set pdev->vpci to NULL early right after the write lock is acquired.
>> >> >> 
>> >> >> 5. Reduce locked region in vpci_add_handlers as it is possible to
>> >> >> initialize many more fields of the struct vpci before assigning it to
>> >> >> pdev->vpci.
>> >> >> 
>> >> >> 6. vpci_{add|remove}_register are required to be called with the write
>> >> >> lock held, but it is not feasible to add an assert there as it requires
>> >> >> struct domain to be passed for that. So, add a comment about this
>> >> >> requirement to these and other functions with the equivalent constraints.
>> >> >> 
>> >> >> 7. Drop const qualifier where the new rwlock is used and this is 
>> >> >> appropriate.
>> >> >> 
>> >> >> 8. Do not call process_pending_softirqs with any locks held. For that,
>> >> >> unlock prior to the call and re-acquire the locks afterwards. After
>> >> >> re-acquiring the lock there is no need to check if pdev->vpci exists:
>> >> >>  - in apply_map because of the context it is called (no race condition
>> >> >>    possible)
>> >> >>  - for MSI/MSI-X debug code because it is called at the end of
>> >> >>    pdev->vpci access and no further access to pdev->vpci is made
>> >> >> 
>> >> >> 9. Check for !pdev->vpci in vpci_{read|write} after acquiring the lock
>> >> >> and if so, allow reading or writing the hardware register directly. This
>> >> >> is acceptable as we only deal with Dom0 as of now. Once DomU support is
>> >> >> added, writes will need to be ignored and reads will return all 0's for
>> >> >> the guests, while Dom0 can still access the registers directly.
>> >> >> 
>> >> >> 10. Introduce pcidevs_trylock(), making it possible to attempt to acquire
>> >> >> the pcidevs lock without blocking.
>> >> >> 
>> >> >> 11. Use the pcidevs lock around for_each_pdev and pci_get_pdev_by_domain
>> >> >> while accessing pdevs in vpci code.
>> >> >> 
>> >> >> 12. This is based on the discussion at [1].
>> >> >> 
>> >> >> [1] https://lore.kernel.org/all/20220204063459.680961-4-andr2000@xxxxxxxxx/
>> >> >> 
>> >> >> Suggested-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
>> >> >> Suggested-by: Jan Beulich <jbeulich@xxxxxxxx>
>> >> >> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
>> >> >
>> >> > Thanks.
>> >> >
>> >> > I haven't looked in full detail, but I'm afraid there's an ABBA
>> >> > deadlock with the per-domain vpci lock and the pcidevs lock in
>> >> > modify_bars() vs  vpci_add_handlers() and vpci_remove_device().
>> >> >
>> >> > I've made some comments below.
>> >> 
>> >> Thank you for the review. I believe that it is a good idea to have a
>> >> per-domain pdev_list lock. See my answers below.
>> >
>> > I think it's important that the lock that protects domain->pdev_list
>> > must be the same that also protects pdev->vpci, or else you might run
>> > into similar ABBA deadlock situations.
>> >
>> > The problem then could be that in vpci_{read,write} you will take the
>> > per-domain pdev lock in read mode in order to get the pdev, and for
>> > writes to the command register or the ROM BAR you would have to
>> > upgrade such lock to a write lock without dropping it, and we don't
>> > have such functionality for rw locks ATM.
>> >
>> > Maybe just re-starting the function knowing that the lock must be
>> > taken in write mode would be a good solution: writes to the command
>> > register will already be slow since they are likely to involve
>> > modifications to the p2m.
>> 
>> Looks like modify_bars() is the only cause for this extended lock. I
>> know that this was discussed earlier, but can we rework modify_bars to
>> not iterate over all the pdevs? We can store a copy of all enabled BARs in
>> a domain structure, protected by domain->vpci_lock. Something akin to
>> 
>> struct vpci_bar {
>>         struct list_head list;
>>         struct vpci *vpci;
>>         unsigned long start;
>>         unsigned long end;
>>         bool is_rom;
>> };
>
> This IMO makes the logic more complicated, as each time a BAR is
> updated we would have to change the cached address and size in two
> different places.  It's also duplicated data that takes up memory, and
> there are systems with a non-trivial number of PCI devices and thus
> BARs to track.
>
> I think it's easier to just make the newly introduced per-domain
> rwlock also protect the domain's pdev_list (unless I'm missing
> something).  AFAICT it would also simplify locking, as such rwlock
> protecting the domain->pdev_list will save you from having to take
> the pcidevs lock in vpci_{read,write} in order to find the device the
> access belongs to.
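
Just to be sure we are on the same page, here is roughly how I read this for
the vpci_write() side (a sketch only, not actual code; d->pci_lock and
reg_needs_write_lock() are placeholder names of mine):

void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
                uint32_t data)
{
    struct domain *d = current->domain;
    const struct pci_dev *pdev;
    bool write = false;

 retry:
    if ( write )
        write_lock(&d->pci_lock);
    else
        read_lock(&d->pci_lock);

    /* d->pci_lock also protects d->pdev_list, so no pcidevs lock here. */
    pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);

    if ( pdev && !write && reg_needs_write_lock(pdev, reg) )
    {
        /*
         * Writes to the command register or the (cached) ROM BAR may end up
         * in modify_bars() touching other pdevs, so restart with the lock in
         * write mode: upgrading a read lock is not possible.
         */
        read_unlock(&d->pci_lock);
        write = true;
        goto retry;
    }

    /* ... handler dispatch under pdev->vpci->lock, or hw access ... */

    if ( write )
        write_unlock(&d->pci_lock);
    else
        read_unlock(&d->pci_lock);
}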

I am currently implementing your proposal (along with Jan's
suggestions), but I am facing an ABBA deadlock with the IOMMU's
reassign_device() call, which has this piece of code:

        list_move(&pdev->domain_list, &target->pdev_list);

My immediate change was:

        write_lock(&pdev->domain->pci_lock);
        list_del(&pdev->domain_list);
        write_unlock(&pdev->domain->pci_lock);

        write_lock(&target->pci_lock);
        list_add(&pdev->domain_list, &target->pdev_list);
        write_unlock(&target->pci_lock);

But this will not work because reassign_device() is called from
pci_release_devices(), which iterates over d->pdev_list, so we need to
take d->pci_lock early.
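
To illustrate, the nesting I end up with is roughly this (simplified, not
actual code):

    pci_release_devices(d)
        write_lock(&d->pci_lock)          <- taken to walk d->pdev_list
        deassign_device(d, seg, bus, devfn)
            reassign_device(d, target, devfn, pdev)
                write_lock(&d->pci_lock)  <- needed again for list_del(), but
                                             the lock is already held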

Any suggestions on how to fix this? My idea is to remove devices from the
list one at a time:

int pci_release_devices(struct domain *d)
{
    struct pci_dev *pdev;
    u8 bus, devfn;
    int ret;

    pcidevs_lock();
    write_lock(&d->pci_lock);
    ret = arch_pci_clean_pirqs(d);
    if ( ret )
    {
        write_unlock(&d->pci_lock);
        pcidevs_unlock();
        return ret;
    }

    while ( !list_empty(&d->pdev_list) )
    {
        pdev = list_first_entry(&d->pdev_list, struct pci_dev, domain_list);
        bus = pdev->bus;
        devfn = pdev->devfn;
        list_del(&pdev->domain_list);
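        /* Drop the lock: deassign_device() -> reassign_device() takes it. */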
        write_unlock(&d->pci_lock);
        ret = deassign_device(d, pdev->seg, bus, devfn) ?: ret;
        write_lock(&d->pci_lock);
    }
    write_unlock(&d->pci_lock);
    pcidevs_unlock();

    return ret;
}

But it is somewhat ugly.
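
One way I see to make it a bit less ugly (again, only a sketch on top of the
code above, with deassign_one() being a helper name I just made up) would be
to hide the unlock/relock dance in a helper that is called with d->pci_lock
held:

static int deassign_one(struct domain *d, struct pci_dev *pdev)
{
    u16 seg = pdev->seg;
    u8 bus = pdev->bus, devfn = pdev->devfn;
    int ret;

    list_del(&pdev->domain_list);
    /* deassign_device() -> reassign_device() will take the lock itself. */
    write_unlock(&d->pci_lock);
    ret = deassign_device(d, seg, bus, devfn);
    write_lock(&d->pci_lock);

    return ret;
}

The while loop body above would then collapse to a single call, but maybe
there is a better approach entirely.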


-- 
WBR, Volodymyr

 

