[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • Date: Mon, 14 Feb 2022 13:13:05 +0000
  • Accept-language: en-US
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=none; dmarc=none; dkim=none; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=ZtpnDP6mG53v9dhdVWrFUPfYugizhk9vgUfzjOG5g7k=; b=ZT8YdJGIXIXHs3d/FwY/yQLlJlo+doIyEiMJ93IhGMQH8hPIKdMT0AGTO1awF3QQebvyup2elEzx7UeXkFK3UTTEx+n9EpxkNt5QKE8FhBv+Xrjw03NNW96MK7Bpp6hUxeJPuSXeDQkFoPdq31u+ryXwW0fVGVc0LvmbOMTs3RjNteIhZct4gM6RoHm0ILjYKEJij8izcKBSGsfxmxWizsEVOb5Ahr5SEeqNbDlET5uVJUBawk504hdUTyHv7Bs6XvJdHYXEVXdI8rQaQLK8WHS7BEASqqT3r/eXOizVGPMs0jYfxmNzyrqqs3HB/VFrT3jz/6Lg39B2AQfEeqGEeQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=O4Cu/3Ep5Qlf3DuUFdTQUv81hJXu3HtAIgyQvYkEKMMYQAjj8oRwrxkwZRSEN6Fv42dzQTr9+8PGCYczhD+AHf5VTEfKNx2ZxRXBwTNYFM96XwIi+aHqv96s8QgEHXhK8Yx88JTHzAtiHfcGbp4qbj261ZHfS7lcwoKJfwXOhxZR0Je7weVSY+fzy0eCRndc8cv2ABTHx1AcKiPrLWPjLG4fMVmp/ZXu54WVXYdxKd/E17DYgRWnwbcN+6s6GAC3G2w6Zq+xF0CtzY3C/KBrePkaeZ5h8cft81tQBb3CR56Sed+xU7HcvuESJt5i/U6Z1fNPPSesg8BwmxeiVdsArg==
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, "julien@xxxxxxx" <julien@xxxxxxx>, "sstabellini@xxxxxxxxxx" <sstabellini@xxxxxxxxxx>, Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Rahul Singh <rahul.singh@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • Delivery-date: Mon, 14 Feb 2022 13:13:17 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AQHYHboQT3cBWYI1/EGunE7hOwop6ayM95MAgAD+qQCAAEaDAIAElIcAgAAQEwCAAAV1AIAABOQAgAABMICAAALZgIAAAzmAgAAWXYCAAARtAA==
  • Thread-topic: [PATCH] vpci: introduce per-domain lock to protect vpci structure


On 14.02.22 14:57, Jan Beulich wrote:
> On 14.02.2022 12:37, Oleksandr Andrushchenko wrote:
>>
>> On 14.02.22 13:25, Roger Pau Monné wrote:
>>> On Mon, Feb 14, 2022 at 11:15:27AM +0000, Oleksandr Andrushchenko wrote:
>>>> On 14.02.22 13:11, Roger Pau Monné wrote:
>>>>> On Mon, Feb 14, 2022 at 10:53:43AM +0000, Oleksandr Andrushchenko wrote:
>>>>>> On 14.02.22 12:34, Roger Pau Monné wrote:
>>>>>>> On Mon, Feb 14, 2022 at 09:36:39AM +0000, Oleksandr Andrushchenko wrote:
>>>>>>>> On 11.02.22 13:40, Roger Pau Monné wrote:
>>>>>>>>> +
>>>>>>>>>>>>            for ( i = 0; i < msix->max_entries; i++ )
>>>>>>>>>>>>            {
>>>>>>>>>>>>                const struct vpci_msix_entry *entry = 
>>>>>>>>>>>> &msix->entries[i];
>>>>>>>>>>> Since this function is now called with the per-domain rwlock read
>>>>>>>>>>> locked it's likely not appropriate to call process_pending_softirqs
>>>>>>>>>>> while holding such lock (check below).
>>>>>>>>>> You are right, as it is possible that:
>>>>>>>>>>
>>>>>>>>>> process_pending_softirqs -> vpci_process_pending -> read_lock
>>>>>>>>>>
>>>>>>>>>> Even more, vpci_process_pending may also
>>>>>>>>>>
>>>>>>>>>> read_unlock -> vpci_remove_device -> write_lock
>>>>>>>>>>
>>>>>>>>>> in its error path. So, any invocation of process_pending_softirqs
>>>>>>>>>> must not hold d->vpci_rwlock at least.
>>>>>>>>>>
>>>>>>>>>> And also we need to check that pdev->vpci was not removed
>>>>>>>>>> in between or *re-created*
>>>>>>>>>>> We will likely need to re-iterate over the list of pdevs assigned to
>>>>>>>>>>> the domain and assert that the pdev is still assigned to the same
>>>>>>>>>>> domain.
>>>>>>>>>> So, do you mean a pattern like the below should be used at all
>>>>>>>>>> places where we need to call process_pending_softirqs?
>>>>>>>>>>
>>>>>>>>>> read_unlock
>>>>>>>>>> process_pending_softirqs
>>>>>>>>>> read_lock
>>>>>>>>>> pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
>>>>>>>>>> if ( pdev && pdev->vpci && is_the_same_vpci(pdev->vpci) )
>>>>>>>>>> <continue processing>
>>>>>>>>> Something along those lines. You likely need to continue iterate using
>>>>>>>>> for_each_pdev.
>>>>>>>> How do we tell if pdev->vpci is the same? Jan has already brought
>>>>>>>> this question before [1] and I was about to use some ID for that 
>>>>>>>> purpose:
>>>>>>>> pdev->vpci->id = d->vpci_id++ and then we use pdev->vpci->id  for 
>>>>>>>> checks
>>>>>>> Given this is a debug message I would be OK with just doing the
>>>>>>> minimal checks to prevent Xen from crashing (ie: pdev->vpci exists)
>>>>>>> and that the resume MSI entry is not past the current limit. Otherwise
>>>>>>> just print a message and move on to the next device.
>>>>>> Agree, I see no big issue (probably) if we are not able to print
>>>>>>
>>>>>> How about this one:
>>>>>>
>>>>>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>>>>>> index 809a6b4773e1..50373f04da82 100644
>>>>>> --- a/xen/drivers/vpci/header.c
>>>>>> +++ b/xen/drivers/vpci/header.c
>>>>>> @@ -171,10 +171,31 @@ static int __init apply_map(struct domain *d, 
>>>>>> const struct pci_dev *pdev,
>>>>>>                                  struct rangeset *mem, uint16_t cmd)
>>>>>>      {
>>>>>>          struct map_data data = { .d = d, .map = true };
>>>>>> +    pci_sbdf_t sbdf = pdev->sbdf;
>>>>>>          int rc;
>>>>>>
>>>>>> + ASSERT(rw_is_write_locked(&pdev->domain->vpci_rwlock));
>>>>>> +
>>>>>>          while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) 
>>>>>> == -ERESTART )
>>>>>> +    {
>>>>>> +
>>>>>> +        /*
>>>>>> +         * process_pending_softirqs may trigger vpci_process_pending 
>>>>>> which
>>>>>> +         * may need to acquire pdev->domain->vpci_rwlock in read mode.
>>>>>> +         */
>>>>>> +        write_unlock(&pdev->domain->vpci_rwlock);
>>>>>>              process_pending_softirqs();
>>>>>> +        write_lock(&pdev->domain->vpci_rwlock);
>>>>>> +
>>>>>> +        /* Check if pdev still exists and vPCI was not removed or 
>>>>>> re-created. */
>>>>>> +        if (pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn) 
>>>>>> != pdev)
>>>>>> +            if ( vpci is NOT the same )
>>>>>> +            {
>>>>>> +                rc = 0;
>>>>>> +                break;
>>>>>> +            }
>>>>>> +    }
>>>>>> +
>>>>>>          rangeset_destroy(mem);
>>>>>>          if ( !rc )
>>>>>>              modify_decoding(pdev, cmd, false);
>>>>>>
>>>>>> This one also wants process_pending_softirqs to run so it *might*
>>>>>> want pdev and vpci checks. But at the same time apply_map runs
>>>>>> at ( system_state < SYS_STATE_active ), so defer_map won't be
>>>>>> running yet, thus no vpci_process_pending is possible yet (in terms
>>>>>> it has something to do yet). So, I think we just need:
>>>>>>
>>>>>>             write_unlock(&pdev->domain->vpci_rwlock);
>>>>>>             process_pending_softirqs();
>>>>>>             write_lock(&pdev->domain->vpci_rwlock);
>>>>>>
>>>>>> and this should be enough
>>>>> Given the context apply_map is called from (dom0 specific init code),
>>>>> there's no need to check for the pdev to still exits, or whether vpci
>>>>> has been recreated, as it's not possible. Just add a comment to
>>>>> explicitly note that the context of the function is special, and thus
>>>>> there's no possibility of either the device or vpci going away.
>>>> Does it really need write_unlock/write_lock given the context?...
>>> I think it's bad practice to call process_pending_softirqs while
>>> holding any locks. This is a very specific context so it's likely fine
>>> to not drop the lock, but would still seem incorrect to me.
>> Ok
>>>> I think it doesn't as there is no chance defer_map is called, thus
>>>> process_pending_softirqs -> vpci_process_pending -> read_lock
>>> Indeed, there's no chance of that because process_pending_softirqs
>>> will never try to do a scheduling operation that would result in our
>>> context being scheduled out.
>>       while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == 
>> -ERESTART )
>>       {
>>           /*
>>            * FIXME: Given the context apply_map is called from (dom0 specific
>>            * init code at system_state < SYS_STATE_active) it is not strictly
>>            * required that pdev->domain->vpci_rwlock is unlocked before 
>> calling
>>            * process_pending_softirqs as there is no contention possible 
>> between
>>            * this code and vpci_process_pending trying to acquire the lock in
>>            * read mode. But running process_pending_softirqs with any lock 
>> held
>>            * doesn't seem to be a good practice, so drop the lock and 
>> re-acquire
>>            * it right again.
>>            */
>>           write_unlock(&pdev->domain->vpci_rwlock);
>>           process_pending_softirqs();
>>           write_lock(&pdev->domain->vpci_rwlock);
>>       }
> I'm afraid that's misleading at best. apply_map() is merely a specific
> example where you know the lock is going to be taken. But really any
> softirq handler could be acquiring any lock, so requesting to process
> softirqs cannot ever be done with any lock held.
>
> What you instead want to explain is why, after re-acquiring the lock,
> no further checking is needed for potentially changed state.
How about:

/*
  * FIXME: Given the context apply_map is called from (dom0 specific
  * init code at system_state < SYS_STATE_active) there is no contention
  * possible between this code and vpci_process_pending trying to acquire
  * the lock in read mode and destroy pdev->vpci in its error path.
  * Neither pdev may be disposed yet, so it is not required to check if the
  * relevant pdev still exists after re-acquiring the lock.
  */

>
> Jan
>
Thank you,
Oleksandr

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.