
Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • Date: Mon, 14 Feb 2022 14:00:26 +0000
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, "julien@xxxxxxx" <julien@xxxxxxx>, "sstabellini@xxxxxxxxxx" <sstabellini@xxxxxxxxxx>, Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Rahul Singh <rahul.singh@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • Delivery-date: Mon, 14 Feb 2022 14:00:41 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>


On 14.02.22 15:48, Jan Beulich wrote:
> On 14.02.2022 14:27, Oleksandr Andrushchenko wrote:
>>
>> On 14.02.22 15:22, Jan Beulich wrote:
>>> On 14.02.2022 14:13, Oleksandr Andrushchenko wrote:
>>>> On 14.02.22 14:57, Jan Beulich wrote:
>>>>> On 14.02.2022 12:37, Oleksandr Andrushchenko wrote:
>>>>>> On 14.02.22 13:25, Roger Pau Monné wrote:
>>>>>>> On Mon, Feb 14, 2022 at 11:15:27AM +0000, Oleksandr Andrushchenko wrote:
>>>>>>>> On 14.02.22 13:11, Roger Pau Monné wrote:
>>>>>>>>>> On Mon, Feb 14, 2022 at 10:53:43AM +0000, Oleksandr Andrushchenko wrote:
>>>>>>>>>> On 14.02.22 12:34, Roger Pau Monné wrote:
>>>>>>>>>>>> On Mon, Feb 14, 2022 at 09:36:39AM +0000, Oleksandr Andrushchenko wrote:
>>>>>>>>>>>> On 11.02.22 13:40, Roger Pau Monné wrote:
>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>              for ( i = 0; i < msix->max_entries; i++ )
>>>>>>>>>>>>>>>>              {
>>>>>>>>>>>>>>>>                  const struct vpci_msix_entry *entry = &msix->entries[i];
>>>>>>>>>>>>>>> Since this function is now called with the per-domain rwlock read
>>>>>>>>>>>>>>> locked it's likely not appropriate to call process_pending_softirqs
>>>>>>>>>>>>>>> while holding such lock (check below).
>>>>>>>>>>>>>> You are right, as it is possible that:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> process_pending_softirqs -> vpci_process_pending -> read_lock
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Even more, vpci_process_pending may also
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> read_unlock -> vpci_remove_device -> write_lock
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> in its error path. So, any invocation of process_pending_softirqs
>>>>>>>>>>>>>> must not hold d->vpci_rwlock at least.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We also need to check that pdev->vpci was not removed
>>>>>>>>>>>>>> in between, or *re-created*.
>>>>>>>>>>>>>>> We will likely need to re-iterate over the list of pdevs assigned to
>>>>>>>>>>>>>>> the domain and assert that the pdev is still assigned to the same
>>>>>>>>>>>>>>> domain.
>>>>>>>>>>>>>> So, do you mean a pattern like the below should be used at all
>>>>>>>>>>>>>> places where we need to call process_pending_softirqs?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> read_unlock
>>>>>>>>>>>>>> process_pending_softirqs
>>>>>>>>>>>>>> read_lock
>>>>>>>>>>>>>> pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
>>>>>>>>>>>>>> if ( pdev && pdev->vpci && is_the_same_vpci(pdev->vpci) )
>>>>>>>>>>>>>> <continue processing>
>>>>>>>>>>>>> Something along those lines. You likely need to continue iterating
>>>>>>>>>>>>> using for_each_pdev.
>>>>>>>>>>>> How do we tell if pdev->vpci is the same? Jan has already brought
>>>>>>>>>>>> this question before [1] and I was about to use some ID for that purpose:
>>>>>>>>>>>> pdev->vpci->id = d->vpci_id++ and then we use pdev->vpci->id for checks
>>>>>>>>>>> Given this is a debug message I would be OK with just doing the
>>>>>>>>>>> minimal checks to prevent Xen from crashing (ie: pdev->vpci exists)
>>>>>>>>>>> and that the resume MSI entry is not past the current limit. Otherwise
>>>>>>>>>>> just print a message and move on to the next device.
>>>>>>>>>> Agreed, I see no big issue (probably) if we are not able to print.
>>>>>>>>>>
>>>>>>>>>> How about this one:
>>>>>>>>>>
>>>>>>>>>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>>>>>>>>>> index 809a6b4773e1..50373f04da82 100644
>>>>>>>>>> --- a/xen/drivers/vpci/header.c
>>>>>>>>>> +++ b/xen/drivers/vpci/header.c
>>>>>>>>>> @@ -171,10 +171,31 @@ static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
>>>>>>>>>>                              struct rangeset *mem, uint16_t cmd)
>>>>>>>>>>  {
>>>>>>>>>>      struct map_data data = { .d = d, .map = true };
>>>>>>>>>> +    pci_sbdf_t sbdf = pdev->sbdf;
>>>>>>>>>>      int rc;
>>>>>>>>>>
>>>>>>>>>> +    ASSERT(rw_is_write_locked(&pdev->domain->vpci_rwlock));
>>>>>>>>>> +
>>>>>>>>>>      while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART )
>>>>>>>>>> +    {
>>>>>>>>>> +
>>>>>>>>>> +        /*
>>>>>>>>>> +         * process_pending_softirqs may trigger vpci_process_pending which
>>>>>>>>>> +         * may need to acquire pdev->domain->vpci_rwlock in read mode.
>>>>>>>>>> +         */
>>>>>>>>>> +        write_unlock(&pdev->domain->vpci_rwlock);
>>>>>>>>>>          process_pending_softirqs();
>>>>>>>>>> +        write_lock(&pdev->domain->vpci_rwlock);
>>>>>>>>>> +
>>>>>>>>>> +        /* Check if pdev still exists and vPCI was not removed or re-created. */
>>>>>>>>>> +        if ( pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn) != pdev )
>>>>>>>>>> +            if ( vpci is NOT the same )
>>>>>>>>>> +            {
>>>>>>>>>> +                rc = 0;
>>>>>>>>>> +                break;
>>>>>>>>>> +            }
>>>>>>>>>> +    }
>>>>>>>>>> +
>>>>>>>>>>      rangeset_destroy(mem);
>>>>>>>>>>      if ( !rc )
>>>>>>>>>>          modify_decoding(pdev, cmd, false);
>>>>>>>>>>
>>>>>>>>>> This one also wants process_pending_softirqs to run so it *might*
>>>>>>>>>> want pdev and vpci checks. But at the same time apply_map runs
>>>>>>>>>> at ( system_state < SYS_STATE_active ), so defer_map won't be
>>>>>>>>>> running yet, and thus vpci_process_pending cannot yet have anything
>>>>>>>>>> to do. So, I think we just need:
>>>>>>>>>>
>>>>>>>>>>               write_unlock(&pdev->domain->vpci_rwlock);
>>>>>>>>>>               process_pending_softirqs();
>>>>>>>>>>               write_lock(&pdev->domain->vpci_rwlock);
>>>>>>>>>>
>>>>>>>>>> and this should be enough
>>>>>>>>> Given the context apply_map is called from (dom0 specific init code),
>>>>>>>>> there's no need to check that the pdev still exists, or whether vpci
>>>>>>>>> has been recreated, as it's not possible. Just add a comment to
>>>>>>>>> explicitly note that the context of the function is special, and thus
>>>>>>>>> there's no possibility of either the device or vpci going away.
>>>>>>>> Does it really need write_unlock/write_lock given the context?...
>>>>>>> I think it's bad practice to call process_pending_softirqs while
>>>>>>> holding any locks. This is a very specific context so it's likely fine
>>>>>>> to not drop the lock, but would still seem incorrect to me.
>>>>>> Ok
>>>>>>>> I think it doesn't as there is no chance defer_map is called, thus
>>>>>>>> process_pending_softirqs -> vpci_process_pending -> read_lock
>>>>>>> Indeed, there's no chance of that because process_pending_softirqs
>>>>>>> will never try to do a scheduling operation that would result in our
>>>>>>> context being scheduled out.
>>>>>>         while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART )
>>>>>>         {
>>>>>>             /*
>>>>>>              * FIXME: Given the context apply_map is called from (dom0 specific
>>>>>>              * init code at system_state < SYS_STATE_active) it is not strictly
>>>>>>              * required that pdev->domain->vpci_rwlock is unlocked before calling
>>>>>>              * process_pending_softirqs as there is no contention possible between
>>>>>>              * this code and vpci_process_pending trying to acquire the lock in
>>>>>>              * read mode. But running process_pending_softirqs with any lock held
>>>>>>              * doesn't seem to be a good practice, so drop the lock and re-acquire
>>>>>>              * it right after.
>>>>>>              */
>>>>>>             write_unlock(&pdev->domain->vpci_rwlock);
>>>>>>             process_pending_softirqs();
>>>>>>             write_lock(&pdev->domain->vpci_rwlock);
>>>>>>         }
>>>>> I'm afraid that's misleading at best. apply_map() is merely a specific
>>>>> example where you know the lock is going to be taken. But really any
>>>>> softirq handler could be acquiring any lock, so requesting to process
>>>>> softirqs cannot ever be done with any lock held.
>>>>>
>>>>> What you instead want to explain is why, after re-acquiring the lock,
>>>>> no further checking is needed for potentially changed state.
>>>> How about:
>>>>
>>>> /*
>>>>  * FIXME: Given the context apply_map is called from (dom0 specific
>>>>  * init code at system_state < SYS_STATE_active) there is no contention
>>>>  * possible between this code and vpci_process_pending trying to acquire
>>>>  * the lock in read mode and destroy pdev->vpci in its error path.
>>>>  * Neither pdev may be disposed yet, so it is not required to check if the
>>>>  * relevant pdev still exists after re-acquiring the lock.
>>>>  */
>>> I'm not sure I follow the first sentence; I guess a comma or two may help,
>>> and/or using "as well as" in place of one of the two "and". I also don't
>>> think you mean contention, but rather a race between the named entities?
>>    /*
>>     * FIXME: Given the context from which apply_map is called (dom0 specific
>>     * init code at system_state < SYS_STATE_active) there is no race condition
>>     * possible between this code and vpci_process_pending which may try to acquire
>>     * the lock in read mode and also try to destroy pdev->vpci in its error path.
>>     * Neither pdev may be disposed yet, so it is not required to check if the
>>     * relevant pdev still exists after re-acquiring the lock.
>>     */
> I'm still struggling with the language, sorry. You look to only have replaced
> "contention"? Reading it again I'd also like to mention that to me (not a
> native speaker) "Neither pdev may be ..." expresses "None of the pdev-s may
> be ...", when I think you mean "Nor may pdev be ..."
/*
* FIXME: apply_map is called from dom0 specific init code when
* system_state < SYS_STATE_active, so there is no race condition
* possible between this code and vpci_process_pending: neither may
* vpci_process_pending try to acquire the lock in read mode and
* destroy pdev->vpci in its error path, nor may pdev be disposed of yet.
* This means that it is not required to check if the relevant pdev
* still exists after re-acquiring the lock.
*/
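
For illustration, the resulting apply_map would then look like the sketch
below (a sketch only: it assumes the per-domain vpci_rwlock introduced by
this patch, and reuses the helpers already used by apply_map in
xen/drivers/vpci/header.c, i.e. rangeset_consume_ranges, map_range and
modify_decoding):

static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
                            struct rangeset *mem, uint16_t cmd)
{
    struct map_data data = { .d = d, .map = true };
    int rc;

    /* Callers are expected to hold the per-domain lock in write mode. */
    ASSERT(rw_is_write_locked(&pdev->domain->vpci_rwlock));

    while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART )
    {
        /*
         * FIXME: apply_map is called from dom0 specific init code when
         * system_state < SYS_STATE_active, so there is no race condition
         * possible between this code and vpci_process_pending: neither may
         * vpci_process_pending try to acquire the lock in read mode and
         * destroy pdev->vpci in its error path, nor may pdev be disposed
         * of yet. This means that it is not required to check if the
         * relevant pdev still exists after re-acquiring the lock.
         */
        write_unlock(&pdev->domain->vpci_rwlock);
        process_pending_softirqs();
        write_lock(&pdev->domain->vpci_rwlock);
    }

    rangeset_destroy(mem);
    if ( !rc )
        modify_decoding(pdev, cmd, false);

    return rc;
}

Note the sbdf re-lookup from the earlier draft is dropped on purpose: as
discussed above, neither the device nor pdev->vpci can go away in this
context, so only the unlock/re-lock around process_pending_softirqs remains.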

> Jan
>

Thank you,
Oleksandr

 

