Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure
On 14.02.22 15:22, Jan Beulich wrote:
> On 14.02.2022 14:13, Oleksandr Andrushchenko wrote:
>>
>> On 14.02.22 14:57, Jan Beulich wrote:
>>> On 14.02.2022 12:37, Oleksandr Andrushchenko wrote:
>>>> On 14.02.22 13:25, Roger Pau Monné wrote:
>>>>> On Mon, Feb 14, 2022 at 11:15:27AM +0000, Oleksandr Andrushchenko wrote:
>>>>>> On 14.02.22 13:11, Roger Pau Monné wrote:
>>>>>>> On Mon, Feb 14, 2022 at 10:53:43AM +0000, Oleksandr Andrushchenko wrote:
>>>>>>>> On 14.02.22 12:34, Roger Pau Monné wrote:
>>>>>>>>> On Mon, Feb 14, 2022 at 09:36:39AM +0000, Oleksandr Andrushchenko wrote:
>>>>>>>>>> On 11.02.22 13:40, Roger Pau Monné wrote:
>>>>>>>>>>> +
>>>>>>>>>>>>>>       for ( i = 0; i < msix->max_entries; i++ )
>>>>>>>>>>>>>>       {
>>>>>>>>>>>>>>           const struct vpci_msix_entry *entry = &msix->entries[i];
>>>>>>>>>>>>> Since this function is now called with the per-domain rwlock read
>>>>>>>>>>>>> locked it's likely not appropriate to call process_pending_softirqs
>>>>>>>>>>>>> while holding such lock (check below).
>>>>>>>>>>>> You are right, as it is possible that:
>>>>>>>>>>>>
>>>>>>>>>>>> process_pending_softirqs -> vpci_process_pending -> read_lock
>>>>>>>>>>>>
>>>>>>>>>>>> Even more, vpci_process_pending may also do
>>>>>>>>>>>>
>>>>>>>>>>>> read_unlock -> vpci_remove_device -> write_lock
>>>>>>>>>>>>
>>>>>>>>>>>> in its error path. So, any invocation of process_pending_softirqs
>>>>>>>>>>>> must not hold d->vpci_rwlock at least.
>>>>>>>>>>>>
>>>>>>>>>>>> And we also need to check that pdev->vpci was not removed
>>>>>>>>>>>> in between or *re-created*.
>>>>>>>>>>>>> We will likely need to re-iterate over the list of pdevs assigned to
>>>>>>>>>>>>> the domain and assert that the pdev is still assigned to the same
>>>>>>>>>>>>> domain.
>>>>>>>>>>>> So, do you mean a pattern like the below should be used at all
>>>>>>>>>>>> places where we need to call process_pending_softirqs?
>>>>>>>>>>>>
>>>>>>>>>>>> read_unlock
>>>>>>>>>>>> process_pending_softirqs
>>>>>>>>>>>> read_lock
>>>>>>>>>>>> pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
>>>>>>>>>>>> if ( pdev && pdev->vpci && is_the_same_vpci(pdev->vpci) )
>>>>>>>>>>>>     <continue processing>
>>>>>>>>>>> Something along those lines. You likely need to continue iterating
>>>>>>>>>>> using for_each_pdev.
>>>>>>>>>> How do we tell if pdev->vpci is the same? Jan has already brought this
>>>>>>>>>> question up before [1] and I was about to use some ID for that purpose:
>>>>>>>>>> pdev->vpci->id = d->vpci_id++, and then we use pdev->vpci->id for the
>>>>>>>>>> checks.
>>>>>>>>> Given this is a debug message I would be OK with just doing the minimal
>>>>>>>>> checks to prevent Xen from crashing (ie: that pdev->vpci exists and that
>>>>>>>>> the MSI entry being resumed is not past the current limit). Otherwise
>>>>>>>>> just print a message and move on to the next device.
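Pulling the above together, the unlock/re-check pattern under discussion might look roughly like the sketch below. This is illustrative only: it assumes a surrounding for_each_pdev-style loop over the domain's devices, and the vpci->id generation field is the hypothetical ID scheme floated above, not something that exists in the tree.

    /* d->vpci_rwlock is held in read mode on entry; pdev is the current device. */
    const pci_sbdf_t sbdf = pdev->sbdf;
    const uint64_t id = pdev->vpci->id;   /* hypothetical generation ID */

    read_unlock(&d->vpci_rwlock);
    process_pending_softirqs();
    read_lock(&d->vpci_rwlock);

    /* The device may have been removed, or vpci re-created, while unlocked. */
    pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
    if ( !pdev || !pdev->vpci || pdev->vpci->id != id )
    {
        /* As suggested above: print a message and move on to the next device. */
        printk(XENLOG_G_WARNING "%pp: vPCI state changed, skipping\n", &sbdf);
        continue;
    }

Note that, as pointed out above, the list iteration itself may also need to be restarted via for_each_pdev, since the device list can change while the lock is dropped.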
>>>>>>>> Agree, I see no big issue (probably) if we are not able to print.
>>>>>>>>
>>>>>>>> How about this one:
>>>>>>>>
>>>>>>>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>>>>>>>> index 809a6b4773e1..50373f04da82 100644
>>>>>>>> --- a/xen/drivers/vpci/header.c
>>>>>>>> +++ b/xen/drivers/vpci/header.c
>>>>>>>> @@ -171,10 +171,31 @@ static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
>>>>>>>>                              struct rangeset *mem, uint16_t cmd)
>>>>>>>>  {
>>>>>>>>      struct map_data data = { .d = d, .map = true };
>>>>>>>> +    pci_sbdf_t sbdf = pdev->sbdf;
>>>>>>>>      int rc;
>>>>>>>>
>>>>>>>> +    ASSERT(rw_is_write_locked(&pdev->domain->vpci_rwlock));
>>>>>>>> +
>>>>>>>>      while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART )
>>>>>>>> +    {
>>>>>>>> +        /*
>>>>>>>> +         * process_pending_softirqs may trigger vpci_process_pending which
>>>>>>>> +         * may need to acquire pdev->domain->vpci_rwlock in read mode.
>>>>>>>> +         */
>>>>>>>> +        write_unlock(&pdev->domain->vpci_rwlock);
>>>>>>>>          process_pending_softirqs();
>>>>>>>> +        write_lock(&pdev->domain->vpci_rwlock);
>>>>>>>> +
>>>>>>>> +        /* Check if pdev still exists and vPCI was not removed or re-created. */
>>>>>>>> +        if ( pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn) != pdev )
>>>>>>>> +            if ( vpci is NOT the same )
>>>>>>>> +            {
>>>>>>>> +                rc = 0;
>>>>>>>> +                break;
>>>>>>>> +            }
>>>>>>>> +    }
>>>>>>>>
>>>>>>>>      rangeset_destroy(mem);
>>>>>>>>      if ( !rc )
>>>>>>>>          modify_decoding(pdev, cmd, false);
>>>>>>>>
>>>>>>>> This one also wants process_pending_softirqs to run, so it *might*
>>>>>>>> want the pdev and vpci checks. But at the same time apply_map runs
>>>>>>>> at ( system_state < SYS_STATE_active ), so defer_map won't be
>>>>>>>> running yet, and thus vpci_process_pending has nothing to do yet.
>>>>>>>> So, I think we just need:
>>>>>>>>
>>>>>>>> write_unlock(&pdev->domain->vpci_rwlock);
>>>>>>>> process_pending_softirqs();
>>>>>>>> write_lock(&pdev->domain->vpci_rwlock);
>>>>>>>>
>>>>>>>> and this should be enough.
>>>>>>> Given the context apply_map is called from (dom0-specific init code),
>>>>>>> there's no need to check that the pdev still exists, or whether vpci
>>>>>>> has been re-created, as neither is possible. Just add a comment to
>>>>>>> explicitly note that the context of the function is special, and thus
>>>>>>> there's no possibility of either the device or vpci going away.
>>>>>> Does it really need write_unlock/write_lock given the context?...
>>>>> I think it's bad practice to call process_pending_softirqs while
>>>>> holding any locks. This is a very specific context so it's likely fine
>>>>> not to drop the lock, but it would still seem incorrect to me.
>>>> Ok
>>>>>> I think it doesn't, as there is no chance defer_map is called, and thus no
>>>>>> process_pending_softirqs -> vpci_process_pending -> read_lock
>>>>> Indeed, there's no chance of that, because process_pending_softirqs
>>>>> will never try to do a scheduling operation that would result in our
>>>>> context being scheduled out.
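The "if ( vpci is NOT the same )" placeholder in the draft above is exactly where the ID scheme from earlier in the thread would slot in. A hypothetical shape of such a helper (neither is_the_same_vpci nor the id fields exist in the tree; the name merely mirrors the placeholder):

    /*
     * Hypothetical: struct vpci grows a uint64_t id, assigned once at
     * allocation time as pdev->vpci->id = d->vpci_id++ (with d->vpci_id a
     * new per-domain counter), so that a stale reference can be told apart
     * from a re-created instance even if the allocator reuses the address.
     */
    static bool is_the_same_vpci(const struct vpci *vpci, uint64_t id)
    {
        return vpci && vpci->id == id;
    }

With such a helper the two nested ifs in the draft would collapse into a single condition combining the pci_get_pdev_by_domain result and the generation check.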
>>>> while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART )
>>>> {
>>>>     /*
>>>>      * FIXME: Given the context apply_map is called from (dom0-specific
>>>>      * init code at system_state < SYS_STATE_active) it is not strictly
>>>>      * required that pdev->domain->vpci_rwlock is unlocked before calling
>>>>      * process_pending_softirqs, as there is no contention possible between
>>>>      * this code and vpci_process_pending trying to acquire the lock in
>>>>      * read mode. But running process_pending_softirqs with any lock held
>>>>      * doesn't seem to be good practice, so drop the lock and re-acquire
>>>>      * it right away.
>>>>      */
>>>>     write_unlock(&pdev->domain->vpci_rwlock);
>>>>     process_pending_softirqs();
>>>>     write_lock(&pdev->domain->vpci_rwlock);
>>>> }
>>> I'm afraid that's misleading at best. apply_map() is merely a specific
>>> example where you know the lock is going to be taken. But really any
>>> softirq handler could be acquiring any lock, so requesting to process
>>> softirqs cannot ever be done with any lock held.
>>>
>>> What you instead want to explain is why, after re-acquiring the lock,
>>> no further checking is needed for potentially changed state.
>> How about:
>>
>> /*
>>  * FIXME: Given the context apply_map is called from (dom0-specific
>>  * init code at system_state < SYS_STATE_active) there is no contention
>>  * possible between this code and vpci_process_pending trying to acquire
>>  * the lock in read mode and destroying pdev->vpci in its error path.
>>  * Neither may pdev be disposed of yet, so it is not required to check
>>  * whether the relevant pdev still exists after re-acquiring the lock.
>>  */
> I'm not sure I follow the first sentence; I guess a comma or two may help,
> and/or using "as well as" in place of one of the two "and"s. I also don't
> think you mean contention, but rather a race between the named entities?

/*
 * FIXME: Given the context from which apply_map is called (dom0-specific
 * init code at system_state < SYS_STATE_active) there is no race possible
 * between this code and vpci_process_pending, which may try to acquire the
 * lock in read mode and may also try to destroy pdev->vpci in its error
 * path. Neither may pdev be disposed of yet, so it is not required to check
 * whether the relevant pdev still exists after re-acquiring the lock.
 */

> Jan

Thank you,
Oleksandr
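For completeness, folding the agreed wording back into the earlier draft gives roughly the following apply_map. This is still a sketch against this series' proposed per-domain vpci_rwlock, not committed code:

    static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
                                struct rangeset *mem, uint16_t cmd)
    {
        struct map_data data = { .d = d, .map = true };
        int rc;

        ASSERT(rw_is_write_locked(&pdev->domain->vpci_rwlock));

        while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART )
        {
            /*
             * FIXME: Given the context from which apply_map is called
             * (dom0-specific init code at system_state < SYS_STATE_active)
             * there is no race possible between this code and
             * vpci_process_pending, which may try to acquire the lock in
             * read mode and may also try to destroy pdev->vpci in its
             * error path. Neither may pdev be disposed of yet, so it is
             * not required to check whether the relevant pdev still exists
             * after re-acquiring the lock.
             */
            write_unlock(&pdev->domain->vpci_rwlock);
            process_pending_softirqs();
            write_lock(&pdev->domain->vpci_rwlock);
        }

        rangeset_destroy(mem);
        if ( !rc )
            modify_decoding(pdev, cmd, false);

        return rc;
    }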