[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure
On 14.02.22 15:48, Jan Beulich wrote: > On 14.02.2022 14:27, Oleksandr Andrushchenko wrote: >> >> On 14.02.22 15:22, Jan Beulich wrote: >>> On 14.02.2022 14:13, Oleksandr Andrushchenko wrote: >>>> On 14.02.22 14:57, Jan Beulich wrote: >>>>> On 14.02.2022 12:37, Oleksandr Andrushchenko wrote: >>>>>> On 14.02.22 13:25, Roger Pau Monné wrote: >>>>>>> On Mon, Feb 14, 2022 at 11:15:27AM +0000, Oleksandr Andrushchenko wrote: >>>>>>>> On 14.02.22 13:11, Roger Pau Monné wrote: >>>>>>>>> On Mon, Feb 14, 2022 at 10:53:43AM +0000, Oleksandr Andrushchenko >>>>>>>>> wrote: >>>>>>>>>> On 14.02.22 12:34, Roger Pau Monné wrote: >>>>>>>>>>> On Mon, Feb 14, 2022 at 09:36:39AM +0000, Oleksandr Andrushchenko >>>>>>>>>>> wrote: >>>>>>>>>>>> On 11.02.22 13:40, Roger Pau Monné wrote: >>>>>>>>>>>>> + >>>>>>>>>>>>>>>> for ( i = 0; i < msix->max_entries; i++ ) >>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>> const struct vpci_msix_entry *entry = >>>>>>>>>>>>>>>> &msix->entries[i]; >>>>>>>>>>>>>>> Since this function is now called with the per-domain rwlock >>>>>>>>>>>>>>> read >>>>>>>>>>>>>>> locked it's likely not appropriate to call >>>>>>>>>>>>>>> process_pending_softirqs >>>>>>>>>>>>>>> while holding such lock (check below). >>>>>>>>>>>>>> You are right, as it is possible that: >>>>>>>>>>>>>> >>>>>>>>>>>>>> process_pending_softirqs -> vpci_process_pending -> read_lock >>>>>>>>>>>>>> >>>>>>>>>>>>>> Even more, vpci_process_pending may also >>>>>>>>>>>>>> >>>>>>>>>>>>>> read_unlock -> vpci_remove_device -> write_lock >>>>>>>>>>>>>> >>>>>>>>>>>>>> in its error path. So, any invocation of process_pending_softirqs >>>>>>>>>>>>>> must not hold d->vpci_rwlock at least. >>>>>>>>>>>>>> >>>>>>>>>>>>>> And also we need to check that pdev->vpci was not removed >>>>>>>>>>>>>> in between or *re-created* >>>>>>>>>>>>>>> We will likely need to re-iterate over the list of pdevs >>>>>>>>>>>>>>> assigned to >>>>>>>>>>>>>>> the domain and assert that the pdev is still assigned to the >>>>>>>>>>>>>>> same >>>>>>>>>>>>>>> domain. >>>>>>>>>>>>>> So, do you mean a pattern like the below should be used at all >>>>>>>>>>>>>> places where we need to call process_pending_softirqs? >>>>>>>>>>>>>> >>>>>>>>>>>>>> read_unlock >>>>>>>>>>>>>> process_pending_softirqs >>>>>>>>>>>>>> read_lock >>>>>>>>>>>>>> pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn); >>>>>>>>>>>>>> if ( pdev && pdev->vpci && is_the_same_vpci(pdev->vpci) ) >>>>>>>>>>>>>> <continue processing> >>>>>>>>>>>>> Something along those lines. You likely need to continue iterate >>>>>>>>>>>>> using >>>>>>>>>>>>> for_each_pdev. >>>>>>>>>>>> How do we tell if pdev->vpci is the same? Jan has already brought >>>>>>>>>>>> this question before [1] and I was about to use some ID for that >>>>>>>>>>>> purpose: >>>>>>>>>>>> pdev->vpci->id = d->vpci_id++ and then we use pdev->vpci->id for >>>>>>>>>>>> checks >>>>>>>>>>> Given this is a debug message I would be OK with just doing the >>>>>>>>>>> minimal checks to prevent Xen from crashing (ie: pdev->vpci exists) >>>>>>>>>>> and that the resume MSI entry is not past the current limit. >>>>>>>>>>> Otherwise >>>>>>>>>>> just print a message and move on to the next device. >>>>>>>>>> Agree, I see no big issue (probably) if we are not able to print >>>>>>>>>> >>>>>>>>>> How about this one: >>>>>>>>>> >>>>>>>>>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c >>>>>>>>>> index 809a6b4773e1..50373f04da82 100644 >>>>>>>>>> --- a/xen/drivers/vpci/header.c >>>>>>>>>> +++ b/xen/drivers/vpci/header.c >>>>>>>>>> @@ -171,10 +171,31 @@ static int __init apply_map(struct domain *d, >>>>>>>>>> const struct pci_dev *pdev, >>>>>>>>>> struct rangeset *mem, uint16_t >>>>>>>>>> cmd) >>>>>>>>>> { >>>>>>>>>> struct map_data data = { .d = d, .map = true }; >>>>>>>>>> + pci_sbdf_t sbdf = pdev->sbdf; >>>>>>>>>> int rc; >>>>>>>>>> >>>>>>>>>> + ASSERT(rw_is_write_locked(&pdev->domain->vpci_rwlock)); >>>>>>>>>> + >>>>>>>>>> while ( (rc = rangeset_consume_ranges(mem, map_range, >>>>>>>>>> &data)) == -ERESTART ) >>>>>>>>>> + { >>>>>>>>>> + >>>>>>>>>> + /* >>>>>>>>>> + * process_pending_softirqs may trigger >>>>>>>>>> vpci_process_pending which >>>>>>>>>> + * may need to acquire pdev->domain->vpci_rwlock in read >>>>>>>>>> mode. >>>>>>>>>> + */ >>>>>>>>>> + write_unlock(&pdev->domain->vpci_rwlock); >>>>>>>>>> process_pending_softirqs(); >>>>>>>>>> + write_lock(&pdev->domain->vpci_rwlock); >>>>>>>>>> + >>>>>>>>>> + /* Check if pdev still exists and vPCI was not removed or >>>>>>>>>> re-created. */ >>>>>>>>>> + if (pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, >>>>>>>>>> sbdf.devfn) != pdev) >>>>>>>>>> + if ( vpci is NOT the same ) >>>>>>>>>> + { >>>>>>>>>> + rc = 0; >>>>>>>>>> + break; >>>>>>>>>> + } >>>>>>>>>> + } >>>>>>>>>> + >>>>>>>>>> rangeset_destroy(mem); >>>>>>>>>> if ( !rc ) >>>>>>>>>> modify_decoding(pdev, cmd, false); >>>>>>>>>> >>>>>>>>>> This one also wants process_pending_softirqs to run so it *might* >>>>>>>>>> want pdev and vpci checks. But at the same time apply_map runs >>>>>>>>>> at ( system_state < SYS_STATE_active ), so defer_map won't be >>>>>>>>>> running yet, thus no vpci_process_pending is possible yet (in terms >>>>>>>>>> it has something to do yet). So, I think we just need: >>>>>>>>>> >>>>>>>>>> write_unlock(&pdev->domain->vpci_rwlock); >>>>>>>>>> process_pending_softirqs(); >>>>>>>>>> write_lock(&pdev->domain->vpci_rwlock); >>>>>>>>>> >>>>>>>>>> and this should be enough >>>>>>>>> Given the context apply_map is called from (dom0 specific init code), >>>>>>>>> there's no need to check for the pdev to still exits, or whether vpci >>>>>>>>> has been recreated, as it's not possible. Just add a comment to >>>>>>>>> explicitly note that the context of the function is special, and thus >>>>>>>>> there's no possibility of either the device or vpci going away. >>>>>>>> Does it really need write_unlock/write_lock given the context?... >>>>>>> I think it's bad practice to call process_pending_softirqs while >>>>>>> holding any locks. This is a very specific context so it's likely fine >>>>>>> to not drop the lock, but would still seem incorrect to me. >>>>>> Ok >>>>>>>> I think it doesn't as there is no chance defer_map is called, thus >>>>>>>> process_pending_softirqs -> vpci_process_pending -> read_lock >>>>>>> Indeed, there's no chance of that because process_pending_softirqs >>>>>>> will never try to do a scheduling operation that would result in our >>>>>>> context being scheduled out. >>>>>> while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == >>>>>> -ERESTART ) >>>>>> { >>>>>> /* >>>>>> * FIXME: Given the context apply_map is called from (dom0 >>>>>> specific >>>>>> * init code at system_state < SYS_STATE_active) it is not >>>>>> strictly >>>>>> * required that pdev->domain->vpci_rwlock is unlocked >>>>>> before calling >>>>>> * process_pending_softirqs as there is no contention >>>>>> possible between >>>>>> * this code and vpci_process_pending trying to acquire the >>>>>> lock in >>>>>> * read mode. But running process_pending_softirqs with any >>>>>> lock held >>>>>> * doesn't seem to be a good practice, so drop the lock and >>>>>> re-acquire >>>>>> * it right again. >>>>>> */ >>>>>> write_unlock(&pdev->domain->vpci_rwlock); >>>>>> process_pending_softirqs(); >>>>>> write_lock(&pdev->domain->vpci_rwlock); >>>>>> } >>>>> I'm afraid that's misleading at best. apply_map() is merely a specific >>>>> example where you know the lock is going to be taken. But really any >>>>> softirq handler could be acquiring any lock, so requesting to process >>>>> softirqs cannot ever be done with any lock held. >>>>> >>>>> What you instead want to explain is why, after re-acquiring the lock, >>>>> no further checking is needed for potentially changed state. >>>> How about: >>>> >>>> /* >>>> * FIXME: Given the context apply_map is called from (dom0 specific >>>> * init code at system_state < SYS_STATE_active) there is no contention >>>> * possible between this code and vpci_process_pending trying to acquire >>>> * the lock in read mode and destroy pdev->vpci in its error path. >>>> * Neither pdev may be disposed yet, so it is not required to check if >>>> the >>>> * relevant pdev still exists after re-acquiring the lock. >>>> */ >>> I'm not sure I follow the first sentence; I guess a comma or two may help, >>> and or using "as well as" in place of one of the two "and". I also don't >>> think you mean contention, but rather a race between the named entities? >> /* >> * FIXME: Given the context from which apply_map is called (dom0 specific >> * init code at system_state < SYS_STATE_active) there is no race >> condition >> * possible between this code and vpci_process_pending which may try to >> acquire >> * the lock in read mode and also try to destroy pdev->vpci in its error >> path. >> * Neither pdev may be disposed yet, so it is not required to check if the >> * relevant pdev still exists after re-acquiring the lock. >> */ > I'm still struggling with the language, sorry. You look to only have replaced > "contention"? Reading it again I'd also like to mention that to me (not a > native speaker) "Neither pdev may be ..." expresses "None of the pdev-s may > be ...", when I think you mean "Nor may pdev be ..." /* * FIXME: apply_map is called from dom0 specific init code when * system_state < SYS_STATE_active, so there is no race condition * possible between this code and vpci_process_pending. So, neither * vpci_process_pending may try to acquire the lock in read mode and * also destroy pdev->vpci in its error path nor pdev may be disposed yet. * This means that it is not required to check if the relevant pdev * still exists after re-acquiring the lock. */ > Jan > Thank you, Oleksandr
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |