[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v4 02/11] vpci: cancel pending map/unmap on vpci removal
On 22.11.2021 15:21, Oleksandr Andrushchenko wrote: > On 19.11.21 15:34, Oleksandr Andrushchenko wrote: >> On 19.11.21 15:25, Jan Beulich wrote: >>> On 19.11.2021 14:16, Oleksandr Andrushchenko wrote: >>>> On 19.11.21 15:00, Jan Beulich wrote: >>>>> On 19.11.2021 13:34, Oleksandr Andrushchenko wrote: >>>>>> Possible locking and other work needed: >>>>>> ======================================= >>>>>> >>>>>> 1. pcidevs_{lock|unlock} is too heavy and is per-host >>>>>> 2. pdev->vpci->lock cannot be used as vpci is freed by vpci_remove_device >>>>>> 3. We may want a dedicated per-domain rw lock to be implemented: >>>>>> >>>>>> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h >>>>>> index 28146ee404e6..ebf071893b21 100644 >>>>>> --- a/xen/include/xen/sched.h >>>>>> +++ b/xen/include/xen/sched.h >>>>>> @@ -444,6 +444,7 @@ struct domain >>>>>> >>>>>> #ifdef CONFIG_HAS_PCI >>>>>> struct list_head pdev_list; >>>>>> + rwlock_t vpci_rwlock; >>>>>> + bool vpci_terminating; <- atomic? >>>>>> #endif >>>>>> then vpci_remove_device is a writer (cold path) and vpci_process_pending >>>>>> and >>>>>> vpci_mmio_{read|write} are readers (hot path). >>>>> Right - you need such a lock for other purposes anyway, as per the >>>>> discussion with Julien. >>>> What about bool vpci_terminating? Do you see it as an atomic type or just >>>> bool? >>> Having seen only ... >>> >>>>>> do_physdev_op(PHYSDEVOP_pci_device_remove) will need >>>>>> hypercall_create_continuation >>>>>> to be implemented, so when re-start removal if need be: >>>>>> >>>>>> vpci_remove_device() >>>>>> { >>>>>> d->vpci_terminating = true; >>> ... this use so far, I can't tell yet. But at a first glance a boolean >>> looks to be what you need. >>> >>>>>> remove vPCI register handlers <- this will cut off PCI_COMMAND >>>>>> emulation among others >>>>>> if ( !write_trylock(d->vpci_rwlock) ) >>>>>> return -ERESTART; >>>>>> xfree(pdev->vpci); >>>>>> pdev->vpci = NULL; >>>>>> } >>>>>> >>>>>> Then this d->vpci_rwlock becomes a dedicated vpci per-domain lock for >>>>>> other operations which may require it, e.g. virtual bus topology can >>>>>> use it when assigning vSBDF etc. >>>>>> >>>>>> 4. vpci_remove_device needs to be removed from vpci_process_pending >>>>>> and do nothing for Dom0 and crash DomU otherwise: >>>>> Why is this? I'm not outright opposed, but I don't immediately see why >>>>> trying to remove the problematic device wouldn't be a reasonable course >>>>> of action anymore. vpci_remove_device() may need to become more careful >>>>> as to not crashing, >>>> vpci_remove_device does not crash, vpci_process_pending does >>>>> though. >>>> Assume we are in an error state in vpci_process_pending *on one of the >>>> vCPUs* >>>> and we call vpci_remove_device. vpci_remove_device tries to acquire the >>>> lock and it can't just because there are some other vpci code is running >>>> on other vCPU. >>>> Then what do we do here? We are in SoftIRQ context now and we can't spin >>>> trying to acquire d->vpci_rwlock forever. Neither we can blindly free vpci >>>> structure because it is seen by all vCPUs and may crash them. >>>> >>>> If vpci_remove_device is in hypercall context it just returns -ERESTART and >>>> hypercall continuation helps here. But not in SoftIRQ context. >>> Maybe then you want to invoke this cleanup from RCU context (whether >>> vpci_remove_device() itself or a suitable clone there of is TBD)? (I >>> will admit though that I didn't check whether that would satisfy all >>> constraints.) >>> >>> Then again it also hasn't become clear to me why you use write_trylock() >>> there. The lock contention you describe doesn't, on the surface, look >>> any different from situations elsewhere. >> I use write_trylock in vpci_remove_device because if we can't >> acquire the lock then we defer device removal. This would work >> well if called from a hypercall which will employ hypercall continuation. >> But SoftIRQ getting -ERESTART is something that we can't probably >> handle by restarting as hypercall can, thus I only see that >> vpci_process_pending >> will need to spin and wait until vpci_remove_device succeeds. > Does anybody have any better solution for preventing SoftIRQ from > spinning on vpci_remove_device and -ERESTART? Well, at this point I can suggest only a marginal improvement: Instead of spinning inside the softirq handler, you want to re-raise the softirq and exit the handler. That way at least higher "priority" softirqs won't be starved. Beyond that - maybe the guest (or just a vcpu of it) needs pausing in such an event, with the work deferred to a tasklet? Yet I don't think my earlier question regarding the use of write_trylock() was really answered. What you said in reply doesn't explain (to me at least) why write_lock() is not an option. Jan
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |