
Re: [PATCH v4 02/11] vpci: cancel pending map/unmap on vpci removal


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • Date: Mon, 22 Nov 2021 15:02:15 +0000
  • Accept-language: en-US
  • Cc: Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>, "andrew.cooper3@xxxxxxxxxx" <andrew.cooper3@xxxxxxxxxx>, "george.dunlap@xxxxxxxxxx" <george.dunlap@xxxxxxxxxx>, "paul@xxxxxxx" <paul@xxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Rahul Singh <rahul.singh@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, "julien@xxxxxxx" <julien@xxxxxxx>, Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • Delivery-date: Mon, 22 Nov 2021 15:02:39 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: [PATCH v4 02/11] vpci: cancel pending map/unmap on vpci removal


On 22.11.21 16:57, Jan Beulich wrote:
> On 22.11.2021 15:45, Oleksandr Andrushchenko wrote:
>>
>> On 22.11.21 16:37, Jan Beulich wrote:
>>> On 22.11.2021 15:21, Oleksandr Andrushchenko wrote:
>>>> On 19.11.21 15:34, Oleksandr Andrushchenko wrote:
>>>>> On 19.11.21 15:25, Jan Beulich wrote:
>>>>>> On 19.11.2021 14:16, Oleksandr Andrushchenko wrote:
>>>>>>> On 19.11.21 15:00, Jan Beulich wrote:
>>>>>>>> On 19.11.2021 13:34, Oleksandr Andrushchenko wrote:
>>>>>>>>> Possible locking and other work needed:
>>>>>>>>> =======================================
>>>>>>>>>
>>>>>>>>> 1. pcidevs_{lock|unlock} is too heavy and is per-host
>>>>>>>>> 2. pdev->vpci->lock cannot be used as vpci is freed by 
>>>>>>>>> vpci_remove_device
>>>>>>>>> 3. We may want a dedicated per-domain rw lock to be implemented:
>>>>>>>>>
>>>>>>>>> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
>>>>>>>>> index 28146ee404e6..ebf071893b21 100644
>>>>>>>>> --- a/xen/include/xen/sched.h
>>>>>>>>> +++ b/xen/include/xen/sched.h
>>>>>>>>> @@ -444,6 +444,8 @@ struct domain
>>>>>>>>>
>>>>>>>>>  #ifdef CONFIG_HAS_PCI
>>>>>>>>>      struct list_head pdev_list;
>>>>>>>>> +    rwlock_t vpci_rwlock;
>>>>>>>>> +    bool vpci_terminating; <- atomic?
>>>>>>>>>  #endif
>>>>>>>>> then vpci_remove_device is a writer (cold path) and 
>>>>>>>>> vpci_process_pending and
>>>>>>>>> vpci_mmio_{read|write} are readers (hot path).
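
To make the reader/writer split concrete, a minimal sketch of the hot
path (assuming the vpci_rwlock field from the diff above; the handler
signature is simplified here and is not the real one):

    static bool vpci_mmio_access(struct domain *d, struct pci_dev *pdev)
    {
        bool found = false;

        read_lock(&d->vpci_rwlock);
        if ( pdev->vpci )      /* may have been freed by the writer side */
        {
            /* ... perform the actual emulation ... */
            found = true;
        }
        read_unlock(&d->vpci_rwlock);

        return found;
    }

Holding the read lock across the access guarantees pdev->vpci cannot be
freed underneath us, since vpci_remove_device frees it only while holding
the lock as a writer.
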
>>>>>>>> Right - you need such a lock for other purposes anyway, as per the
>>>>>>>> discussion with Julien.
>>>>>>> What about bool vpci_terminating? Do you see it as an atomic type or 
>>>>>>> just bool?
>>>>>> Having seen only ...
>>>>>>
>>>>>>>>> do_physdev_op(PHYSDEVOP_pci_device_remove) will need
>>>>>>>>> hypercall_create_continuation to be implemented, so that removal
>>>>>>>>> can be re-started if need be:
>>>>>>>>>
>>>>>>>>> vpci_remove_device()
>>>>>>>>> {
>>>>>>>>>         d->vpci_terminating = true;
>>>>>> ... this use so far, I can't tell yet. But at a first glance a boolean
>>>>>> looks to be what you need.
>>>>>>
>>>>>>>>>         remove vPCI register handlers <- this will cut off
>>>>>>>>>         PCI_COMMAND emulation among others
>>>>>>>>>         if ( !write_trylock(&d->vpci_rwlock) )
>>>>>>>>>             return -ERESTART;
>>>>>>>>>         xfree(pdev->vpci);
>>>>>>>>>         pdev->vpci = NULL;
>>>>>>>>>         write_unlock(&d->vpci_rwlock);
>>>>>>>>> }
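
(As an aside, the -ERESTART path above would then rely on the usual
continuation pattern in the PHYSDEVOP_pci_device_remove handler; roughly,
with the exact call site and format string being assumptions of mine:

    ret = pci_remove_device(dev.seg, dev.bus, dev.devfn);
    if ( ret == -ERESTART )
        ret = hypercall_create_continuation(__HYPERVISOR_physdev_op,
                                            "ih", cmd, arg);

so the hypercall simply gets re-executed until removal completes.)
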
>>>>>>>>>
>>>>>>>>> Then this d->vpci_rwlock becomes a dedicated vpci per-domain lock for
>>>>>>>>> other operations which may require it, e.g. virtual bus topology can
>>>>>>>>> use it when assigning vSBDF etc.
>>>>>>>>>
>>>>>>>>> 4. vpci_remove_device needs to be removed from vpci_process_pending
>>>>>>>>> and do nothing for Dom0 and crash DomU otherwise:
>>>>>>>> Why is this? I'm not outright opposed, but I don't immediately see why
>>>>>>>> trying to remove the problematic device wouldn't be a reasonable course
>>>>>>>> of action anymore. vpci_remove_device() may need to become more careful
>>>>>>>> as to not crashing, though.
>>>>>>> vpci_remove_device does not crash, vpci_process_pending does.
>>>>>>> Assume we are in an error state in vpci_process_pending *on one of
>>>>>>> the vCPUs* and we call vpci_remove_device. vpci_remove_device tries
>>>>>>> to acquire the lock and can't, simply because some other vpci code
>>>>>>> is running on another vCPU.
>>>>>>> Then what do we do here? We are in SoftIRQ context now and we can't
>>>>>>> spin trying to acquire d->vpci_rwlock forever. Nor can we blindly
>>>>>>> free the vpci structure, because it is seen by all vCPUs and doing
>>>>>>> so may crash them.
>>>>>>>
>>>>>>> If vpci_remove_device is in hypercall context it just returns
>>>>>>> -ERESTART, and hypercall continuation helps here. But not in
>>>>>>> SoftIRQ context.
>>>>>> Maybe then you want to invoke this cleanup from RCU context (whether
>>>>>> vpci_remove_device() itself or a suitable clone thereof is TBD)? (I
>>>>>> will admit though that I didn't check whether that would satisfy all
>>>>>> constraints.)
>>>>>>
>>>>>> Then again it also hasn't become clear to me why you use write_trylock()
>>>>>> there. The lock contention you describe doesn't, on the surface, look
>>>>>> any different from situations elsewhere.
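
For reference, the RCU route would look roughly like this (a sketch only,
assuming struct vpci grew an rcu_head member for this purpose):

    static void vpci_free_rcu(struct rcu_head *rcu)
    {
        xfree(container_of(rcu, struct vpci, rcu));
    }

    /* In vpci_remove_device: unhook first, free after a grace period. */
    struct vpci *vpci = pdev->vpci;

    pdev->vpci = NULL;                   /* no new users can pick it up */
    call_rcu(&vpci->rcu, vpci_free_rcu);

Whether the existing readers can be made to fit RCU read-side sections is
exactly the part that would need checking.
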
>>>>> I use write_trylock in vpci_remove_device because if we can't
>>>>> acquire the lock then we defer device removal. This would work
>>>>> well if called from a hypercall, which will employ hypercall
>>>>> continuation. But a SoftIRQ getting -ERESTART is something that we
>>>>> probably can't handle by restarting the way a hypercall can, thus I
>>>>> only see that vpci_process_pending will need to spin and wait until
>>>>> vpci_remove_device succeeds.
>>>> Does anybody have any better solution for preventing SoftIRQ from
>>>> spinning on vpci_remove_device and -ERESTART?
>>> Well, at this point I can suggest only a marginal improvement: Instead of
>>> spinning inside the softirq handler, you want to re-raise the softirq and
>>> exit the handler. That way at least higher "priority" softirqs won't be
>>> starved.
>>>
>>> Beyond that - maybe the guest (or just a vcpu of it) needs pausing in such
>>> an event, with the work deferred to a tasklet?
>>>
>>> Yet I don't think my earlier question regarding the use of write_trylock()
>>> was really answered. What you said in reply doesn't explain (to me at
>>> least) why write_lock() is not an option.
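
For completeness, the re-raise variant suggested above would look roughly
like this in the softirq handler (SCHEDULE_SOFTIRQ is only a placeholder
here; the softirq to re-raise is whichever one invoked us):

    if ( !write_trylock(&d->vpci_rwlock) )
    {
        raise_softirq(SCHEDULE_SOFTIRQ); /* retry on the next iteration */
        return;                          /* ... without starving others */
    }
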
>> I was thinking that we do not want to freeze in case we are calling
>> vpci_remove_device from SoftIRQ context, thus we try to lock and, if we
>> can't, we return -ERESTART indicating that the removal needs to be
>> deferred. If we use write_lock, then SoftIRQ -> write_lock will spin
>> there waiting for the readers to release the lock.
>>
>> write_lock actually makes things a lot easier, but I just don't know if
>> it is ok to use it. If so, then vpci_remove_device becomes synchronous
>> and there is no need for hypercall continuation and other heavy
>> machinery for re-scheduling the SoftIRQ...
> I'm inclined to ask: If it wasn't okay to use here, then where would it be
> okay to use? Of course I realize there are cases when long spinning times
> can be a problem.
I can't prove it, but I have a feeling that write_lock could be less
"harmful" in the case of a hypercall than in SoftIRQ context.
>   But I expect there aren't going to be excessively long
> lock holding regions for this lock, and I also would expect average
> contention to not be overly bad.
Yes, this is my impression as well.
>   But in the end you know better the code
> that you're writing (and which may lead to issues with the lock usage) than
> I do ...
I am pretty much ok with write_lock as it does make things way easier.
So I'll go with write_lock then, and if we spot heavy contention we can
improve on that.
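
For the record, the synchronous variant then collapses to something like
this (a sketch, using the field names from earlier in the thread):

    void vpci_remove_device(struct pci_dev *pdev)
    {
        struct domain *d = pdev->domain;

        /*
         * Blocks until all readers (vpci_process_pending and the MMIO
         * handlers) have dropped the lock; no -ERESTART, no continuation.
         */
        write_lock(&d->vpci_rwlock);
        /* remove vPCI register handlers */
        xfree(pdev->vpci);
        pdev->vpci = NULL;
        write_unlock(&d->vpci_rwlock);
    }
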
>
> Jan
>
Thank you!!
Oleksandr


