
Re: [PATCH v4 02/11] vpci: cancel pending map/unmap on vpci removal




On 16.11.21 10:01, Jan Beulich wrote:
> On 16.11.2021 08:32, Oleksandr Andrushchenko wrote:
>> On 15.11.21 18:56, Jan Beulich wrote:
>>> On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
>>>>
>>>> When a vPCI is removed for a PCI device it is possible that we have
>>>> scheduled a delayed work for map/unmap operations for that device.
>>>> For example, the following scenario can illustrate the problem:
>>>>
>>>> pci_physdev_op
>>>>      pci_add_device
>>>>          init_bars -> modify_bars -> defer_map ->
>>>>                       raise_softirq(SCHEDULE_SOFTIRQ)
>>>>      iommu_add_device <- FAILS
>>>>      vpci_remove_device -> xfree(pdev->vpci)
>>>>
>>>> leave_hypervisor_to_guest
>>>>      vpci_process_pending: v->vpci.mem != NULL; v->vpci.pdev->vpci == NULL
>>>>
>>>> For the hardware domain we continue execution as the worst that
>>>> could happen is that MMIO mappings are left in place when the
>>>> device has been deassigned.
>>> Is continuing safe in this case? I.e. isn't there the risk of a NULL
>>> deref?
>> I think it is safe to continue
> And why do you think so? I.e. why is there no race for Dom0 when there
> is one for DomU?
Well, then we need to use a lock to synchronize the two.
I guess this needs to be the pcidevs lock, unfortunately.
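
E.g. something along these lines (only a sketch, assuming the pcidevs lock
can be taken on this path and that the existing map/unmap processing stays
unchanged):

bool vpci_process_pending(struct vcpu *v)
{
    if ( v->vpci.mem )
    {
        /*
         * Sketch: take the pcidevs lock so vpci_remove_device() cannot
         * free pdev->vpci while the deferred work is being processed.
         */
        pcidevs_lock();

        if ( !v->vpci.pdev->vpci )
        {
            /* The vPCI state went away while the work was pending. */
            rangeset_destroy(v->vpci.mem);
            v->vpci.mem = NULL;
            pcidevs_unlock();
            return false;
        }

        /* ... existing map/unmap processing, unchanged ... */

        pcidevs_unlock();
    }

    return false;
}
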
>
>>>> For unprivileged domains that get a failure in the middle of a vPCI
>>>> {un}map operation we need to destroy them, as we don't know in which
>>>> state the p2m is. This can only happen in vpci_process_pending for
>>>> DomUs as they won't be allowed to call pci_add_device.
>>> You saying "we need to destroy them" made me look for a new domain_crash()
>>> that you add, but there is none. What is this about?
>> Yes, I guess we need to implicitly destroy the domain,
> What do you mean by "implicitly"?
I mean we should explicitly crash a DomU from vpci_process_pending on
failure, e.g.:

@@ -151,14 +151,18 @@ bool vpci_process_pending(struct vcpu *v)

          vpci_cancel_pending(v->vpci.pdev);
          if ( rc )
+        {
              /*
               * FIXME: in case of failure remove the device from the domain.
               * Note that there might still be leftover mappings. While this is
+             * safe for Dom0, for DomUs the domain needs to be killed in order
+             * to avoid leaking stale p2m mappings on failure.
               */
              vpci_remove_device(v->vpci.pdev);
+
+            if ( !is_hardware_domain(v->domain) )
+                domain_crash(v->domain);

>
>>>> @@ -165,6 +164,18 @@ bool vpci_process_pending(struct vcpu *v)
>>>>        return false;
>>>>    }
>>>>    
>>>> +void vpci_cancel_pending(const struct pci_dev *pdev)
>>>> +{
>>>> +    struct vcpu *v = current;
>>>> +
>>>> +    /* Cancel any pending work now. */
>>> Doesn't "any" include pending work on all vCPU-s of the guest, not
>>> just current? Is current even relevant (as in: part of the correct
>>> domain), considering ...
>>>
>>>> --- a/xen/drivers/vpci/vpci.c
>>>> +++ b/xen/drivers/vpci/vpci.c
>>>> @@ -51,6 +51,8 @@ void vpci_remove_device(struct pci_dev *pdev)
>>>>            xfree(r);
>>>>        }
>>>>        spin_unlock(&pdev->vpci->lock);
>>>> +
>>>> +    vpci_cancel_pending(pdev);
>>> ... this code path, when coming here from pci_{add,remove}_device()?
>>>
>>> I can agree that there's a problem here, but I think you need to
>>> properly (i.e. in a race free manner) drain pending work.
>> Yes, the code is inconsistent in this respect. I am thinking about:
>>
>> void vpci_cancel_pending(const struct pci_dev *pdev)
>> {
>>       struct domain *d = pdev->domain;
>>       struct vcpu *v;
>>
>>       /* Cancel any pending work now. */
>>       domain_lock(d);
>>       for_each_vcpu ( d, v )
>>       {
>>           vcpu_pause(v);
>>           if ( v->vpci.mem && v->vpci.pdev == pdev)
> Nit: Same style issue as in the original patch.
Will fix
>
>>           {
>>               rangeset_destroy(v->vpci.mem);
>>               v->vpci.mem = NULL;
>>           }
>>           vcpu_unpause(v);
>>       }
>>       domain_unlock(d);
>> }
>>
>> which seems to solve all the concerns. Is this what you mean?
> Something along these lines. I expect you will want to make use of
> domain_pause_except_self(),
Yes, this is what is needed here, thanks. The only question is this:

int domain_pause_except_self(struct domain *d)
{
[snip]
         /* Avoid racing with other vcpus which may want to be pausing us */
         if ( !spin_trylock(&d->hypercall_deadlock_mutex) )
             return -ERESTART;

so it is not clear what we should do in case of -ERESTART: do we want to spin?
Otherwise we will leave the job undone, effectively not canceling the
pending work. Any idea other than spinning?
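
For reference, roughly the shape I have in mind (only a sketch; the retry
loop below is exactly the spinning in question, and it reuses the per-vCPU
fields from the snippet above):

void vpci_cancel_pending(const struct pci_dev *pdev)
{
    struct domain *d = pdev->domain;
    struct vcpu *v;

    /* Open question: simply retry on -ERESTART, or bail out instead? */
    while ( domain_pause_except_self(d) )
        cpu_relax();

    for_each_vcpu ( d, v )
        if ( v->vpci.mem && v->vpci.pdev == pdev )
        {
            rangeset_destroy(v->vpci.mem);
            v->vpci.mem = NULL;
        }

    domain_unpause_except_self(d);
}
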
>   and I don't understand the purpose of
> acquiring the domain lock.
You are right, no need
>
> Jan
>
Thank you,
Oleksandr

 

