
Re: [PATCH v4 02/11] vpci: cancel pending map/unmap on vpci removal


  • To: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 22 Nov 2021 15:57:38 +0100
  • Cc: Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>, "andrew.cooper3@xxxxxxxxxx" <andrew.cooper3@xxxxxxxxxx>, "george.dunlap@xxxxxxxxxx" <george.dunlap@xxxxxxxxxx>, "paul@xxxxxxx" <paul@xxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Rahul Singh <rahul.singh@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, "julien@xxxxxxx" <julien@xxxxxxx>
  • Delivery-date: Mon, 22 Nov 2021 14:57:52 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 22.11.2021 15:45, Oleksandr Andrushchenko wrote:
> 
> 
> On 22.11.21 16:37, Jan Beulich wrote:
>> On 22.11.2021 15:21, Oleksandr Andrushchenko wrote:
>>> On 19.11.21 15:34, Oleksandr Andrushchenko wrote:
>>>> On 19.11.21 15:25, Jan Beulich wrote:
>>>>> On 19.11.2021 14:16, Oleksandr Andrushchenko wrote:
>>>>>> On 19.11.21 15:00, Jan Beulich wrote:
>>>>>>> On 19.11.2021 13:34, Oleksandr Andrushchenko wrote:
>>>>>>>> Possible locking and other work needed:
>>>>>>>> =======================================
>>>>>>>>
>>>>>>>> 1. pcidevs_{lock|unlock} is too heavy and is per-host
>>>>>>>> 2. pdev->vpci->lock cannot be used as vpci is freed by 
>>>>>>>> vpci_remove_device
>>>>>>>> 3. We may want a dedicated per-domain rw lock to be implemented:
>>>>>>>>
>>>>>>>> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
>>>>>>>> index 28146ee404e6..ebf071893b21 100644
>>>>>>>> --- a/xen/include/xen/sched.h
>>>>>>>> +++ b/xen/include/xen/sched.h
>>>>>>>> @@ -444,6 +444,8 @@ struct domain
>>>>>>>>
>>>>>>>>       #ifdef CONFIG_HAS_PCI
>>>>>>>>           struct list_head pdev_list;
>>>>>>>> +    rwlock_t vpci_rwlock;
>>>>>>>> +    bool vpci_terminating; <- atomic?
>>>>>>>>       #endif
>>>>>>>> then vpci_remove_device is a writer (cold path), and
>>>>>>>> vpci_process_pending and vpci_mmio_{read|write} are readers (hot path).
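(For illustration only - a minimal sketch of what the read side of such a
per-domain lock could look like, using the vpci_rwlock field from the diff
above; the function name and its argument are made up for the example:)

/*
 * Hot path: MMIO emulation only needs to exclude vpci removal, so it takes
 * the per-domain lock for reading and multiple vCPUs can proceed in parallel.
 */
static bool vpci_mmio_access_sketch(struct pci_dev *pdev)
{
    struct domain *d = pdev->domain;
    bool handled = false;

    read_lock(&d->vpci_rwlock);
    if ( pdev->vpci )    /* May have been freed by vpci_remove_device(). */
    {
        /* ... perform the actual register emulation here ... */
        handled = true;
    }
    read_unlock(&d->vpci_rwlock);

    return handled;
}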
>>>>>>> Right - you need such a lock for other purposes anyway, as per the
>>>>>>> discussion with Julien.
>>>>>> What about bool vpci_terminating? Do you see it as an atomic type or 
>>>>>> just bool?
>>>>> Having seen only ...
>>>>>
>>>>>>>> do_physdev_op(PHYSDEVOP_pci_device_remove) will need
>>>>>>>> hypercall_create_continuation to be implemented, so that the removal
>>>>>>>> can be restarted if need be:
>>>>>>>>
>>>>>>>> vpci_remove_device()
>>>>>>>> {
>>>>>>>>        d->vpci_terminating = true;
>>>>> ... this use so far, I can't tell yet. But at a first glance a boolean
>>>>> looks to be what you need.
>>>>>
>>>>>>>>        remove vPCI register handlers <- this will cut off PCI_COMMAND 
>>>>>>>> emulation among others
>>>>>>>>        if ( !write_trylock(d->vpci_rwlock) )
>>>>>>>>          return -ERESTART;
>>>>>>>>        xfree(pdev->vpci);
>>>>>>>>        pdev->vpci = NULL;
>>>>>>>> }
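(A hedged sketch of how the -ERESTART above could then be consumed on the
hypercall side, following the usual continuation pattern in do_physdev_op();
the exact argument encoding is illustrative only:)

case PHYSDEVOP_pci_device_remove:
    /* ... copy in the struct physdev_pci_device argument, etc. ... */
    ret = pci_remove_device(dev.seg, dev.bus, dev.devfn);
    if ( ret == -ERESTART )
        /*
         * vpci_remove_device() couldn't take the per-domain lock: replay the
         * hypercall instead of spinning inside the hypervisor.
         */
        ret = hypercall_create_continuation(__HYPERVISOR_physdev_op,
                                            "ih", cmd, arg);
    break;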
>>>>>>>>
>>>>>>>> Then this d->vpci_rwlock becomes a dedicated vpci per-domain lock for
>>>>>>>> other operations which may require it, e.g. virtual bus topology can
>>>>>>>> use it when assigning vSBDF etc.
>>>>>>>>
>>>>>>>> 4. vpci_remove_device needs to be removed from vpci_process_pending
>>>>>>>> and instead do nothing for Dom0 and crash the DomU otherwise:
>>>>>>> Why is this? I'm not outright opposed, but I don't immediately see why
>>>>>>> trying to remove the problematic device wouldn't be a reasonable course
>>>>>>> of action anymore. vpci_remove_device() may need to become more careful
>>>>>>> as to not crashing, though.
>>>>>> vpci_remove_device does not crash, vpci_process_pending does.
>>>>>> Assume we are in an error state in vpci_process_pending *on one of the
>>>>>> vCPUs* and we call vpci_remove_device. vpci_remove_device tries to
>>>>>> acquire the lock and can't, simply because some other vpci code is
>>>>>> running on another vCPU.
>>>>>> Then what do we do here? We are in SoftIRQ context now and we can't spin
>>>>>> trying to acquire d->vpci_rwlock forever. Nor can we blindly free the
>>>>>> vpci structure, because it is seen by all vCPUs and freeing it may crash
>>>>>> them.
>>>>>>
>>>>>> If vpci_remove_device is in hypercall context it just returns -ERESTART
>>>>>> and hypercall continuation helps here. But not in SoftIRQ context.
>>>>> Maybe then you want to invoke this cleanup from RCU context (whether
>>>>> vpci_remove_device() itself or a suitable clone thereof is TBD)? (I
>>>>> will admit though that I didn't check whether that would satisfy all
>>>>> constraints.)
>>>>>
>>>>> Then again it also hasn't become clear to me why you use write_trylock()
>>>>> there. The lock contention you describe doesn't, on the surface, look
>>>>> any different from situations elsewhere.
>>>> I use write_trylock in vpci_remove_device because if we can't
>>>> acquire the lock then we defer device removal. This would work
>>>> well if called from a hypercall which will employ hypercall continuation.
>>>> But a SoftIRQ getting -ERESTART is something that we probably can't
>>>> handle by restarting the way a hypercall can, thus I only see that
>>>> vpci_process_pending will need to spin and wait until vpci_remove_device
>>>> succeeds.
>>> Does anybody have any better solution for preventing SoftIRQ from
>>> spinning on vpci_remove_device and -ERESTART?
>> Well, at this point I can suggest only a marginal improvement: Instead of
>> spinning inside the softirq handler, you want to re-raise the softirq and
>> exit the handler. That way at least higher "priority" softirqs won't be
>> starved.
>>
>> Beyond that - maybe the guest (or just a vcpu of it) needs pausing in such
>> an event, with the work deferred to a tasklet?
>>
>> Yet I don't think my earlier question regarding the use of write_trylock()
>> was really answered. What you said in reply doesn't explain (to me at
>> least) why write_lock() is not an option.
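(To illustrate the "re-raise and exit" idea above - a minimal sketch assuming
a dedicated softirq existed for this deferred work; VPCI_SOFTIRQ is made up
for the example and does not exist today:)

static void vpci_remove_softirq_sketch(void)
{
    struct domain *d = current->domain;

    if ( !write_trylock(&d->vpci_rwlock) )
    {
        /*
         * Can't make progress yet: re-raise ourselves and return, so that
         * other (higher "priority") softirqs aren't starved.
         */
        raise_softirq(VPCI_SOFTIRQ);
        return;
    }
    /* ... perform the deferred vpci teardown ... */
    write_unlock(&d->vpci_rwlock);
}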
> I was thinking that we do not want to freeze in case we are calling
> vpci_remove_device from SoftIRQ context, so we try to lock and, if we can't,
> we return -ERESTART indicating that the removal needs to be deferred. If we
> use write_lock, then SoftIRQ -> write_lock will spin there waiting for
> readers to release the lock.
> 
> write_lock actually makes things a lot easier, but I just don't know if it
> is ok to use it. If so, then vpci_remove_device becomes synchronous and
> there is no need for hypercall continuation and other heavy machinery for
> re-scheduling SoftIRQ...

I'm inclined to ask: If it wasn't okay to use here, then where would it be
okay to use? Of course I realize there are cases when long spinning times
can be a problem. But I expect there aren't going to be excessively long
lock holding regions for this lock, and I also would expect average
contention to not be overly bad. But in the end you know the code that you're
writing (and which may lead to issues with the lock usage) better than I do ...
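
(For completeness, a minimal sketch of the synchronous write_lock() variant
being discussed, with the field name taken from the earlier diff; an
illustration of the idea rather than a proposed implementation:)

void vpci_remove_device_sketch(struct pci_dev *pdev)
{
    struct domain *d = pdev->domain;

    /*
     * Blocks until all vpci_mmio_{read|write} / vpci_process_pending readers
     * have dropped the lock; acceptable as long as the read-side critical
     * sections stay short.
     */
    write_lock(&d->vpci_rwlock);
    xfree(pdev->vpci);
    pdev->vpci = NULL;
    write_unlock(&d->vpci_rwlock);
}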

Jan
