
Re: [PATCH v4 02/11] vpci: cancel pending map/unmap on vpci removal


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • Date: Mon, 22 Nov 2021 14:45:26 +0000
  • Cc: Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>, "andrew.cooper3@xxxxxxxxxx" <andrew.cooper3@xxxxxxxxxx>, "george.dunlap@xxxxxxxxxx" <george.dunlap@xxxxxxxxxx>, "paul@xxxxxxx" <paul@xxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Rahul Singh <rahul.singh@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, "julien@xxxxxxx" <julien@xxxxxxx>, Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • Delivery-date: Mon, 22 Nov 2021 14:45:47 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: [PATCH v4 02/11] vpci: cancel pending map/unmap on vpci removal


On 22.11.21 16:37, Jan Beulich wrote:
> On 22.11.2021 15:21, Oleksandr Andrushchenko wrote:
>> On 19.11.21 15:34, Oleksandr Andrushchenko wrote:
>>> On 19.11.21 15:25, Jan Beulich wrote:
>>>> On 19.11.2021 14:16, Oleksandr Andrushchenko wrote:
>>>>> On 19.11.21 15:00, Jan Beulich wrote:
>>>>>> On 19.11.2021 13:34, Oleksandr Andrushchenko wrote:
>>>>>>> Possible locking and other work needed:
>>>>>>> =======================================
>>>>>>>
>>>>>>> 1. pcidevs_{lock|unlock} is too heavy and is per-host
>>>>>>> 2. pdev->vpci->lock cannot be used as vpci is freed by 
>>>>>>> vpci_remove_device
>>>>>>> 3. We may want a dedicated per-domain rw lock to be implemented:
>>>>>>>
>>>>>>> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
>>>>>>> index 28146ee404e6..ebf071893b21 100644
>>>>>>> --- a/xen/include/xen/sched.h
>>>>>>> +++ b/xen/include/xen/sched.h
>>>>>>> @@ -444,6 +444,7 @@ struct domain
>>>>>>>
>>>>>>>       #ifdef CONFIG_HAS_PCI
>>>>>>>           struct list_head pdev_list;
>>>>>>> +    rwlock_t vpci_rwlock;
>>>>>>> +    bool vpci_terminating; <- atomic?
>>>>>>>       #endif
>>>>>>> then vpci_remove_device is a writer (cold path) and 
>>>>>>> vpci_process_pending and
>>>>>>> vpci_mmio_{read|write} are readers (hot path).
>>>>>> Right - you need such a lock for other purposes anyway, as per the
>>>>>> discussion with Julien.
>>>>> What about bool vpci_terminating? Do you see it as an atomic type or just 
>>>>> bool?
>>>> Having seen only ...
>>>>
>>>>>>> do_physdev_op(PHYSDEVOP_pci_device_remove) will need
>>>>>>> hypercall_create_continuation to be implemented, so removal can be
>>>>>>> re-started if need be:
>>>>>>>
>>>>>>> vpci_remove_device()
>>>>>>> {
>>>>>>>        d->vpci_terminating = true;
>>>> ... this use so far, I can't tell yet. But at a first glance a boolean
>>>> looks to be what you need.
>>>>
>>>>>>>        remove vPCI register handlers <- this will cut off PCI_COMMAND 
>>>>>>> emulation among others
>>>>>>>        if ( !write_trylock(d->vpci_rwlock) )
>>>>>>>          return -ERESTART;
>>>>>>>        xfree(pdev->vpci);
>>>>>>>        pdev->vpci = NULL;
>>>>>>> }
>>>>>>>
>>>>>>> Then this d->vpci_rwlock becomes a dedicated vpci per-domain lock for
>>>>>>> other operations which may require it, e.g. virtual bus topology can
>>>>>>> use it when assigning vSBDF etc.
>>>>>>>
>>>>>>> 4. The call to vpci_remove_device needs to be removed from
>>>>>>> vpci_process_pending, which should instead do nothing for Dom0 and
>>>>>>> crash DomU otherwise:
>>>>>> Why is this? I'm not outright opposed, but I don't immediately see why
>>>>>> trying to remove the problematic device wouldn't be a reasonable course
>>>>>> of action anymore. vpci_remove_device() may need to become more careful
>>>>>> as to not crashing,
>>>>> vpci_remove_device does not crash, vpci_process_pending does
>>>>>>      though.
>>>>> Assume we are in an error state in vpci_process_pending *on one of the
>>>>> vCPUs* and we call vpci_remove_device. vpci_remove_device tries to
>>>>> acquire the lock and it can't, simply because some other vPCI code is
>>>>> running on another vCPU.
>>>>> Then what do we do here? We are in SoftIRQ context now and we can't spin
>>>>> trying to acquire d->vpci_rwlock forever. Nor can we blindly free the
>>>>> vpci structure, because it is seen by all vCPUs and may crash them.
>>>>>
>>>>> If vpci_remove_device is in hypercall context it just returns -ERESTART
>>>>> and hypercall continuation helps here. But not in SoftIRQ context.
>>>> Maybe then you want to invoke this cleanup from RCU context (whether
>>>> vpci_remove_device() itself or a suitable clone there of is TBD)? (I
>>>> will admit though that I didn't check whether that would satisfy all
>>>> constraints.)
>>>>
>>>> Then again it also hasn't become clear to me why you use write_trylock()
>>>> there. The lock contention you describe doesn't, on the surface, look
>>>> any different from situations elsewhere.
>>> I use write_trylock in vpci_remove_device because if we can't
>>> acquire the lock then we defer device removal. This would work
>>> well if called from a hypercall, which would employ hypercall continuation.
>>> But a SoftIRQ getting -ERESTART is something we probably can't handle
>>> by restarting the way a hypercall can, thus I only see that
>>> vpci_process_pending would need to spin and wait until vpci_remove_device
>>> succeeds.
>> Does anybody have any better solution for preventing SoftIRQ from
>> spinning on vpci_remove_device and -ERESTART?
> Well, at this point I can suggest only a marginal improvement: Instead of
> spinning inside the softirq handler, you want to re-raise the softirq and
> exit the handler. That way at least higher "priority" softirqs won't be
> starved.
>
> Beyond that - maybe the guest (or just a vcpu of it) needs pausing in such
> an event, with the work deferred to a tasklet?
>
> Yet I don't think my earlier question regarding the use of write_trylock()
> was really answered. What you said in reply doesn't explain (to me at
> least) why write_lock() is not an option.
I was thinking that we do not want to get stuck in case we are calling
vpci_remove_device from SoftIRQ context, thus we try to take the lock and,
if we can't, return -ERESTART indicating that the removal needs to be
deferred. If we use write_lock, then a SoftIRQ handler calling write_lock
will spin there waiting for the readers to release the lock.
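
For illustration, the write_trylock deferral I have in mind would look
roughly like the below. This is only a sketch: the exact shape of
vpci_process_pending's error path, the pdev field and the particular softirq
being re-raised are my assumptions, not final code.

/*
 * Sketch: error path in vpci_process_pending() when the device should be
 * removed but vpci_remove_device() cannot take the write lock right away.
 */
if ( rc )
{
    if ( vpci_remove_device(v->vpci.pdev) == -ERESTART )
    {
        /*
         * We cannot spin in SoftIRQ context: re-raise the softirq (as Jan
         * suggests) and exit the handler, so higher "priority" softirqs
         * are not starved, and retry the removal on the next invocation.
         */
        raise_softirq(SCHEDULE_SOFTIRQ);
        return true; /* Work is still pending. */
    }
}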

write_lock actually makes things a lot easier, but I just don't know if it
is ok to use it. If so, then vpci_remove_device becomes synchronous and
there is no need for hypercall continuation or other heavy machinery for
re-scheduling the SoftIRQ...
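
In that case the removal path could be as simple as the following sketch
(assuming the per-domain vpci_rwlock proposed above and leaving the register
handler removal details aside):

void vpci_remove_device(struct pci_dev *pdev)
{
    struct domain *d = pdev->domain;

    /*
     * Remove the vPCI register handlers first: this cuts off PCI_COMMAND
     * emulation among others, so no new mapping work gets queued.
     */

    write_lock(&d->vpci_rwlock);
    xfree(pdev->vpci);
    pdev->vpci = NULL;
    write_unlock(&d->vpci_rwlock);
}
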
>
> Jan
>
Thank you,
Oleksandr

 

