
Re: [PATCH v4 02/11] vpci: cancel pending map/unmap on vpci removal


  • To: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 22 Nov 2021 15:37:09 +0100
  • Cc: Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>, "andrew.cooper3@xxxxxxxxxx" <andrew.cooper3@xxxxxxxxxx>, "george.dunlap@xxxxxxxxxx" <george.dunlap@xxxxxxxxxx>, "paul@xxxxxxx" <paul@xxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Rahul Singh <rahul.singh@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, "julien@xxxxxxx" <julien@xxxxxxx>
  • Delivery-date: Mon, 22 Nov 2021 14:37:32 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 22.11.2021 15:21, Oleksandr Andrushchenko wrote:
> On 19.11.21 15:34, Oleksandr Andrushchenko wrote:
>> On 19.11.21 15:25, Jan Beulich wrote:
>>> On 19.11.2021 14:16, Oleksandr Andrushchenko wrote:
>>>> On 19.11.21 15:00, Jan Beulich wrote:
>>>>> On 19.11.2021 13:34, Oleksandr Andrushchenko wrote:
>>>>>> Possible locking and other work needed:
>>>>>> =======================================
>>>>>>
>>>>>> 1. pcidevs_{lock|unlock} is too heavy and is per-host
>>>>>> 2. pdev->vpci->lock cannot be used as vpci is freed by vpci_remove_device
>>>>>> 3. We may want a dedicated per-domain rw lock to be implemented:
>>>>>>
>>>>>> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
>>>>>> index 28146ee404e6..ebf071893b21 100644
>>>>>> --- a/xen/include/xen/sched.h
>>>>>> +++ b/xen/include/xen/sched.h
>>>>>> @@ -444,6 +444,7 @@ struct domain
>>>>>>
>>>>>>      #ifdef CONFIG_HAS_PCI
>>>>>>          struct list_head pdev_list;
>>>>>> +    rwlock_t vpci_rwlock;
>>>>>> +    bool vpci_terminating; <- atomic?
>>>>>>      #endif
>>>>>> then vpci_remove_device is a writer (cold path) and vpci_process_pending and
>>>>>> vpci_mmio_{read|write} are readers (hot path).
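
For illustration, a minimal sketch of how such a per-domain lock might be taken
on the two kinds of paths; only d->vpci_rwlock comes from the diff above, the
helper names are made up:

#include <xen/rwlock.h>
#include <xen/sched.h>

/* Hot path (vpci_process_pending, vpci_mmio_{read|write}): shared access. */
static void vpci_reader_sketch(struct domain *d)
{
    read_lock(&d->vpci_rwlock);
    /* ... safely dereference pdev->vpci here ... */
    read_unlock(&d->vpci_rwlock);
}

/* Cold path (vpci_remove_device): exclusive access while tearing down. */
static void vpci_writer_sketch(struct domain *d)
{
    write_lock(&d->vpci_rwlock);
    /* ... free pdev->vpci and clear the pointer here ... */
    write_unlock(&d->vpci_rwlock);
}
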
>>>>> Right - you need such a lock for other purposes anyway, as per the
>>>>> discussion with Julien.
>>>> What about bool vpci_terminating? Do you see it as an atomic type or just 
>>>> bool?
>>> Having seen only ...
>>>
>>>>>> do_physdev_op(PHYSDEVOP_pci_device_remove) will need
>>>>>> hypercall_create_continuation to be implemented, so that removal can be
>>>>>> re-started if need be:
>>>>>>
>>>>>> vpci_remove_device()
>>>>>> {
>>>>>>       d->vpci_terminating = true;
>>> ... this use so far, I can't tell yet. But at a first glance a boolean
>>> looks to be what you need.
>>>
>>>>>>       remove vPCI register handlers <- this will cut off PCI_COMMAND 
>>>>>> emulation among others
>>>>>>       if ( !write_trylock(d->vpci_rwlock) )
>>>>>>         return -ERESTART;
>>>>>>       xfree(pdev->vpci);
>>>>>>       pdev->vpci = NULL;
>>>>>> }
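
On the hypercall side, -ERESTART could then be turned into a continuation
roughly as below. This is a sketch only: it assumes vpci_remove_device()
returns -ERESTART as proposed, and the "ih" re-packing of the cmd/arg pair is
an assumption about how the physdev op would be continued.

/* Inside do_physdev_op(), case PHYSDEVOP_pci_device_remove - sketch only. */
ret = vpci_remove_device(pdev);
if ( ret == -ERESTART )
    ret = hypercall_create_continuation(__HYPERVISOR_physdev_op, "ih",
                                        PHYSDEVOP_pci_device_remove, arg);
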
>>>>>>
>>>>>> Then this d->vpci_rwlock becomes a dedicated vpci per-domain lock for
>>>>>> other operations which may require it, e.g. virtual bus topology can
>>>>>> use it when assigning vSBDF etc.
>>>>>>
>>>>>> 4. The call to vpci_remove_device needs to be removed from vpci_process_pending,
>>>>>> which should instead do nothing for Dom0 and crash DomU otherwise:
>>>>> Why is this? I'm not outright opposed, but I don't immediately see why
>>>>> trying to remove the problematic device wouldn't be a reasonable course
>>>>> of action anymore. vpci_remove_device() may need to become more careful
>>>>> so as not to crash,
>>>> vpci_remove_device does not crash, vpci_process_pending does
>>>>>     though.
>>>> Assume we are in an error state in vpci_process_pending *on one of the 
>>>> vCPUs*
>>>> and we call vpci_remove_device. vpci_remove_device tries to acquire the
>>>> lock and can't, just because some other vpci code is running
>>>> on another vCPU.
>>>> Then what do we do here? We are in SoftIRQ context now and we can't spin
>>>> trying to acquire d->vpci_rwlock forever. Nor can we blindly free the vpci
>>>> structure, because it is seen by all vCPUs and freeing it may crash them.
>>>>
>>>> If vpci_remove_device is in hypercall context it just returns -ERESTART and
>>>> hypercall continuation helps here. But not in SoftIRQ context.
>>> Maybe then you want to invoke this cleanup from RCU context (whether
>>> vpci_remove_device() itself or a suitable clone thereof is TBD)? (I
>>> will admit though that I didn't check whether that would satisfy all
>>> constraints.)
>>>
>>> Then again it also hasn't become clear to me why you use write_trylock()
>>> there. The lock contention you describe doesn't, on the surface, look
>>> any different from situations elsewhere.
>> I use write_trylock in vpci_remove_device because if we can't
>> acquire the lock then we defer device removal. This would work
>> well if called from a hypercall which will employ hypercall continuation.
>> But SoftIRQ getting -ERESTART is something that we probably can't
>> handle by restarting the way a hypercall can, thus I only see that
>> vpci_process_pending will need to spin and wait until vpci_remove_device
>> succeeds.
> Does anybody have any better solution for preventing SoftIRQ from
> spinning on vpci_remove_device and -ERESTART?

Well, at this point I can suggest only a marginal improvement: Instead of
spinning inside the softirq handler, you want to re-raise the softirq and
exit the handler. That way at least higher "priority" softirqs won't be
starved.
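
Concretely, something like the below inside the handler, where the predicate
and the softirq number are placeholders:

    /* Sketch: don't spin waiting for the removal to make progress. */
    if ( !removal_can_progress(d) )           /* placeholder predicate */
    {
        raise_softirq(VPCI_PENDING_SOFTIRQ);  /* placeholder nr: re-raise self */
        return;
    }
    /* ... otherwise do the pending map/unmap work ... */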

Beyond that - maybe the guest (or just a vcpu of it) needs pausing in such
an event, with the work deferred to a tasklet?
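
As a rough shape of that alternative (everything except the tasklet / pause
primitives is an assumption):

static struct tasklet vpci_remove_tasklet;          /* hypothetical */

static void vpci_remove_tasklet_fn(void *data)      /* hypothetical */
{
    struct pci_dev *pdev = data;
    struct domain *d = pdev->domain;

    /*
     * With the domain paused no vCPU can sit in the vPCI handlers, so the
     * write lock can be taken without prolonged spinning.
     */
    write_lock(&d->vpci_rwlock);
    xfree(pdev->vpci);
    pdev->vpci = NULL;
    write_unlock(&d->vpci_rwlock);

    domain_unpause(d);
}

/* Caller, after pausing the domain: */
tasklet_init(&vpci_remove_tasklet, vpci_remove_tasklet_fn, pdev);
tasklet_schedule(&vpci_remove_tasklet);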

Yet I don't think my earlier question regarding the use of write_trylock()
was really answered. What you said in reply doesn't explain (to me at
least) why write_lock() is not an option.

Jan