[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v4 02/11] vpci: cancel pending map/unmap on vpci removal


  • To: Jan Beulich <jbeulich@xxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, "julien@xxxxxxx" <julien@xxxxxxx>
  • From: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • Date: Mon, 22 Nov 2021 14:21:20 +0000
  • Accept-language: en-US
  • Cc: Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>, "andrew.cooper3@xxxxxxxxxx" <andrew.cooper3@xxxxxxxxxx>, "george.dunlap@xxxxxxxxxx" <george.dunlap@xxxxxxxxxx>, "paul@xxxxxxx" <paul@xxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Rahul Singh <rahul.singh@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • Delivery-date: Mon, 22 Nov 2021 14:21:39 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: [PATCH v4 02/11] vpci: cancel pending map/unmap on vpci removal


On 19.11.21 15:34, Oleksandr Andrushchenko wrote:
>
> On 19.11.21 15:25, Jan Beulich wrote:
>> On 19.11.2021 14:16, Oleksandr Andrushchenko wrote:
>>> On 19.11.21 15:00, Jan Beulich wrote:
>>>> On 19.11.2021 13:34, Oleksandr Andrushchenko wrote:
>>>>> Possible locking and other work needed:
>>>>> =======================================
>>>>>
>>>>> 1. pcidevs_{lock|unlock} is too heavy and is per-host
>>>>> 2. pdev->vpci->lock cannot be used as vpci is freed by vpci_remove_device
>>>>> 3. We may want a dedicated per-domain rw lock to be implemented:
>>>>>
>>>>> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
>>>>> index 28146ee404e6..ebf071893b21 100644
>>>>> --- a/xen/include/xen/sched.h
>>>>> +++ b/xen/include/xen/sched.h
>>>>> @@ -444,6 +444,8 @@ struct domain
>>>>>
>>>>>      #ifdef CONFIG_HAS_PCI
>>>>>          struct list_head pdev_list;
>>>>> +    rwlock_t vpci_rwlock;
>>>>> +    bool vpci_terminating; <- atomic?
>>>>>      #endif
>>>>> then vpci_remove_device is a writer (cold path) and vpci_process_pending
>>>>> and vpci_mmio_{read|write} are readers (hot path).
>>>> Right - you need such a lock for other purposes anyway, as per the
>>>> discussion with Julien.
>>> What about bool vpci_terminating? Do you see it as an atomic type or just 
>>> bool?
>> Having seen only ...
>>
>>>>> do_physdev_op(PHYSDEVOP_pci_device_remove) will need
>>>>> hypercall_create_continuation to be implemented, so we can restart
>>>>> removal if need be:
>>>>>
>>>>> vpci_remove_device()
>>>>> {
>>>>>       d->vpci_terminating = true;
>> ... this use so far, I can't tell yet. But at a first glance a boolean
>> looks to be what you need.
>>
>>>>>       remove vPCI register handlers <- this cuts off PCI_COMMAND
>>>>>       emulation among others
>>>>>       if ( !write_trylock(d->vpci_rwlock) )
>>>>>         return -ERESTART;
>>>>>       xfree(pdev->vpci);
>>>>>       pdev->vpci = NULL;
>>>>> }
>>>>>
>>>>> Then this d->vpci_rwlock becomes a dedicated vpci per-domain lock for
>>>>> other operations which may require it, e.g. virtual bus topology can
>>>>> use it when assigning vSBDF etc.
>>>>>
>>>>> 4. vpci_remove_device needs to be removed from vpci_process_pending,
>>>>> which should instead do nothing for Dom0 and crash a DomU otherwise:
>>>> Why is this? I'm not outright opposed, but I don't immediately see why
>>>> trying to remove the problematic device wouldn't be a reasonable course
>>>> of action anymore. vpci_remove_device() may need to become more careful
>>>> as to not crash, though.
>>> vpci_remove_device does not crash, vpci_process_pending does.
>>> Assume we are in an error state in vpci_process_pending *on one of the
>>> vCPUs* and we call vpci_remove_device. vpci_remove_device tries to
>>> acquire the lock and can't, simply because some other vpci code is
>>> running on another vCPU. Then what do we do? We are in SoftIRQ context
>>> now and can't spin trying to acquire d->vpci_rwlock forever. Nor can we
>>> blindly free the vpci structure, because it is seen by all vCPUs and
>>> freeing it may crash them.
>>>
>>> If vpci_remove_device is in hypercall context it just returns -ERESTART
>>> and hypercall continuation helps here. But not in SoftIRQ context.
>> Maybe then you want to invoke this cleanup from RCU context (whether
>> vpci_remove_device() itself or a suitable clone thereof is TBD)? (I
>> will admit though that I didn't check whether that would satisfy all
>> constraints.)
>>
>> Then again it also hasn't become clear to me why you use write_trylock()
>> there. The lock contention you describe doesn't, on the surface, look
>> any different from situations elsewhere.
> I use write_trylock in vpci_remove_device because if we can't
> acquire the lock then we defer device removal. This works well
> if called from a hypercall, which can employ hypercall continuation.
> But a SoftIRQ getting -ERESTART is something we probably can't
> handle by restarting the way a hypercall can, so I only see that
> vpci_process_pending will need to spin and wait until
> vpci_remove_device succeeds.
Does anybody have a better solution for preventing the SoftIRQ handler
from spinning on vpci_remove_device and -ERESTART?
>> Jan
>>
> Thank you,
> Oleksandr
Thank you,
Oleksandr

 

