
Re: [PATCH v4 02/11] vpci: cancel pending map/unmap on vpci removal


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • Date: Thu, 18 Nov 2021 14:14:36 +0000
  • Accept-language: en-US
  • Cc: Jan Beulich <jbeulich@xxxxxxxx>, "julien@xxxxxxx" <julien@xxxxxxx>, "sstabellini@xxxxxxxxxx" <sstabellini@xxxxxxxxxx>, Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>, "andrew.cooper3@xxxxxxxxxx" <andrew.cooper3@xxxxxxxxxx>, "george.dunlap@xxxxxxxxxx" <george.dunlap@xxxxxxxxxx>, "paul@xxxxxxx" <paul@xxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Rahul Singh <rahul.singh@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 18 Nov 2021 14:15:03 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: [PATCH v4 02/11] vpci: cancel pending map/unmap on vpci removal


On 18.11.21 16:04, Roger Pau Monné wrote:
> Sorry, I've been quite busy with other stuff.
>
> On Thu, Nov 18, 2021 at 01:48:06PM +0000, Oleksandr Andrushchenko wrote:
>>
>> On 18.11.21 15:25, Jan Beulich wrote:
>>> On 18.11.2021 10:32, Oleksandr Andrushchenko wrote:
>>>> On 18.11.21 11:15, Jan Beulich wrote:
>>>>> On 18.11.2021 09:54, Oleksandr Andrushchenko wrote:
>>>>>> On 18.11.21 10:36, Jan Beulich wrote:
>>>>>>> On 18.11.2021 08:49, Oleksandr Andrushchenko wrote:
>>>>>>>> On 17.11.21 10:28, Jan Beulich wrote:
>>>>>>>>> On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
>>>>>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
>>>>>>>>>>
>>>>>>>>>> When a vPCI is removed for a PCI device it is possible that we have
>>>>>>>>>> scheduled delayed work for map/unmap operations for that device.
>>>>>>>>>> For example, the following scenario can illustrate the problem:
>>>>>>>>>>
>>>>>>>>>> pci_physdev_op
>>>>>>>>>>         pci_add_device
>>>>>>>>>>             init_bars -> modify_bars -> defer_map -> raise_softirq(SCHEDULE_SOFTIRQ)
>>>>>>>>>>         iommu_add_device <- FAILS
>>>>>>>>>>         vpci_remove_device -> xfree(pdev->vpci)
>>>>>>>>>>
>>>>>>>>>> leave_hypervisor_to_guest
>>>>>>>>>>         vpci_process_pending: v->vpci.mem != NULL; v->vpci.pdev->vpci == NULL
>>>>>>>>>>
>>>>>>>>>> For the hardware domain we continue execution, as the worst that
>>>>>>>>>> could happen is that MMIO mappings are left in place when the
>>>>>>>>>> device has been deassigned.
>>>>>>>>>>
>>>>>>>>>> For unprivileged domains that get a failure in the middle of a vPCI
>>>>>>>>>> {un}map operation we need to destroy them, as we don't know in which
>>>>>>>>>> state the p2m is. This can only happen in vpci_process_pending for
>>>>>>>>>> DomUs as they won't be allowed to call pci_add_device.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
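
To make the failure mode concrete: once vpci_remove_device() has freed
pdev->vpci, the still-pending work makes vpci_process_pending() dereference
v->vpci.pdev->vpci, which is now NULL. Purely as an illustration, and
explicitly not the submitted patch, the kind of early guard needed there
could look like (rangeset_destroy/is_hardware_domain/domain_crash being the
existing helpers):

bool vpci_process_pending(struct vcpu *v)
{
    if ( v->vpci.mem && !v->vpci.pdev->vpci )
    {
        /*
         * vpci_remove_device() already freed the device's vPCI state, so
         * just drop the deferred work.  A DomU's p2m may be in an unknown
         * state at this point, hence crash it instead of continuing.
         */
        rangeset_destroy(v->vpci.mem);
        v->vpci.mem = NULL;
        if ( !is_hardware_domain(v->domain) )
            domain_crash(v->domain);
        return false;
    }

    /* ... normal processing of the pending rangeset, as today ... */
    return false;
}
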
>>>>>>>>> Thinking about it some more, I'm not convinced any of this is really
>>>>>>>>> needed in the presented form.
>>>>>>>> The intention of this patch was to handle error conditions which are
>>>>>>>> abnormal, e.g. when iommu_add_device fails and we are in the middle
>>>>>>>> of initialization. So, I am trying to cancel all pending work which might
>>>>>>>> already be there, so as not to crash.
>>>>>>> Only Dom0 may be able to prematurely access the device during "add".
>>>>>>> Yet unlike for DomU-s we generally expect Dom0 to be well-behaved.
>>>>>>> Hence I'm not sure I see the need for dealing with these.
>>>>>> Probably I don't follow you here. The issue I am facing is Dom0
>>>>>> related, e.g. Xen was not able to initialize during "add" and thus
>>>>>> wanted to clean up the leftovers. As a result, the already
>>>>>> scheduled work crashes, as it was neither canceled nor interrupted
>>>>>> in some safe manner. So, this sounds like something we need to take
>>>>>> care of, thus this patch.
>>>>> But my point was the question of why there would be any pending work
>>>>> in the first place in this case, when we expect Dom0 to be well-behaved.
>>>> I am not saying Dom0 misbehaves here. This is my real use-case
>>>> (as in the commit message):
>>>>
>>>> pci_physdev_op
>>>>         pci_add_device
>>>>             init_bars -> modify_bars -> defer_map -> raise_softirq(SCHEDULE_SOFTIRQ)
>>>>         iommu_add_device <- FAILS
>>>>         vpci_remove_device -> xfree(pdev->vpci)
>>>>
>>>> leave_hypervisor_to_guest
>>>>         vpci_process_pending: v->vpci.mem != NULL; v->vpci.pdev->vpci == NULL
>>> First of all I'm sorry for having lost track of that particular case in
>>> the course of the discussion.
>> No problem, I see on the ML how much you review every day...
>>> I wonder though whether that's something we really need to take care of.
>>> At boot (on x86) modify_bars() wouldn't call defer_map() anyway, but
>>> use apply_map() instead. I wonder whether this wouldn't be appropriate
>>> generally in the context of init_bars() when used for Dom0 (not sure
>>> whether init_bars() would find some form of use for DomU-s as well).
>>> This is even more so as it ought to be the exception that devices
>>> discovered post-boot start out with memory decoding enabled (which is a
>>> prereq for modify_bars() to be called in the first place).
>> Well, first of all init_bars is going to be used for DomUs as well:
>> that was discussed previously and is reflected in this series.
>>
>> But the real use-case for the deferred mapping would be the one
>> from PCI_COMMAND register write emulation:
>>
>> void vpci_cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>                       uint32_t cmd, void *data)
>> {
>> [snip]
>>           modify_bars(pdev, cmd, false);
>>
>> which in turn calls defer_map.
>>
>> I believe Roger did that for a good reason: to avoid stalling Xen while doing
>> map/unmap (which may take quite some time), the work was moved to
>> vpci_process_pending and SOFTIRQ context.
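
For context, the deferral pattern being discussed is roughly the sketch
below; the signature and the "map" flag are simplified for illustration
(IIRC the real defer_map() also carries the PCI command value and a
rom_only flag):

/*
 * Illustrative sketch only, not the actual Xen code: the point is that the
 * expensive p2m work is not done in the vPCI handler itself, but queued on
 * the current vCPU and picked up later by vpci_process_pending() on the
 * way back to the guest.
 */
static void defer_map(struct pci_dev *pdev, struct rangeset *mem, bool map)
{
    struct vcpu *curr = current;

    curr->vpci.pdev = pdev;  /* device the pending work belongs to */
    curr->vpci.mem = mem;    /* BAR ranges still to be mapped/unmapped */
    curr->vpci.map = map;    /* map vs. unmap (simplified) */

    /* Do the heavy lifting later, from the SOFTIRQ/guest-entry path. */
    raise_softirq(SCHEDULE_SOFTIRQ);
}
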
> Indeed. In the physdevop failure case this comes from a hypercall
> context, so maybe you could do the mapping in place using hypercall
> continuations if required. Not sure how complex that would be,
> compared to just deferring to the guest entry point and then dealing with
> the possible cleanup on failure.
This will solve one part of the equation:

pci_physdev_op
        pci_add_device
            init_bars -> modify_bars -> defer_map -> raise_softirq(SCHEDULE_SOFTIRQ)
        iommu_add_device <- FAILS
        vpci_remove_device -> xfree(pdev->vpci)

But what about the other one, e.g. when vpci_process_pending is triggered in
parallel with a PCI device de-assign?
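
In other words, whatever approach we take, removal/de-assign needs to cancel
any deferred work that still references the device. As a rough sketch
(deliberately leaving out the synchronisation with a vpci_process_pending()
that may already be running on another pCPU, which is exactly the question
here):

/*
 * Rough sketch only: drop any deferred map/unmap work that still refers to
 * the device being removed.  Synchronisation with a concurrently running
 * vpci_process_pending() is deliberately left out here; that is exactly the
 * open question above.
 */
static void vpci_cancel_pending(const struct pci_dev *pdev)
{
    struct domain *d = pdev->domain;
    struct vcpu *v;

    for_each_vcpu ( d, v )
        if ( v->vpci.mem && v->vpci.pdev == pdev )
        {
            rangeset_destroy(v->vpci.mem);
            v->vpci.mem = NULL;
        }
}
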

>
> Thanks, Roger.
Thank you,
Oleksandr

 

