Re: [PATCH v4 02/11] vpci: cancel pending map/unmap on vpci removal
On 18.11.21 16:04, Roger Pau Monné wrote:
> Sorry, I've been quite busy with other stuff.
>
> On Thu, Nov 18, 2021 at 01:48:06PM +0000, Oleksandr Andrushchenko wrote:
>> On 18.11.21 15:25, Jan Beulich wrote:
>>> On 18.11.2021 10:32, Oleksandr Andrushchenko wrote:
>>>> On 18.11.21 11:15, Jan Beulich wrote:
>>>>> On 18.11.2021 09:54, Oleksandr Andrushchenko wrote:
>>>>>> On 18.11.21 10:36, Jan Beulich wrote:
>>>>>>> On 18.11.2021 08:49, Oleksandr Andrushchenko wrote:
>>>>>>>> On 17.11.21 10:28, Jan Beulich wrote:
>>>>>>>>> On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
>>>>>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
>>>>>>>>>>
>>>>>>>>>> When a vPCI is removed for a PCI device it is possible that we have
>>>>>>>>>> scheduled delayed work for map/unmap operations for that device.
>>>>>>>>>> For example, the following scenario can illustrate the problem:
>>>>>>>>>>
>>>>>>>>>> pci_physdev_op
>>>>>>>>>>     pci_add_device
>>>>>>>>>>         init_bars -> modify_bars -> defer_map ->
>>>>>>>>>>             raise_softirq(SCHEDULE_SOFTIRQ)
>>>>>>>>>>     iommu_add_device <- FAILS
>>>>>>>>>>     vpci_remove_device -> xfree(pdev->vpci)
>>>>>>>>>>
>>>>>>>>>> leave_hypervisor_to_guest
>>>>>>>>>>     vpci_process_pending: v->vpci.mem != NULL;
>>>>>>>>>>                           v->vpci.pdev->vpci == NULL
>>>>>>>>>>
>>>>>>>>>> For the hardware domain we continue execution, as the worst that
>>>>>>>>>> could happen is that MMIO mappings are left in place when the
>>>>>>>>>> device has been deassigned.
>>>>>>>>>>
>>>>>>>>>> For unprivileged domains that get a failure in the middle of a vPCI
>>>>>>>>>> {un}map operation we need to destroy them, as we don't know in which
>>>>>>>>>> state the p2m is. This can only happen in vpci_process_pending for
>>>>>>>>>> DomUs as they won't be allowed to call pci_add_device.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Oleksandr Andrushchenko
>>>>>>>>>> <oleksandr_andrushchenko@xxxxxxxx>
>>>>>>>>> Thinking about it some more, I'm not convinced any of this is really
>>>>>>>>> needed in the presented form.
>>>>>>>> The intention of this patch was to handle error conditions which are
>>>>>>>> abnormal, e.g. when iommu_add_device fails and we are in the middle
>>>>>>>> of initialization. So, I am trying to cancel all pending work which
>>>>>>>> might already be there and not to crash.
>>>>>>> Only Dom0 may be able to prematurely access the device during "add".
>>>>>>> Yet unlike for DomU-s we generally expect Dom0 to be well-behaved.
>>>>>>> Hence I'm not sure I see the need for dealing with these.
>>>>>> Probably I don't follow you here. The issue I am facing is Dom0
>>>>>> related, e.g. Xen was not able to initialize during "add" and thus
>>>>>> wanted to clean up the leftovers. As a result the already scheduled
>>>>>> work crashes, as it was neither canceled nor interrupted in some
>>>>>> safe manner. So, this sounds like something we need to take care
>>>>>> of, thus this patch.
>>>>> But my point was the question of why there would be any pending work
>>>>> in the first place in this case, when we expect Dom0 to be
>>>>> well-behaved.
>>>> I am not saying Dom0 misbehaves here.
>>>> This is my real use-case (as in the commit message):
>>>>
>>>> pci_physdev_op
>>>>     pci_add_device
>>>>         init_bars -> modify_bars -> defer_map ->
>>>>             raise_softirq(SCHEDULE_SOFTIRQ)
>>>>     iommu_add_device <- FAILS
>>>>     vpci_remove_device -> xfree(pdev->vpci)
>>>>
>>>> leave_hypervisor_to_guest
>>>>     vpci_process_pending: v->vpci.mem != NULL;
>>>>                           v->vpci.pdev->vpci == NULL
>>> First of all I'm sorry for having lost track of that particular case
>>> in the course of the discussion.
>> No problem, I see on the ML how much you review every day...
>>> I wonder though whether that's something we really need to take care
>>> of. At boot (on x86) modify_bars() wouldn't call defer_map() anyway,
>>> but use apply_map() instead. I wonder whether this wouldn't be
>>> appropriate generally in the context of init_bars() when used for
>>> Dom0 (not sure whether init_bars() would find some form of use for
>>> DomU-s as well). This is even more so as it ought to be the exception
>>> that devices discovered post-boot start out with memory decoding
>>> enabled (which is a prerequisite for modify_bars() to be called in
>>> the first place).
>> Well, first of all init_bars is going to be used for DomUs as well:
>> that was discussed previously and is reflected in this series.
>>
>> But the real use-case for the deferred mapping would be the one from
>> the PCI_COMMAND register write emulation:
>>
>> void vpci_cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>                     uint32_t cmd, void *data)
>> {
>>     [snip]
>>     modify_bars(pdev, cmd, false);
>>
>> which in turn calls defer_map.
>>
>> I believe Roger did that for a good reason: not to stall Xen while
>> doing the map/unmap (which may take quite some time), so that work
>> was moved to vpci_process_pending and SOFTIRQ context.
> Indeed. In the physdevop failure case this comes from a hypercall
> context, so maybe you could do the mapping in place using hypercall
> continuations if required. Not sure how complex that would be,
> compared to just deferring to the guest entry point and then dealing
> with the possible cleanup on failure.

This will solve one part of the equation:

pci_physdev_op
    pci_add_device
        init_bars -> modify_bars -> defer_map ->
            raise_softirq(SCHEDULE_SOFTIRQ)
    iommu_add_device <- FAILS
    vpci_remove_device -> xfree(pdev->vpci)
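For reference, a rough, untested sketch of the "map in hypercall
context" idea: the mapping runs synchronously in the physdevop path
and a continuation is created when it gets preempted. apply_map_range()
here is a hypothetical stand-in for the rangeset-based mapping loop,
not an existing function:

/*
 * Rough sketch only: perform the BAR mapping directly in hypercall
 * context and resume via a continuation if preempted.
 * apply_map_range() is a hypothetical helper standing in for the
 * rangeset-based mapping loop.
 */
static long map_in_hypercall_ctx(const struct pci_dev *pdev,
                                 XEN_GUEST_HANDLE_PARAM(void) arg)
{
    long rc = apply_map_range(pdev); /* may return -ERESTART if preempted */

    if ( rc == -ERESTART )
        rc = hypercall_create_continuation(__HYPERVISOR_physdev_op, "ih",
                                           PHYSDEVOP_pci_device_add, arg);

    return rc;
}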
But what about the other one, e.g. vpci_process_pending being
triggered in parallel with PCI device de-assign, for example?
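For illustration only, a minimal, untested sketch of the kind of guard
I have in mind in vpci_process_pending: it reuses the existing
v->vpci fields, lets Dom0 continue, and crashes a DomU as per the
commit message:

/*
 * Untested sketch: bail out of the deferred {un}map if the device's
 * vPCI state was freed (e.g. by vpci_remove_device()) after the work
 * was scheduled but before this softirq ran.
 */
bool vpci_process_pending(struct vcpu *v)
{
    if ( v->vpci.mem )
    {
        if ( !v->vpci.pdev->vpci )
        {
            /* Nothing left to operate on: drop the pending rangeset. */
            rangeset_destroy(v->vpci.mem);
            v->vpci.mem = NULL;

            /*
             * Dom0 can tolerate leftover MMIO mappings; for a DomU the
             * p2m state is unknown, so the domain has to be destroyed.
             */
            if ( !is_hardware_domain(v->domain) )
                domain_crash(v->domain);

            return false;
        }

        /* ... existing rangeset_consume_ranges() processing ... */
    }

    return false;
}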
> Thanks, Roger.

Thank you,
Oleksandr