Re: [Xen-devel] [PATCH] xen: xen-pciback: Reset MSI-X state when exposing a device
On Fri, Jan 17, 2020 at 01:57:43PM -0500, Rich Persaud wrote:
>On Sep 26, 2019, at 06:17, Pasi Kärkkäinen <pasik@xxxxxx> wrote:
>>
>> Hello Stanislav,
>>
>>> On Fri, Sep 13, 2019 at 11:28:20PM +0800, Chao Gao wrote:
>>>> On Fri, Sep 13, 2019 at 10:02:24AM +0000, Spassov, Stanislav wrote:
>>>> On Thu, Dec 13, 2018 at 07:54, Chao Gao wrote:
>>>>> On Thu, Dec 13, 2018 at 12:54:52AM -0700, Jan Beulich wrote:
>>>>>>>>> On 13.12.18 at 04:46, <chao.gao@xxxxxxxxx> wrote:
>>>>>>> On Wed, Dec 12, 2018 at 08:21:39AM -0700, Jan Beulich wrote:
>>>>>>>>>>> On 12.12.18 at 16:18, <chao.gao@xxxxxxxxx> wrote:
>>>>>>>>> On Wed, Dec 12, 2018 at 01:51:01AM -0700, Jan Beulich wrote:
>>>>>>>>>>>>> On 12.12.18 at 08:06, <chao.gao@xxxxxxxxx> wrote:
>>>>>>>>>>> On Wed, Dec 05, 2018 at 09:01:33AM -0500, Boris Ostrovsky wrote:
>>>>>>>>>>>> On 12/5/18 4:32 AM, Roger Pau Monné wrote:
>>>>>>>>>>>>> On Wed, Dec 05, 2018 at 10:19:17AM +0800, Chao Gao wrote:
>>>>>>>>>>>>>> I find some pass-thru devices don't work any more across guest reboot.
>>>>>>>>>>>>>> Assigning it to another guest also meets the same issue. And the only
>>>>>>>>>>>>>> way to make it work again is un-binding and binding it to pciback.
>>>>>>>>>>>>>> Someone reported this issue one year ago [1]. More detail also can be
>>>>>>>>>>>>>> found in [2].
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The root-cause is Xen's internal MSI-X state isn't reset properly
>>>>>>>>>>>>>> during reboot or re-assignment. In the above case, Xen set maskall bit
>>>>>>>>>>>>>> to mask all MSI interrupts after it detected a potential security
>>>>>>>>>>>>>> issue. Even after device reset, Xen didn't reset its internal maskall
>>>>>>>>>>>>>> bit. As a result, maskall bit would be set again in next write to
>>>>>>>>>>>>>> MSI-X message control register.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Given that PHYSDEVOPS_prepare_msix() also triggers Xen resetting MSI-X
>>>>>>>>>>>>>> internal state of a device, we employ it to fix this issue rather than
>>>>>>>>>>>>>> introducing another dedicated sub-hypercall.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Note that PHYSDEVOPS_release_msix() will fail if the mapping between
>>>>>>>>>>>>>> the device's msix and pirq has been created. This limitation prevents
>>>>>>>>>>>>>> us calling this function when detaching a device from a guest during
>>>>>>>>>>>>>> guest shutdown. Thus it is called right before calling
>>>>>>>>>>>>>> PHYSDEVOPS_prepare_msix().
>>>>>>>>>>>>> s/PHYSDEVOPS/PHYSDEVOP/ (no final S). And then I would also drop the
>>>>>>>>>>>>> () at the end of the hypercall name since it's not a function.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm also wondering why the release can't be done when the device is
>>>>>>>>>>>>> detached from the guest (or the guest has been shut down). This makes
>>>>>>>>>>>>> me worry about the raciness of the attach/detach procedure: if there's
>>>>>>>>>>>>> a state where pciback assumes the device has been detached from the
>>>>>>>>>>>>> guest, but there are still pirqs bound, an attempt to attach to
>>>>>>>>>>>>> another guest in such state will fail.
>>>>>>>>>>>>
>>>>>>>>>>>> I wonder whether this additional reset functionality could be done out
>>>>>>>>>>>> of xen_pcibk_xenbus_remove(). We first do a (best effort) device reset
>>>>>>>>>>>> and then do the extra things that are not properly done there.
>>>>>>>>>>>
>>>>>>>>>>> No. It cannot be done in xen_pcibk_xenbus_remove() without modifying
>>>>>>>>>>> the handler of PHYSDEVOP_release_msix. To do a successful Xen internal
>>>>>>>>>>> MSI-X state reset, PHYSDEVOP_{release, prepare}_msix should be finished
>>>>>>>>>>> without error. But ATM, xen expects that no msi is bound to pirq when
>>>>>>>>>>> doing PHYSDEVOP_release_msix. Otherwise it fails with error code -EBUSY.
>>>>>>>>>>> However, the expectation isn't guaranteed in xen_pcibk_xenbus_remove().
>>>>>>>>>>> In some cases, if qemu fails to unmap MSIs, MSIs are unmapped by Xen
>>>>>>>>>>> at last minute, which happens after device reset in
>>>>>>>>>>> xen_pcibk_xenbus_remove().
>>>>>>>>>>
>>>>>>>>>> But that may need taking care of: I don't think it is a good idea to have
>>>>>>>>>> anything left from the prior owning domain when the device gets reset.
>>>>>>>>>> I.e. left over IRQ bindings should perhaps be forcibly cleared before
>>>>>>>>>> invoking the reset;
>>>>>>>>>
>>>>>>>>> Agree. How about pciback to track the established IRQ bindings? Then
>>>>>>>>> pciback can clear irq binding before invoking the reset.
>>>>>>>>
>>>>>>>> How would pciback even know of those mappings, when it's qemu
>>>>>>>> who establishes (and manages) them?
>>>>>>>
>>>>>>> I meant to expose some interfaces from pciback. And pciback serves
>>>>>>> as the proxy of IRQ (un)binding APIs.
>>>>>>
>>>>>> If at all possible we should avoid having to change more parties (qemu,
>>>>>> libxc, kernel, hypervisor) than really necessary. Remember that such
>>>>>> a bug fix may want backporting, and making sure affected people have
>>>>>> all relevant components updated is increasingly difficult with their
>>>>>> number growing.
>>>>>>
>>>>>>>>>> in fact I'd expect this to happen in the course of
>>>>>>>>>> domain destruction, and I'd expect the device reset to come after the
>>>>>>>>>> domain was cleaned up. Perhaps simply an ordering issue in the tool
>>>>>>>>>> stack?
>>>>>>>>>
>>>>>>>>> I don't think reversing the sequences of device reset and domain
>>>>>>>>> destruction would be simple. Furthermore, during device hot-unplug,
>>>>>>>>> device reset is done when the owner is alive. So if we use domain
>>>>>>>>> destruction to enforce all irq binding cleared, in theory, it won't be
>>>>>>>>> applicable to hot-unplug case (if qemu's hot-unplug logic is
>>>>>>>>> compromised).
>>>>>>>>
>>>>>>>> Even in the hot-unplug case the tool stack could issue unbind
>>>>>>>> requests, behind the back of the possibly compromised qemu,
>>>>>>>> once neither the guest nor qemu have access to the device
>>>>>>>> anymore.
>>>>>>>
>>>>>>> But currently, tool stack doesn't know the remaining IRQ bindings.
>>>>>>> If tool stack can maintaine IRQ binding information of a pass-thru
>>>>>>> device (stored in Xenstore?), we can come up with a clean solution
>>>>>>> without modifying linux kernel and Xen.
>>>>>>
>>>>>> If there's no way for the tool stack to either find out the bindings
>>>>>> or "blindly" issue unbind requests (accepting them to fail), then a
>>>>>> "wildcard" unbind operation may want adding. Or, perhaps even
>>>>>> better, XEN_DOMCTL_deassign_device could unbind anything left
>>>>>> in place for the specified device.
>>>>>
>>>>> Good idea. I will take this advice.
>>>>>
>>>>> Thanks
>>>>> Chao
>>>>
>>>> I am having the same issue, and cannot find a fix in either xen-pciback or
>>>> the Xen codebase.
>>>> Was a solution ever pushed as a result of this thread?
>>>>
>>>
>>> I submitted patches [1] to Xen community. But I didn't get it merged.
>>> We made a change in device driver to disable MSI-X during guest OS
>>> shutdown to mitigate the issue. But when guest or qemu was crashed, we
>>> encountered this issue again. I have no plan to get back to these
>>> patches. But if you want to fix the issue completely along what the
>>> patches below did, please go ahead.
>>>
>>> [1]:
>>> https://lists.xenproject.org/archives/html/xen-devel/2019-01/msg01227.html
>>>
>>> Thanks
>>> Chao
>>>
>>
>> Stanislav: Are you able to continue the work with these patches, to get them
>> merged?
>
>What further work is needed for these patches? Are they only needed for Intel
>i210 NIC PCI passthrough, or are other devices affected?

All MSI-X capable devices were affected. This issue is fixed in Xen by Roger's
patch
(https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=575e18d54d19eda787f6477a4acd3c50f72751a9).

Thanks
Chao

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
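For readers skimming the thread, the release-then-prepare sequence proposed in the original patch, and the -EBUSY case that makes it unreliable at detach time, can be sketched as a toy model. This is not Xen or Linux code. The class ToyMsixState and the toy_* functions are hypothetical names invented for illustration, and the behaviour modelled is only what the messages above state: release fails with -EBUSY while any pirq is still bound, and prepare resets the internal maskall state.

```python
import errno


class ToyMsixState:
    """Hypothetical stand-in for the per-device MSI-X state kept by the hypervisor."""

    def __init__(self):
        self.prepared = False     # True once the "prepare" step has run
        self.maskall = False      # internal maskall flag described in the thread
        self.bound_pirqs = set()  # MSI-X entries still mapped to pirqs


def toy_prepare_msix(state):
    """Model of the prepare step: per the thread, it resets the internal state."""
    state.prepared = True
    state.maskall = False
    return 0


def toy_release_msix(state):
    """Model of the release step: refuses with -EBUSY while bindings remain."""
    if state.bound_pirqs:
        return -errno.EBUSY
    state.prepared = False
    return 0


def toy_reset_msix(state):
    """Release first, then prepare; stop at the first error."""
    err = toy_release_msix(state)
    if err:
        return err
    return toy_prepare_msix(state)


# Case 1: no leftover bindings, so the stale maskall flag is cleared.
clean = ToyMsixState()
clean.prepared = True
clean.maskall = True
rc_clean = toy_reset_msix(clean)

# Case 2: a leftover binding (for example qemu failed to unmap it), so the
# release step fails and the stale maskall flag survives.
stale = ToyMsixState()
stale.prepared = True
stale.maskall = True
stale.bound_pirqs.add(42)  # arbitrary illustrative pirq number
rc_stale = toy_reset_msix(stale)

print("clean:", rc_clean, clean.maskall)
print("stale:", rc_stale, stale.maskall)
```

The second case is the one raised against doing the reset in xen_pcibk_xenbus_remove(): as long as bindings from the previous owner may still exist, the sequence cannot be relied on, which is why the discussion moved towards clearing leftover bindings on the Xen side.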