[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH] xen/pt: fix some pass-thru devices don't work across reboot
On Fri, Nov 09, 2018 at 02:14:04AM -0700, Jan Beulich wrote: >>>> On 09.11.18 at 01:11, <chao.gao@xxxxxxxxx> wrote: >> I find some pass-thru devices don't work any more across guest >> reboot. Assigning it to another domain also meets the same issue. And >> the only way to make it work again is un-binding and binding it to >> pciback. Someone reported this issue one year ago [1]. > >I find "some" above and in the title misleading. According to the >following description, the issue ought to affect exactly all MSI-X >capable devices. Some devices whose driver doesn't disable MSI-X when shutdown. But Xen can't rely on the guest driver to do this. That's why I think this issue should be handled in Xen. If QEMU is killed/crashed before destroying a VM, all MSI-x capable devices would suffer the same issue. > >> The root cause is that xen sets the maskall bit in MSI-x capability >> during guest shutdown. > >That's in __pci_disable_msix()? Please help readers by being >specific. I think the call trace is: ->arch_domain_destroy ->unmap_domain_pirq (if guest doesn't disable MSI-x) ->pirq_guest_force_unbind ->__pirq_guest_unbind ->mask_msi_irq(=desc->handler->disable()) ->the warning in msi_set_mask_bit() > >> @@ -1439,7 +1440,27 @@ static int assign_device(struct domain *d, u16 seg, >> u8 bus, u8 devfn, u32 flag) >> } >> >> if ( pdev->msix ) >> + { >> + uint8_t slot = PCI_SLOT(devfn), func = PCI_FUNC(devfn); >> + uint8_t pos = pci_find_cap_offset(seg, bus, slot, func, >> + PCI_CAP_ID_MSIX); >> + uint16_t control = pci_conf_read16(seg, bus, slot, func, >> + msix_control_reg(pos)); >> + uint16_t new_control; >> + >> + /* Reset status owned by Xen */ >> + pdev->msix->host_maskall = false; >> + pdev->msix->warned = DOMID_INVALID; >> + >> + /* Update 'maskall' bit in MSI-X Capability */ >> + new_control = (control & ~PCI_MSIX_FLAGS_MASKALL) | >> + pdev->msix->guest_maskall; >> + if ( new_control != control ) >> + pci_conf_write16(seg, bus, slot, func, msix_control_reg(pos), >> + control); >> + >> msixtbl_init(d); >> + } > >First of all, all maskall bit management is in x86/msi.c. Please keep it >that way, by introducing a function there, called from here (if need >be, read on). Will do >Next I'm weary of you clearing the device's maskall >bit without also clearing the enable bit (or ASSERT()ing that it's >cleared, if that's guaranteed). I don't get why claring maskall bit without clearing enable bit would be an issue. >I also don't follow why you OR in >->guest_maskall: Isn't it supposed to be clear anyway? Guest's first write to msixctrl register would override this value. So don't clear it is also fine. Considering that do_pci_add() issues QMP command to add pci device to qemu prior to xc_assign_device(), the question here is whether there is any chance qemu's first write has happened. > >From a more general perspective, forcing ->host_maskall to true >in msi_set_mask_bit() when memory decoding is disabled is a >safety measure. I'd like to see justification (in the description) why >it is safe to clear the bit again at a later point. Currently, no idea how to prove it. My throught is simple: make sure all status is benign. The host_maskall bit is set due to some actions of the last owner. Clear it to avoid it affecting the new domain. >Thing is that _if_ it >is safe to clear the bit when assigning the device to another guest, >why wouldn't it even more so be safe to do so when assigning it >back to Dom0 (i.e. in deassign_device())? I considered it. But deassign_device() happens before destroying domain during which host_maskall is set. > >And why would it not be safe to clear the bit perhaps already at >the point MSI-X gets disabled? Yes, I also thought it might be an option. But doing this definitely conflicts with the intention of the first if() in __pci_disable_msix() and msi_set_mask_bit() which is trying to enable MSI-x to access MSI-x table to set the per-vector mask bit. If it is safe, we don't need that also. Thanks Chao >It would seem to me that >_pci_cleanup_msix() could call msix_set_enable(, false), and >msix_set_enable() might legitimately clear host_maskall in that >case (but please double check; another possibility could be that >pci_cleanup_msi() needs a second call site added somewhere). >In the end the state after disabling MSI-X should match that >before enabling it the first time. > >Jan > > _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |