[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v2] xen/pt: fix some pass-thru devices don't work across reboot
On Thu, Nov 15, 2018 at 09:10:26AM +0800, Chao Gao wrote: > I find some pass-thru devices don't work any more across guest > reboot. Assigning it to another domain also meets the same issue. And > the only way to make it work again is un-binding and binding it to > pciback. Someone reported this issue one year ago [1]. How does unbinding and binding to pciback fix the issue? Is this due to Dom0 using some PV or Dom0 only hypercall that can reset the host_maskall state? > > If the device's driver doesn't disable MSI-X during shutdown or qemu is > killed/crashed before the domain shutdown, this domain's pirq won't be > unmapped. Then xen will unmap all pirq. But pciback has already disabled > meory decoding before xen unmapping pirq. Then when Xen is disabling a > MSI of the device, it has to sets the host_maskall flag and maskall bit > to mask a MSI rather than sets maskbit in MSI-x table. The call trace of > this process is: > ->arch_domain_destroy > ->free_domain_pirqs > ->unmap_domain_pirq (if pirq isn't unmap by qemu) > ->pirq_guest_force_unbind > ->__pirq_guest_unbind > ->mask_msi_irq(=desc->handler->disable()) > ->the warning in msi_set_mask_bit() > > The host_maskall bit will prevent guests from clearing the maskall bit > even the device is assigned to another guest later. Guests cannot > receive interrupts from this device. > > > To fix this, host_maskall flag is cleared when all MSIs of a device are freed. > It is definitely safely to clear it because no msi is actually set up > for this device. Also, 'msix->warned' is initialized to DOMID_INVALID > rather than 0 to avoid warnings missing for Dom0. > > [1]: > https://lists.xenproject.org/archives/html/xen-devel/2017-09/msg02520.html > > Signed-off-by: Chao Gao <chao.gao@xxxxxxxxx> > --- > xen/arch/x86/msi.c | 18 ++++++++++++++++++ > xen/drivers/passthrough/pci.c | 1 + > 2 files changed, 19 insertions(+) > > diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c > index 5567990..cd570bc 100644 > --- a/xen/arch/x86/msi.c > +++ b/xen/arch/x86/msi.c > @@ -637,6 +637,7 @@ int msi_free_irq(struct msi_desc *entry) > { > unsigned int nr = entry->msi_attrib.type != PCI_CAP_ID_MSIX > ? entry->msi.nvec : 1; > + struct pci_dev *pdev = entry->dev; > > while ( nr-- ) > { > @@ -654,6 +655,23 @@ int msi_free_irq(struct msi_desc *entry) > > list_del(&entry->list); > xfree(entry); > + > + /* > + * In some cases, the 'host_maskall' is set for safety. Here clear > + * 'host_maskall' if all MSIs are freed for a msi-x capable device. > + * Without it, the device's msix keeps disabled even been reassigned "... even after being reassigned ..." I think it's clearer. > + * to another domain. > + */ > + if ( pdev && list_empty(&pdev->msi_list) && pdev->msix ) > + { > + if ( pdev->msix->host_maskall ) > + printk(XENLOG_G_WARNING > + "Resetting msix status of %04x:%02x:%02x.%u\n", > + pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn), > + PCI_FUNC(pdev->devfn)); > + pdev->msix->host_maskall = false; > + pdev->msix->warned = DOMID_INVALID; > + } > return 0; IMO this looks quite similar to a msi_reset_state function or at least the start of it. Maybe it should be in a separate helper that sets all the fields to their initial values, with the expectation that Dom0 will perform the device reset? The underlying problem here AFAICT is that the Xen internal device state is not the same between device assignments. In any case there should be at least a note here pointing out that Xen expects the hardware domain to perform a device reset, so the Xen internal state actually matches the device state before trying to assign the device to another guest. Thanks, Roger. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |