[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v3 1/2] xen/pt: fix some pass-thru devices don't work across reboot
On Thu, Dec 20, 2018 at 10:46:29AM +0800, Chao Gao wrote: > On Wed, Dec 19, 2018 at 09:57:51AM +0100, Roger Pau Monné wrote: > >On Tue, Dec 18, 2018 at 10:43:37PM +0800, Chao Gao wrote: > >> I find some pass-thru devices don't work any more across guest > >> reboot. Assigning it to another domain also meets the same issue. And > >> the only way to make it work again is un-binding and binding it to > >> pciback. Someone reported this issue one year ago [1]. > >> > >> If the device's driver doesn't disable MSI-X during shutdown or qemu is > >> killed/crashed before the domain shutdown, this domain's pirq won't be > >> unmapped. Then xen takes over this work, unmapping all pirq-s, when > >> destroying guest. But as pciback has already disabled meory decoding before > >> xen unmapping pirq, Xen has to sets the host_maskall flag and maskall bit > >> to mask a MSI rather than sets maskbit in MSI-x table. The call trace of > >> this process is: > >> > >> ->arch_domain_destroy > >> ->free_domain_pirqs > >> ->unmap_domain_pirq (if pirq isn't unmapped by qemu) > >> ->pirq_guest_force_unbind > >> ->__pirq_guest_unbind > >> ->mask_msi_irq(=desc->handler->disable()) > >> ->the warning in msi_set_mask_bit() > >> > >> The host_maskall bit will prevent guests from clearing the maskall bit > >> even the device is assigned to another guest later. Then guests cannot > >> receive MSIs from this device. > >> > >> To fix this issue, a pirq is unmapped before memory decoding is disabled by > >> pciback. Specifically, when a device is detached from a guest, all > >> established > >> mappings between pirq and msi are destroying before changing the ownership. > >> > >> [1]: > >> https://lists.xenproject.org/archives/html/xen-devel/2017-09/msg02520.html > >> > >> Signed-off-by: Chao Gao <chao.gao@xxxxxxxxx> > >> --- > >> Applied this patch, qemu would report the error below: > >> [00:05.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, > >> pirq: 302, gvec: 0xd5) > >> [00:05.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, > >> pirq: 301, gvec: 0xe5) > >> [00:04.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, > >> pirq: 359, gvec: 0x41) > >> [00:04.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, > >> pirq: 358, gvec: 0x51) > >> > >> Despite of the error, guest shutdown or device hotplug finishs smoothly. > >> It seems to me that qemu tries to unbind a msi which is already unbound by > >> the code added by this patch. I am not sure whether it is acceptable to > >> leave this error there. > > > >So QEMU would try to unmap IRQs after unbinding the device? I think > > It seems to me yes. I don't know the reason right now. maybe because it > is an asynchronous process? > > >QEMU should be fixed to first unmap the IRQs and then unbind the > >device. > > Yes. Agree. > > > > >As long as this doesn't affect QEMU functionality I guess the Xen side > >can be committed, but ideally a QEMU patch to avoid those error > >messages should be committed at the same time. > > > >> --- > >> xen/drivers/passthrough/io.c | 57 > >> +++++++++++++++++++++++++++++-------------- > >> xen/drivers/passthrough/pci.c | 49 +++++++++++++++++++++++++++++++++++++ > >> xen/include/xen/iommu.h | 1 + > >> 3 files changed, 89 insertions(+), 18 deletions(-) > >> > >> diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c > >> index a6eb8a4..56ee1ef 100644 > >> --- a/xen/drivers/passthrough/io.c > >> +++ b/xen/drivers/passthrough/io.c > >> @@ -619,6 +619,42 @@ int pt_irq_create_bind( > >> return 0; > >> } > >> > >> +static void pt_irq_destroy_bind_common(struct domain *d, struct pirq > >> *pirq) > >> +{ > >> + struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq); > >> + > >> + ASSERT(spin_is_locked(&d->event_lock)); > >> + > >> + if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) && > >> + list_empty(&pirq_dpci->digl_list) ) > >> + { > >> + pirq_guest_unbind(d, pirq); > >> + msixtbl_pt_unregister(d, pirq); > >> + if ( pt_irq_need_timer(pirq_dpci->flags) ) > >> + kill_timer(&pirq_dpci->timer); > >> + pirq_dpci->flags = 0; > >> + /* > >> + * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the > >> + * call to pt_pirq_softirq_reset. > >> + */ > >> + pt_pirq_softirq_reset(pirq_dpci); > >> + > >> + pirq_cleanup_check(pirq, d); > >> + } > >> +} > >> + > >> +void pt_irq_destroy_bind_msi(struct domain *d, struct pirq *pirq) > >> +{ > >> + struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq); > >> + > >> + ASSERT(spin_is_locked(&d->event_lock)); > >> + > >> + if ( pirq_dpci && pirq_dpci->gmsi.posted ) > >> + pi_update_irte(NULL, pirq, 0); > >> + > >> + pt_irq_destroy_bind_common(d, pirq); > >> +} > >> + > >> int pt_irq_destroy_bind( > >> struct domain *d, const struct xen_domctl_bind_pt_irq *pt_irq_bind) > >> { > >> @@ -727,26 +763,11 @@ int pt_irq_destroy_bind( > >> } > >> else > >> what = "bogus"; > >> - } > >> - else if ( pirq_dpci && pirq_dpci->gmsi.posted ) > >> - pi_update_irte(NULL, pirq, 0); > >> - > >> - if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) && > >> - list_empty(&pirq_dpci->digl_list) ) > >> - { > >> - pirq_guest_unbind(d, pirq); > >> - msixtbl_pt_unregister(d, pirq); > >> - if ( pt_irq_need_timer(pirq_dpci->flags) ) > >> - kill_timer(&pirq_dpci->timer); > >> - pirq_dpci->flags = 0; > >> - /* > >> - * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the > >> - * call to pt_pirq_softirq_reset. > >> - */ > >> - pt_pirq_softirq_reset(pirq_dpci); > >> > >> - pirq_cleanup_check(pirq, d); > >> + pt_irq_destroy_bind_common(d, pirq); > >> } > >> + else > >> + pt_irq_destroy_bind_msi(d, pirq); > >> > >> spin_unlock(&d->event_lock); > >> > >> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c > >> index 1277ce2..88a8007 100644 > >> --- a/xen/drivers/passthrough/pci.c > >> +++ b/xen/drivers/passthrough/pci.c > >> @@ -368,6 +368,7 @@ static struct pci_dev *alloc_pdev(struct pci_seg > >> *pseg, u8 bus, u8 devfn) > >> return NULL; > >> } > >> spin_lock_init(&msix->table_lock); > >> + msix->warned = DOMID_INVALID; > >> pdev->msix = msix; > >> } > >> > >> @@ -1514,6 +1515,52 @@ static int assign_device(struct domain *d, u16 seg, > >> u8 bus, u8 devfn, u32 flag) > >> return rc; > >> } > >> > >> +/* > >> + * Unmap established mappings between domain's pirq and device's MSI. > >> + * These mappings were set up by qemu/guest and are expected to be > >> + * destroyed when changing the device's ownership. > >> + */ > >> +static void pci_unmap_msi(struct pci_dev *pdev) > >> +{ > >> + struct msi_desc *entry, *tmp; > >> + > >> + ASSERT(pcidevs_locked()); > >> + > >> + if ( !pdev->domain ) > >> + return; > >> + > >> + spin_lock(&pdev->domain->event_lock); > >> + list_for_each_entry_safe( entry, tmp, &pdev->msi_list, list ) > > > >Do you really need the _safe version here? Couldn't you even use: > > Don't need the _safe version. > > > > >while ( (entry = list_first_entry_or_null(...)) != NULL ) > >... > > I think it is the same with list_for_each_entry(). Any reason makes you think > this one would be better? Doesn't 'entry' get freed when you call unmap_domain_pirq? In which case using the list pointer from that struct would be a use-after-free. Using list_first_entry_or_null you don't need the previous entry in order to get the next one, since you always pick the first one until the list is empty. > > > >> + { > >> + struct pirq *info; > >> + struct hvm_pirq_dpci *pirq_dpci; > >> + int pirq = domain_irq_to_pirq(pdev->domain, entry->irq), > >> pirq_orig; > >> + > >> + pirq_orig = pirq; > >> + > >> + if ( !pirq ) > >> + continue; > >> + > >> + /* For forcibly unmapped pirq, lookup radix tree with absolute > >> value */ > >> + if ( pirq < 0) > >> + pirq = -pirq; > > > >I'm not sure I follow, the pirq hasn't been unmapped at this point > >yet? > > Qemu (i.e. compromised qemu) has the ability to do this. Right? we can't > assert the pirq hasn't been unmapped here. If the PIRQ is unmapped then the 'entry' would also be gone (freed) AFAICT (see unmap_domain_pirq which calls msi_free_irq)? I think that any entry in pdev->msi_list will always have entry->irq >= 0, but maybe I'm missing something. AFAICT map_domain_pirq will not add an entry with irq < 0. Thanks, Roger. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |