[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3 1/2] xen/pt: fix some pass-thru devices don't work across reboot



On Thu, Dec 20, 2018 at 10:46:29AM +0800, Chao Gao wrote:
> On Wed, Dec 19, 2018 at 09:57:51AM +0100, Roger Pau Monné wrote:
> >On Tue, Dec 18, 2018 at 10:43:37PM +0800, Chao Gao wrote:
> >> I find some pass-thru devices don't work any more across guest
> >> reboot. Assigning it to another domain also meets the same issue. And
> >> the only way to make it work again is un-binding and binding it to
> >> pciback. Someone reported this issue one year ago [1].
> >> 
> >> If the device's driver doesn't disable MSI-X during shutdown or qemu is
> >> killed/crashed before the domain shutdown, this domain's pirq won't be
> >> unmapped. Then xen takes over this work, unmapping all pirq-s, when
> >> destroying guest. But as pciback has already disabled meory decoding before
> >> xen unmapping pirq, Xen has to sets the host_maskall flag and maskall bit
> >> to mask a MSI rather than sets maskbit in MSI-x table. The call trace of
> >> this process is:
> >> 
> >> ->arch_domain_destroy
> >>     ->free_domain_pirqs
> >>         ->unmap_domain_pirq (if pirq isn't unmapped by qemu)
> >>             ->pirq_guest_force_unbind
> >>                 ->__pirq_guest_unbind
> >>                     ->mask_msi_irq(=desc->handler->disable())
> >>                         ->the warning in msi_set_mask_bit()
> >> 
> >> The host_maskall bit will prevent guests from clearing the maskall bit
> >> even the device is assigned to another guest later. Then guests cannot
> >> receive MSIs from this device.
> >> 
> >> To fix this issue, a pirq is unmapped before memory decoding is disabled by
> >> pciback. Specifically, when a device is detached from a guest, all 
> >> established
> >> mappings between pirq and msi are destroying before changing the ownership.
> >> 
> >> [1]: 
> >> https://lists.xenproject.org/archives/html/xen-devel/2017-09/msg02520.html
> >> 
> >> Signed-off-by: Chao Gao <chao.gao@xxxxxxxxx>
> >> ---
> >> Applied this patch, qemu would report the error below:
> >>     [00:05.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, 
> >> pirq: 302, gvec: 0xd5)
> >>     [00:05.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, 
> >> pirq: 301, gvec: 0xe5)
> >>     [00:04.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, 
> >> pirq: 359, gvec: 0x41)
> >>     [00:04.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, 
> >> pirq: 358, gvec: 0x51)
> >> 
> >> Despite of the error, guest shutdown or device hotplug finishs smoothly.
> >> It seems to me that qemu tries to unbind a msi which is already unbound by
> >> the code added by this patch. I am not sure whether it is acceptable to
> >> leave this error there.
> >
> >So QEMU would try to unmap IRQs after unbinding the device? I think
> 
> It seems to me yes. I don't know the reason right now. maybe because it
> is an asynchronous process?
> 
> >QEMU should be fixed to first unmap the IRQs and then unbind the
> >device.
> 
> Yes. Agree.
> 
> >
> >As long as this doesn't affect QEMU functionality I guess the Xen side
> >can be committed, but ideally a QEMU patch to avoid those error
> >messages should be committed at the same time.
> >
> >> ---
> >>  xen/drivers/passthrough/io.c  | 57 
> >> +++++++++++++++++++++++++++++--------------
> >>  xen/drivers/passthrough/pci.c | 49 +++++++++++++++++++++++++++++++++++++
> >>  xen/include/xen/iommu.h       |  1 +
> >>  3 files changed, 89 insertions(+), 18 deletions(-)
> >> 
> >> diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> >> index a6eb8a4..56ee1ef 100644
> >> --- a/xen/drivers/passthrough/io.c
> >> +++ b/xen/drivers/passthrough/io.c
> >> @@ -619,6 +619,42 @@ int pt_irq_create_bind(
> >>      return 0;
> >>  }
> >>  
> >> +static void pt_irq_destroy_bind_common(struct domain *d, struct pirq 
> >> *pirq)
> >> +{
> >> +    struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq);
> >> +
> >> +    ASSERT(spin_is_locked(&d->event_lock));
> >> +
> >> +    if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) &&
> >> +         list_empty(&pirq_dpci->digl_list) )
> >> +    {
> >> +        pirq_guest_unbind(d, pirq);
> >> +        msixtbl_pt_unregister(d, pirq);
> >> +        if ( pt_irq_need_timer(pirq_dpci->flags) )
> >> +            kill_timer(&pirq_dpci->timer);
> >> +        pirq_dpci->flags = 0;
> >> +        /*
> >> +         * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the
> >> +         * call to pt_pirq_softirq_reset.
> >> +         */
> >> +        pt_pirq_softirq_reset(pirq_dpci);
> >> +
> >> +        pirq_cleanup_check(pirq, d);
> >> +    }
> >> +}
> >> +
> >> +void pt_irq_destroy_bind_msi(struct domain *d, struct pirq *pirq)
> >> +{
> >> +    struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq);
> >> +
> >> +    ASSERT(spin_is_locked(&d->event_lock));
> >> +
> >> +    if ( pirq_dpci && pirq_dpci->gmsi.posted )
> >> +        pi_update_irte(NULL, pirq, 0);
> >> +
> >> +    pt_irq_destroy_bind_common(d, pirq);
> >> +}
> >> +
> >>  int pt_irq_destroy_bind(
> >>      struct domain *d, const struct xen_domctl_bind_pt_irq *pt_irq_bind)
> >>  {
> >> @@ -727,26 +763,11 @@ int pt_irq_destroy_bind(
> >>          }
> >>          else
> >>              what = "bogus";
> >> -    }
> >> -    else if ( pirq_dpci && pirq_dpci->gmsi.posted )
> >> -        pi_update_irte(NULL, pirq, 0);
> >> -
> >> -    if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) &&
> >> -         list_empty(&pirq_dpci->digl_list) )
> >> -    {
> >> -        pirq_guest_unbind(d, pirq);
> >> -        msixtbl_pt_unregister(d, pirq);
> >> -        if ( pt_irq_need_timer(pirq_dpci->flags) )
> >> -            kill_timer(&pirq_dpci->timer);
> >> -        pirq_dpci->flags = 0;
> >> -        /*
> >> -         * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the
> >> -         * call to pt_pirq_softirq_reset.
> >> -         */
> >> -        pt_pirq_softirq_reset(pirq_dpci);
> >>  
> >> -        pirq_cleanup_check(pirq, d);
> >> +        pt_irq_destroy_bind_common(d, pirq);
> >>      }
> >> +    else
> >> +        pt_irq_destroy_bind_msi(d, pirq);
> >>  
> >>      spin_unlock(&d->event_lock);
> >>  
> >> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> >> index 1277ce2..88a8007 100644
> >> --- a/xen/drivers/passthrough/pci.c
> >> +++ b/xen/drivers/passthrough/pci.c
> >> @@ -368,6 +368,7 @@ static struct pci_dev *alloc_pdev(struct pci_seg 
> >> *pseg, u8 bus, u8 devfn)
> >>              return NULL;
> >>          }
> >>          spin_lock_init(&msix->table_lock);
> >> +        msix->warned = DOMID_INVALID;
> >>          pdev->msix = msix;
> >>      }
> >>  
> >> @@ -1514,6 +1515,52 @@ static int assign_device(struct domain *d, u16 seg, 
> >> u8 bus, u8 devfn, u32 flag)
> >>      return rc;
> >>  }
> >>  
> >> +/*
> >> + * Unmap established mappings between domain's pirq and device's MSI.
> >> + * These mappings were set up by qemu/guest and are expected to be
> >> + * destroyed when changing the device's ownership.
> >> + */
> >> +static void pci_unmap_msi(struct pci_dev *pdev)
> >> +{
> >> +    struct msi_desc *entry, *tmp;
> >> +
> >> +    ASSERT(pcidevs_locked());
> >> +
> >> +    if ( !pdev->domain )
> >> +        return;
> >> +
> >> +    spin_lock(&pdev->domain->event_lock);
> >> +    list_for_each_entry_safe( entry, tmp, &pdev->msi_list, list )
> >
> >Do you really need the _safe version here? Couldn't you even use:
> 
> Don't need the _safe version.
> 
> >
> >while ( (entry = list_first_entry_or_null(...)) != NULL )
> >...
> 
> I think it is the same with list_for_each_entry(). Any reason makes you think
> this one would be better?

Doesn't 'entry' get freed when you call unmap_domain_pirq? In which
case using the list pointer from that struct would be a
use-after-free.

Using list_first_entry_or_null you don't need the previous entry in
order to get the next one, since you always pick the first one until
the list is empty.

> >
> >> +    {
> >> +        struct pirq *info;
> >> +        struct hvm_pirq_dpci *pirq_dpci;
> >> +        int pirq = domain_irq_to_pirq(pdev->domain, entry->irq), 
> >> pirq_orig;
> >> +
> >> +        pirq_orig = pirq;
> >> +
> >> +        if ( !pirq )
> >> +            continue;
> >> +
> >> +        /* For forcibly unmapped pirq, lookup radix tree with absolute 
> >> value */
> >> +        if ( pirq < 0)
> >> +            pirq = -pirq;
> >
> >I'm not sure I follow, the pirq hasn't been unmapped at this point
> >yet?
> 
> Qemu (i.e. compromised qemu) has the ability to do this. Right? we can't
> assert the pirq hasn't been unmapped here.

If the PIRQ is unmapped then the 'entry' would also be gone (freed)
AFAICT (see unmap_domain_pirq which calls msi_free_irq)?

I think that any entry in pdev->msi_list will always have entry->irq
 >= 0, but maybe I'm missing something. AFAICT map_domain_pirq will not
add an entry with irq < 0.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.