Re: [Xen-devel] [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
On January 06, 2016 7:26 PM, <quan.xu@xxxxxxxxx> wrote:
> > > diff --git a/xen/drivers/passthrough/vtd/qinval.c
> > > b/xen/drivers/passthrough/vtd/qinval.c
> > > index b227e4e..7330c5d 100644
> > > --- a/xen/drivers/passthrough/vtd/qinval.c
> > > +++ b/xen/drivers/passthrough/vtd/qinval.c
> > > @@ -190,9 +190,19 @@ static int queue_invalidate_wait(struct iommu
> > > *iommu, static int invalidate_sync(struct iommu *iommu) {
> > > struct qi_ctrl *qi_ctrl = iommu_qi_ctrl(iommu);
> > > + int rc;
> > >
> > > if ( qi_ctrl->qinval_maddr )
> > > - return queue_invalidate_wait(iommu, 0, 1, 1);
> > > + {
> > > + rc = queue_invalidate_wait(iommu, 0, 1, 1);
> > > + if ( rc )
> > > + {
> > > + if ( rc == -ETIMEDOUT )
> > > +                panic("Queue invalidate wait descriptor was timeout.\n");
> > > + return rc;
> > > + }
> > > + }
> > > +
> >
> > Why do you still panic here? With PATCH 1/3 you should return rc to the
> > caller directly, right?
> >
>
> This panic was originally in queue_invalidate_wait(). Patch 1/3 only covers the
> Device-TLB flush error (actually it also covers the iotlb flush error).
> Once I fix the VT-d iotlb/context/iec flush errors to be propagated to the
> callers, I can remove this panic and return rc to the caller directly.
>
>
>
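For reference, once the callers handle the returned error, invalidate_sync()
could look roughly like this -- just a sketch, not the submitted patch, and the
fall-through return value when qinval_maddr is not set is an assumption:

    static int invalidate_sync(struct iommu *iommu)
    {
        struct qi_ctrl *qi_ctrl = iommu_qi_ctrl(iommu);

        if ( qi_ctrl->qinval_maddr )
            /* Propagate any error, including -ETIMEDOUT, to the caller. */
            return queue_invalidate_wait(iommu, 0, 1, 1);

        return 0;
    }
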
> > > +static void dev_invalidate_iotlb_timeout(struct iommu *iommu, u16 did,
> > > +                                          u16 seg, u8 bus, u8 devfn)
> > > +{
> > > + struct domain *d;
> > > + struct pci_dev *pdev;
> > > +
> > > + d = rcu_lock_domain_by_id(iommu->domid_map[did]);
> > > + ASSERT(d);
> > > + for_each_pdev(d, pdev)
> > > + {
> > > + if ( (pdev->seg == seg) &&
> > > + (pdev->bus == bus) &&
> > > + (pdev->devfn == devfn) )
> > > + {
> > > + if ( pdev->domain )
> > > + {
> > > + list_del(&pdev->domain_list);
> > > + pdev->domain = NULL;
> > > + }
> > > +
> > > + if ( pci_hide_device(bus, devfn) )
> > > + {
> > > + printk(XENLOG_ERR
> > > + "IOMMU hide device %04x:%02x:%02x error.",
> > > + seg, bus, devfn);
> > > + break;
> > > + }
> >
> > I'm not sure whether using pci_hide_device is enough here, or whether it
> > breaks other code paths. For example, now you have errors propagated to the
> > callers.
>
> Some more information: after calling pci_hide_device(), ..
>     pdev->domain = dom_xen;
>     list_add(&pdev->domain_list, &dom_xen->arch.pdev_list);
> ..
>
> Hidden PCI devices are associated with 'dom_xen'.
> Now I found that it may not clear the rmrr mapping / context mapping, i.e.
> iommu_remove_device() -- [hd->platform_ops is NULL for dom_xen].
> But I think that is acceptable: IMO dom_xen is part of the hypervisor, and
> these mappings would not be a security issue.
> Or I can remove these mappings before hiding the device.
>
> So I think it will not break other code.
>
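To illustrate the second option (removing the mappings before hiding the
device), a rough sketch follows. hide_ats_device() is a hypothetical helper,
and it assumes the teardown hook mentioned above, iommu_remove_device(), can be
reached from this code (today it is local to passthrough/pci.c), so this shows
the intended ordering only, not a drop-in patch:

    /* Hypothetical helper, for illustration only. */
    static void hide_ats_device(struct pci_dev *pdev, u16 seg, u8 bus, u8 devfn)
    {
        /*
         * Tear down the device's context/RMRR mappings while pdev->domain
         * still points at the owning domain, so the IOMMU hooks can find them.
         */
        iommu_remove_device(pdev);

        if ( pdev->domain )
        {
            list_del(&pdev->domain_list);
            pdev->domain = NULL;
        }

        /*
         * From here on the device belongs to dom_xen (pdev->domain == dom_xen,
         * listed on dom_xen->arch.pdev_list) and cannot be assigned again.
         */
        if ( pci_hide_device(bus, devfn) )
            printk(XENLOG_ERR "IOMMU: hide device %04x:%02x:%02x.%u failed\n",
                   seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
    }
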
> > What's the final policy on flush errors?
> >
>
> If the Device-TLB flush times out, we'll hide the target ATS device and crash
> the domain owning this ATS device.
> If the impacted domain is the hardware domain, we just throw out a warning
> instead of crashing it.
> The hidden device will not be allowed to be assigned to any domain afterwards.
>
>
>
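In code terms, the policy described above would look roughly like this --
handle_ats_flush_timeout() is a hypothetical name, the device-hiding step is as
sketched earlier, and this is only an illustration of the intended behaviour:

    /* Hypothetical helper illustrating the policy, not the actual patch. */
    static void handle_ats_flush_timeout(struct domain *d, struct pci_dev *pdev)
    {
        /* Hide the ATS device first so it can never be assigned again. */
        hide_ats_device(pdev, pdev->seg, pdev->bus, pdev->devfn);

        if ( is_hardware_domain(d) )
            /* Only warn: crashing the hardware domain would take down the host. */
            printk(XENLOG_WARNING
                   "Device-TLB flush timed out for %04x:%02x:%02x.%u (hardware domain)\n",
                   pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
                   PCI_FUNC(pdev->devfn));
        else
            domain_crash(d);
    }
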
> > If they eventually go on to clean up more state that relies on
> > pdev->domain, then that code path might be broken. If it's fine to do the
> > cleanup at this point, then just hiding the device is not enough. If you look
> > at pci_remove_device, there's quite some state to be further cleaned up...
> >
>
>
> Since pdev->domain is 'dom_xen', this case still works. Correct me if that is
> not correct.
> BTW, a quick question: in which case is pci_remove_device() called?
>
>
> > I haven't thought through the full logic yet. But it would always be good
> > not to hide the device until the point where all related state has been
> > cleaned up along the chained error handling path.
> >
Jan, could you help me double-check this? Thanks.
Quan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel