[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [Patch v11 3/3] vt-d: fix vt-d Device-TLB flush timeout issue
On June 17, 2016 3:01 PM, Jan Beulich <JBeulich@xxxxxxxx> wrote: > >>> On 17.06.16 at 08:08, <quan.xu@xxxxxxxxx> wrote: > > > On June 16, 2016 5:05 PM, Jan Beulich <JBeulich@xxxxxxxx> wrote: > >> >>> On 16.06.16 at 10:42, <quan.xu@xxxxxxxxx> wrote: > >> > On June 02, 2016 7:07 PM, Jan Beulich <JBeulich@xxxxxxxx> wrote: > >> >> >>> On 01.06.16 at 11:05, <quan.xu@xxxxxxxxx> wrote: > >> >> > +static void dev_invalidate_iotlb_timeout(struct iommu *iommu, u16 > did, > >> >> > + struct pci_ats_dev > >> >> > +*ats_dev) { > >> >> > + struct domain *d = NULL; > >> >> > + struct pci_dev *pdev; > >> >> > + > >> >> > + if ( test_bit(did, iommu->domid_bitmap) ) > >> >> > + d = rcu_lock_domain_by_id(iommu->domid_map[did]); > >> >> > + > >> >> > + /* > >> >> > + * In case the domain has been freed or the IOMMU domid bitmap > is > >> >> > + * not valid, the device no longer belongs to this domain. > >> >> > + */ > >> >> > + if ( d == NULL ) > >> >> > + return; > >> >> > + > >> >> > + pcidevs_lock(); > >> >> > + > >> >> > + for_each_pdev(d, pdev) > >> >> > + { > >> >> > + if ( (pdev->seg == ats_dev->seg) && > >> >> > + (pdev->bus == ats_dev->bus) && > >> >> > + (pdev->devfn == ats_dev->devfn) ) > >> >> > + { > >> >> > + ASSERT(pdev->domain); > >> >> > + list_del(&pdev->domain_list); > >> >> > + pdev->domain = NULL; > >> >> > + pci_hide_existing_device(pdev); > >> >> > + break; > >> >> > + } > >> >> > + } > >> >> > + > >> >> > + pcidevs_unlock(); > >> >> > >> >> ... this loop (and locking). (Of course such a change may better > >> >> be done in another preparatory patch.) > >> >> > >> > > >> > To eliminate the locking? I am afraid the locking is still a must > >> > here even without the loop, also referring to device_assigned().. > >> > >> If the entire loop gets eliminated, what would be left is > >> > >> pcidevs_lock(); > >> pcidevs_unlock(); > >> > >> which I don't think does any good. > > > > Why? I can't follow it.. > > I don't understand your question. Can you explain what use above code > sequence is, in your opinion? Or else - what does the "why" > refer to? > Ah, there may be a gap between us. without this loop, these pdev operation should be still there, such as, + ASSERT(pdev->domain); + list_del(&pdev->domain_list); + pdev->domain = NULL; + pci_hide_existing_device(pdev); So, the left is not only: pcidevs_lock(); pcidevs_unlock(); > >> >> > +static int __must_check dev_invalidate_sync(struct iommu > >> >> > +*iommu, > >> >> > +u16 > >> >> did, > >> >> > + struct pci_ats_dev > >> >> > +*ats_dev) { > >> >> > + struct qi_ctrl *qi_ctrl = iommu_qi_ctrl(iommu); > >> >> > + int rc = 0; > >> >> > + > >> >> > + if ( qi_ctrl->qinval_maddr ) > >> >> > + { > >> >> > + rc = queue_invalidate_wait(iommu, 0, 1, 1, 1); > >> >> > + > >> >> > + if ( rc == -ETIMEDOUT ) > >> >> > + dev_invalidate_iotlb_timeout(iommu, did, ats_dev); > >> >> > + } > >> >> > + > >> >> > + return rc; > >> >> > +} > >> >> > >> >> I've never really understood why invalidate_sync() returns success > >> >> when it didn't do anything. Now that you copy this same behavior > >> >> here, I really need to ask you to explain that. > >> >> > >> > > >> > It is acceptable to me, returning success when it didn't do > >> > anything > >> > -- this is worth reflection and criticism:(.. > >> > It is better: > >> > + if ( qi_ctrl->qinval_maddr ) > >> > + ... > >> > + else > >> > + rc = -ENOENT; > >> > >> Right. And perhaps a separate patch to make invalidate_sync() do the > same. > > > > Agreed. > > > >> Question is whether this really ought to be a conditional, or whether > > instead > >> this code is unreachable when qinval is not in use, in which case > >> these conditionals would imo better be converted to ASSERT()s. > >> > > > > IMO, this should be a conditional. > > As mentioned below, strictly speaking, this is a bug. We can't > > ASSERT() based on a bug.. > > A coming patch may fix it.. > > And again I don't understand: ASSERT()s are to verify assumed state. If static > code analysis resulted in understanding a function is unreachable when > qi_ctrl->qinval_maddr is zero (because qinval ought to have got disabled if > any > of the table setup failed), then adding ASSERT() would (a) document that and > (b) allow to know quickly if something broke that assumption. > You are right. A separate patch does this. > But then again I may simply misunderstand your wording: "We can't ASSERT() > based on a bug" is really pretty unclear to me. > I supposed the variable in ASSERT() is always true, but disable_qinval() needs to make qi_ctrl->qinval_maddr zero, but today it doesn't do this -- a bug. With your explanation, I got it now. Thanks. Quan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |