
Re: [Xen-devel] [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.



> On January 22, 2016 at 12:31am, <JBeulich@xxxxxxxx> wrote:
> >>> On 21.01.16 at 17:16, <quan.xu@xxxxxxxxx> wrote:
> >>  On January 20, 2016 at 7:29 pm,  <JBeulich@xxxxxxxx> wrote:
> >> >>> On 20.01.16 at 11:26, <quan.xu@xxxxxxxxx> wrote:
> >> >> On January 15, 2016 at 9:10, <JBeulich@xxxxxxxx> wrote:
> >> >> >>> On 23.12.15 at 09:25, <quan.xu@xxxxxxxxx> wrote:
> >> >> > @@ -229,6 +239,63 @@ int qinval_device_iotlb(struct iommu *iommu,
> >> >> >      return 0;
> >> >> >  }
> >> >> >
> >> >> > +static void dev_invalidate_iotlb_timeout(struct iommu *iommu, u16 did,
> >> >> > +                                         u16 seg, u8 bus, u8 devfn)
> >> >> > +{
> >> >> > +    struct domain *d;
> >> >> > +    struct pci_dev *pdev;
> >> >> > +
> >> >> > +    d = rcu_lock_domain_by_id(iommu->domid_map[did]);
> >> >> > +    ASSERT(d);
> >> >>
> >> >> Don't you need to acquire some lock in order to safely assert this?
> >> >
> >> > In the other use sites of 'rcu_lock_domain_by_id()', Xen checks
> >> > whether the domain is NULL.
> >> > Could I also replace the 'ASSERT(d)' with
> >> >           + if ( d == NULL )
> >> >           +     return;
> >> > ?
> >>
> >> At a first glance this doesn't look right, but in the end that's
> >> something you need to answer.
> >>
> >
> > Is it similar to the question of whether the domain has been
> > destroyed while the Device-TLB is flushing?
> 
> Going back to the original question I raised, you need to ask
> yourself: Can d validly be NULL when you get here (e.g. due to some other
> processor changing something in parallel)?

I think the answer is yes.
Not all of the callers acquire the RCU lock. For example:
   flush_iotlb_qi() -- iommu_flush_iotlb_psi() -- __intel_iommu_iotlb_flush()
   -- intel_iommu_iotlb_flush() -- iommu_iotlb_flush()
   -- xenmem_add_to_physmap() -- do_memory_op()
If the domain is destroyed in parallel while the Device-TLB is being
flushed on this call path, then d is NULL.


> If yes, you can't ASSERT(), and the next question then is what exactly
> it means that it got ripped off the table. The answer to that
> determines whether simply returning would be okay.
> 

IMO, simply returning would be okay, but then the device would not be hidden.
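
Something like this minimal sketch, perhaps (same domid_map lookup as
in the patch; the early return simply gives up on hiding the device):

+    d = rcu_lock_domain_by_id(iommu->domid_map[did]);
+    if ( d == NULL )
+        /* Domain was destroyed in parallel; the device is not hidden. */
+        return;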



> >> >> Also note that unused slots hold zero, i.e. there's at least a
> >> >> theoretical risk of problems here when you don't also look at
> >> >> iommu->domid_bitmap.
> >> >>
> >> > I am not clear on how to fix this point. Do you have a good idea?
> >> > Add a lock to 'iommu->domid_bitmap'?
> >>
> >> How would a lock help avoiding mistaking an unused slot to mean Dom0?
> >> As already suggested - I think you simply need to consult the bitmap
> >> along with the domain ID array.
> >>
> >
> > Try to get the domain ID with iommu->domid_map[did]?
> 
> ???
> 
        + if ( iommu->domid_map )
        +     d = rcu_lock_domain_by_id(iommu->domid_map[did]);

Is this right?
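
Or, consulting the bitmap as you suggested -- a sketch (assuming
Xen's test_bit(); an unused slot holds zero and would otherwise be
mistaken for Dom0):

+    struct domain *d = NULL;
+
+    /* Only trust domid_map[did] when the slot is actually in use. */
+    if ( test_bit(did, iommu->domid_bitmap) )
+        d = rcu_lock_domain_by_id(iommu->domid_map[did]);
+
+    if ( d == NULL )
+        return;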


> >> >> > +            {
> >> >> > +                list_del(&pdev->domain_list);
> >> >>
> >> >> This should happen under pcidevs_lock - you need to either acquire
> >> >> it or
> >> >> ASSERT() it being held.
> >> >>
> >> >
> >> > I should check whether pcidevs_lock is already held -- with
> >> > spin_is_locked(&pcidevs_lock).
> >> > If it is not held, I should acquire it.
> >> > I think ASSERT() is not a good idea: the hypervisor may acquire
> >> > this lock first and then remove the resource.
> >>
> >> I don't understand this last sentence.
> >>
> > For example, in
> > pci_remove_device()
> > {
> > ...
> >     spin_lock(&pcidevs_lock);
> >     ...
> >     iommu_remove_device();
> >     ...
> >     spin_unlock(&pcidevs_lock);
> > }
> >
> > a Device-TLB flush error may occur inside iommu_remove_device(),
> > and then pcidevs_lock is already held.
> > In this case, I need to check whether pcidevs_lock is held;
> > if not, I need to acquire it.
> 
> Ah, okay. But you can't use spin_is_locked() for that purpose.
> 
Why?

If I introduce a new local variable 'lock':

+ int lock = spin_is_locked(&pcidevs_lock);

+ if ( !lock )
+     spin_lock(&pcidevs_lock);
...
+ if ( !lock )
+     spin_unlock(&pcidevs_lock);

Is this right?
Jan, do you have a better idea? I think ASSERT() is not right.
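
Or is the problem that spin_is_locked() only reports that *some* CPU
holds the lock, not necessarily the current one? If so, maybe a sketch
like this (the extra parameter name is just for illustration): let the
caller, which knows whether it already holds pcidevs_lock, pass that
fact down:

+ static void dev_invalidate_iotlb_timeout(struct iommu *iommu, u16 did,
+                                          u16 seg, u8 bus, u8 devfn,
+                                          bool_t pcidevs_is_locked)
+ {
+     if ( !pcidevs_is_locked )
+         spin_lock(&pcidevs_lock);
+
+     /* ... look up pdev, list_del(&pdev->domain_list), hide device ... */
+
+     if ( !pcidevs_is_locked )
+         spin_unlock(&pcidevs_lock);
+ }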




> >> >> > @@ -349,9 +416,18 @@ static int flush_iotlb_qi(
> >> >> >          queue_invalidate_iotlb(iommu,
> >> >> >                                 type >> DMA_TLB_FLUSH_GRANU_OFFSET, dr,
> >> >> >                                 dw, did, size_order, 0, addr);
> >> >> > +
> >> >> > +        /*
> >> >> > +         * Synchronize with hardware for invalidation request
> >> >> > +         * descriptors submitted before Device-TLB invalidate
> >> >> > +         * descriptor.
> >> >> > +         */
> >> >> > +        rc = invalidate_sync(iommu);
> >> >> > +        if ( rc )
> >> >> > +             return rc;
> >> >> > +
> >> >> >          if ( flush_dev_iotlb )
> >> >> >              ret = dev_invalidate_iotlb(iommu, did, addr,
> >> >> >                                         size_order, type);
> >> >> > -        rc = invalidate_sync(iommu);
> >> >> > +
> >> >> >          if ( !ret )
> >> >> >              ret = rc;
> >> >> >      }
> >> >>
> >> >> This change is because of ...?
> >> >>
> >> > As per Kevin's comments,
> >> > I added 'dev_invalidate_iotlb_sync()' to dev_invalidate_iotlb().
> >> > Then I can know which device's flush timed out.
> >>
> >> I don't understand: How is your reply related to you moving up the
> >> invocation of invalidate_sync()?
> >
> > This invalidate_sync() is to synchronize with hardware for
> > invalidation request descriptors submitted before the Device-TLB
> > invalidate descriptor.
> >
> > If I don't add this invalidate_sync(),
> > dev_invalidate_iotlb_timeout() cannot tell whether the timeout was
> > caused by the Device-TLB flush or by an earlier iotlb/iec/context
> > flush error.
> 
> This being anything but obvious, maybe a brief comment would help?
> 

Change:
"
Synchronize with hardware for invalidation request descriptors
submitted before Device-TLB invalidate descriptor.
"
to:
"
Synchronize invalidation completions before Device-TLB invalidation.
"
?
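
I.e. in flush_iotlb_qi() the ordering would then read (a sketch based
on the hunk above):

+        queue_invalidate_iotlb(iommu, type >> DMA_TLB_FLUSH_GRANU_OFFSET,
+                               dr, dw, did, size_order, 0, addr);
+
+        /*
+         * Synchronize invalidation completions before Device-TLB
+         * invalidation, so that a timeout below is attributable to
+         * the Device-TLB flush alone.
+         */
+        rc = invalidate_sync(iommu);
+        if ( rc )
+            return rc;
+
+        if ( flush_dev_iotlb )
+            ret = dev_invalidate_iotlb(iommu, did, addr, size_order, type);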

> >> >> Plus logging the error code and device coordinates might turn out
> >> >> helpful in such cases. But first of all - if there was a timeout,
> >> >> we'd make it here only for Dom0. Perhaps the printing should move
> >> >> into an else to the domain_crash()?
> >> >> And if there was another error, the message would be outright wrong.
> >> >>
> >> > IMO, the print for Dom0 is enough.
> >> > I think it is not a good idea to move it into an else to
> >> > domain_crash(); domain_crash() is quite a common path.
> >> > Anyway, I can improve it, with low priority.
> >>
> >> At the very least the message should not end up being actively misleading.
> >>
> > What about the below?
> > +                printk(XENLOG_ERR
> > +                       "Device %04x:%02x:%02x is flush error.",
> > +                       seg, bus, devfn);
> 
> Well, for one you once again omit the error code. And then (I think others had
> pointed that out already) a device cannot be flush error.
> Perhaps you mean "Flush error %d on device ...\n"? Plus you absolutely should
> print PCI device coordinates in the canonical way, 

What does 'print PCI device coordinates in the canonical way' mean?
Such as "0000:00:02.1"?

> so that grep-ing a log file for
> issues with a specific device will find everything related to that device.
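
Maybe like this sketch, then (assuming the error code rc is plumbed
into this path; PCI_SLOT()/PCI_FUNC() split devfn):

+ printk(XENLOG_ERR
+        "Flush error %d on device %04x:%02x:%02x.%u\n",
+        rc, seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));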


I appreciate your kindness.
Quan


