[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.



>>> On 22.01.16 at 16:57, <quan.xu@xxxxxxxxx> wrote:
>>  On January 22, 2016 at 12:31am, <JBeulich@xxxxxxxx> wrote:
>> >>> On 21.01.16 at 17:16, <quan.xu@xxxxxxxxx> wrote:
>> >>  On January 20, 2016 at 7:29 pm,  <JBeulich@xxxxxxxx> wrote:
>> >> >>> On 20.01.16 at 11:26, <quan.xu@xxxxxxxxx> wrote:
>> >> >> On January 15, 2016 at 9:10, <JBeulich@xxxxxxxx> wrote:
>> >> >> >>> On 23.12.15 at 09:25, <quan.xu@xxxxxxxxx> wrote:
>> >> >> Also note that unused slots hold zero, i.e. there's at least a
>> >> >> theoretical
>> >> > risk of
>> >> >> problems here when you don't also look at
>> >> >> iommu->domid_bitmap.
>> >> >>
>> >> > I am not clear how to fix this point. Do you have good idea?
>> >> > Add a lock to 'iommu->domid_bitmap'?
>> >>
>> >> How would a lock help avoiding mistaking an unused slot to mean Dom0?
>> >> As already suggested - I think you simply need to consult the bitmap
>> >> along with
>> > the
>> >> domain ID array.
>> >>
>> >
>> > Try to get domain id with iommu->domid_map[did] ?
>> 
>> ???
>> 
>         +if ( iommu->domid_map )
>         +   d = rcu_lock_domain_by_id(iommu->domid_map[did]);
> 
> Is it right?

I don't see what this changes. Again - what your code has been
lacking so far is some mechanism to guarantee that what you
read from domid_map[] is actually a valid domain ID. I can only
once again point your attention to domid_bitmap, which afaics
gives you exactly that valid/invalid indication.

>> >> >> > +            {
>> >> >> > +                list_del(&pdev->domain_list);
>> >> >>
>> >> >> This should happen under pcidevs_lock - you need to either acquire
>> >> >> it or
>> >> >> ASSERT() it being held.
>> >> >>
>> >> >
>> >> > I should check whether it is under pcidevs_lock -- with
>> >> > spin_is_locked(&pcidevs_lock)
>> >> > If it is not under pcidevs_lock, I should acquire it.
>> >> > I think ASSERT() is not a good idea. Hypervisor acquires this lock
>> >> > and then remove the resource.
>> >>
>> >> I don't understand this last sentence.
>> >>
>> > For example: in
>> > pci_remove_device()
>> > {
>> > ...
>> >     spin_lock(&pcidevs_lock);
>> >     ..
>> >     iommu_remove_device()..
>> >     ..
>> >     spin_unlock(&pcidevs_lock);
>> > }
>> >
>> > Device-TLB is maybe flush error in iommu_remove_device()..
>> > Then it is under pcidevs_lock..
>> > In this case, I need to check whether it is under pcidevs_lock.
>> > If not, I need to acquire the pcidevs_lock.
>> 
>> Ah, okay. But you can't use spin_is_locked() for that purpose.
>> 
> Why?

Because spin_is_locked() returns true if _any CPU_ holds the lock,
not just when the CPU you're running on does.

> If I introduce a new parameter 'lock'.
> + int lock = spin_is_locked(&pcidevs_lock);
> 
> 
> + if ( !lock )
> +    spin_lock(&pcidevs_lock);
> ...
> + if ( !lock )
> +    spin_unlock(&pcidevs_lock);
> 
> Is it right? 
> Jan, do you have some better idea?

If indeed different locking state is possible for different call trees,
then I'm afraid you won't get around passing down flags to indicate
whether the needed lock is already being held.

> I think ASSERT is not right.

Indeed.

>> >> >> > @@ -349,9 +416,18 @@ static int flush_iotlb_qi(
>> >> >> >          queue_invalidate_iotlb(iommu,
>> >> >> >                                 type >>
>> >> >> DMA_TLB_FLUSH_GRANU_OFFSET, dr,
>> >> >> >                                 dw, did, size_order, 0, addr);
>> >> >> > +
>> >> >> > +        /*
>> >> >> > +         * Synchronize with hardware for invalidation request
>> >> descriptors
>> >> >> > +         * submitted before Device-TLB invalidate descriptor.
>> >> >> > +         */
>> >> >> > +        rc = invalidate_sync(iommu);
>> >> >> > +        if ( rc )
>> >> >> > +             return rc;
>> >> >> > +
>> >> >> >          if ( flush_dev_iotlb )
>> >> >> >              ret = dev_invalidate_iotlb(iommu, did, addr,
>> >> >> > size_order,
>> >> type);
>> >> >> > -        rc = invalidate_sync(iommu);
>> >> >> > +
>> >> >> >          if ( !ret )
>> >> >> >              ret = rc;
>> >> >> >      }
>> >> >>
>> >> >> This change is because of ...?
>> >> >>
>> >> > As Kevin's comments,
>> >> > I add 'dev_invalidate_iotlb_sync()' to dev_invalidate_iotlb().
>> >> > Then i can know which device is flush timeout.
>> >>
>> >> I don't understand: How is your reply related to you moving up the
>> > invocation of
>> >> invalidate_sync()?
>> >
>> > This invalidate_sync() is to synchronize with hardware for
>> > invalidation request descriptors submitted before Device-TLB
>> > invalidate descriptor.
>> >
>> > If I don't add this invalidate_sync()..
>> > dev_invalidate_iotlb_timeout() is not clear whether the flush time out
>> > is about Device-TLB flush error or the other previous
>> > iotlb/iec/context flush error.
>> 
>> This being anything but obvious, maybe a brief comment would help?
>> 
> 
> Change 
> "
> Synchronize with hardware for invalidation request descriptors 
> submitted before Device-TLB invalidate descriptor.
> "
> TO
> 
> "Synchronize invalidation completions before Device-TLB invalidation
> "
> ?

I'd even further emphasize the ordering, by saying "Before
Device-TLB invalidation we need to ..." (or perhaps it's rather
"... we'd like to ...").

>> >> >> Plus logging the error code and device coordinates might turn out
>> >> >> helpful in such cases. But first of all - if there was a timeout,
>> >> >> we'd make it here only for Dom0. Perhaps the printing should move
>> >> >> into an
>> >> else to the domain_crash()?
>> >> >> And if there was another error, the message would be outright wrong.
>> >> >>
>> >> > IMO, the print for Dom0 is enough.
>> >> > I think it is not a good idea to move into an else to domain_crash().
>> >> > Domain_crash is quite a common part.
>> >> > Anyway I can improve it in low priority.
>> >>
>> >> At the very least the message should not end up being actively misleading.
>> >>
>> > What about the below?
>> > +                printk(XENLOG_ERR
>> > +                       "Device %04x:%02x:%02x is flush error.",
>> > +                       seg, bus, devfn);
>> 
>> Well, for one you once again omit the error code. And then (I think others 
> had
>> pointed that out already) a device cannot be flush error.
>> Perhaps you mean "Flush error %d on device ...\n"? Plus you absolutely 
> should
>> print PCI device coordinates in the canonical way, 
> 
> What does 'print PCI device coordinates in the canonical way' mean ?
> Such as  "0000:00:02.1"?

Yes. There are numerous examples throughout the tree.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.