[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] VT-d flush timeout

> >I don't get your point. What's the different between the 1s and the
> >large enough value? Since hardware completed the flush quickly, it will
> >never spin more than real flush time in normal case. 1s spin only
> >happens in the abnormal case.
> Right, but we can't ignore this abnormal case. In particular we can't exclude
> that a DomU with a device assigned may have ways to (perhaps indirectly)
> affect the completion time of the flushes.
> >My only concern is that, for QI flush, the spin time relies on the
> >length of the queue. I am not sure whether 1s is enough for worst case
> >and I think we should remove the 1s in QI flush. And I think this also
> >the same reason for Linux don't use timeout mechanism in QI flush.
> First of all I think both Linux and Xen in the majority of cases waits for
> completion of just individual queue entries. I.e. I'm not sure if the 
> practical
> worst case really is equal to the theoretical one. And then removing a
> timeout just to allow _longer_ spinning isn't really a step forward. If the
> timeout isn't big enough, the only solution is to immediately replace it with
> asynchronous handling.

Giving this path of long time wait-loop only happens at the case when the 
hardware fails, I don't care if it enters panic in 1 seconds or in 10ms 
seconds. But the software compatibility (for all existing platforms and 
potential future platforms) is much more important.

Replacing wait-loop with an asynchronous handling mechanism is definitely good 
-- maybe put an TODO list for the IOMMU stuff which I believe requiring not 
small effort. But the main goal should be targeting the normal code path, i.e. 
success of the IOTLB operation no matter it is 1ms or 10ms. 

However, as for the timeout code path, given that the specification doesn't say 
what the hardware WORST case is, using practical smaller number is not a good 
choice to me. Nobody is able to test on all platforms, and it may fail in 
future platform. Further more, the result of the mis-prediction to the hardware 
behavior is so serious-->leading to hypervisor panic.  Of course 1second is not 
a good value too, but at least it is verified in the past years.

Thx Eddie

Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.