Xen project Mailing List

Re: [Xen-devel] VT-d flush timeout

>>> On 25.09.14 at 03:02, <eddie.dong@xxxxxxxxx> wrote: >> >I don't get your point. What's the different between the 1s and the >> >large enough value? Since hardware completed the flush quickly, it will >> >never spin more than real flush time in normal case. 1s spin only >> >happens in the abnormal case. >> >> Right, but we can't ignore this abnormal case. In particular we can't > exclude >> that a DomU with a device assigned may have ways to (perhaps indirectly) >> affect the completion time of the flushes. >> >> >My only concern is that, for QI flush, the spin time relies on the >> >length of the queue. I am not sure whether 1s is enough for worst case >> >and I think we should remove the 1s in QI flush. And I think this also >> >the same reason for Linux don't use timeout mechanism in QI flush. >> >> First of all I think both Linux and Xen in the majority of cases waits for >> completion of just individual queue entries. I.e. I'm not sure if the > practical >> worst case really is equal to the theoretical one. And then removing a >> timeout just to allow _longer_ spinning isn't really a step forward. If the >> timeout isn't big enough, the only solution is to immediately replace it > with >> asynchronous handling. >> > > Giving this path of long time wait-loop only happens at the case when the > hardware fails, I don't care if it enters panic in 1 seconds or in 10ms > seconds. But the software compatibility (for all existing platforms and > potential future platforms) is much more important. I agree for the paths leading to a panic(). But there's one such case where it doesn't: snb_vtd_ops_preamble(). > Replacing wait-loop with an asynchronous handling mechanism is definitely > good -- maybe put an TODO list for the IOMMU stuff which I believe requiring > not small effort. But the main goal should be targeting the normal code path, > i.e. success of the IOTLB operation no matter it is 1ms or 10ms. > > However, as for the timeout code path, given that the specification doesn't > say what the hardware WORST case is, using practical smaller number is not a > good choice to me. Nobody is able to test on all platforms, and it may fail > in future platform. Further more, the result of the mis-prediction to the > hardware behavior is so serious-->leading to hypervisor panic. Of course > 1second is not a good value too, but at least it is verified in the past > years. Once again - the ATS spec talks of 60 seconds (with possibly another 30 seconds added on top depending on how you read it). So while for the non-ATS case, provided the one case pointed out above gets dealt with, I agree we don't need to bother changing the code, the ATS case clearly makes asynchronous handling necessary. It is for that reason that we decided to disable ATS support by default. Albeit now that I think about it again, I'm not sure that was an appropriate action for the AMD side of things - Andrew, what do you think? Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.