Re: [Xen-devel] [Patch RFC 00/13] VT-d Asynchronous Device-TLB Flush for ATS Device
Xu, Quan wrote on 2015-09-16:
> Introduction
> ============
>
> VT-d code currently has a number of cases where completion of
> certain operations is being waited for by way of spinning. The
> majority of instances do so indirectly through the IOMMU_WAIT_OP()
> macro, allowing for loops of up to 1 second (DMAR_OPERATION_TIMEOUT).
> While in many of the cases this may be acceptable, the invalidation
> case seems particularly problematic.
>
> Currently the hypervisor polls the status address of the wait
> descriptor for up to 1 second to get the invalidation flush result.
> When the invalidation queue includes a Device-TLB invalidation, using
> 1 second is a mistake in the invalidation sync: the 1 second timeout
> relates to response times of the IOMMU engine, not to Device-TLB
> invalidation with PCI-e Address Translation Services (ATS) in use.
> The ATS specification mandates a timeout of 1 _minute_ for cache flush.
> The ATS case needs to be taken into consideration when doing
> invalidations. Obviously we can't spin for a minute, so invalidation
> absolutely needs to be converted to a non-spinning model.
>
> Also I should fix the new memory security issue.
> A page freed from the domain should be held until the Device-TLB
> flush is completed (ATS timeout of 1 _minute_).
> The page previously associated with the freed portion of GPA should
> not be reallocated for another purpose until the appropriate
> invalidations have been performed. Otherwise, the original page owner
> can still access the freed page through DMA.
>

Hi Maintainers,

Given the discussion and the suggestions you have made over the past several weeks, this is clearly not an easy task. So I am wondering whether it is worth doing, since:

1. ATS devices are not popular. I only know of one NIC, from Myricom, that has ATS capabilities.
2. The issue is only theoretical. Linux, Windows, and VMware all spin today, as Xen does, and none of them has observed any problem so far.
3. I know there is a proposal to modify the timeout value in the ATS spec (maybe to less than 1 ms) to mitigate the problem. But the risk is how long that will take to be achieved.
4. The most important concern is that it is too complicated to fix in Xen, since it requires modifying the core memory code. I don't think Quan and I currently have enough knowledge to do it perfectly. It may take a long time, half a year or a year? (We have spent three months so far.) Yes, if Tim would like to take it, it will be much faster. :)

So my suggestion is that we rely on the user not to assign an ATS device if the hypervisor says it cannot support such a device. For example, if the hypervisor finds that the invalidation has not completed within 1 second, it can crash itself and tell the user that this ATS device needs more than 1 second of invalidation time, which is not supported by Xen.

Any comments?

Best regards,
Yang
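
For readers following the thread, the suggestion above (spin for at most 1 second, and treat a device that has not completed by then as unsupported) can be sketched roughly as below. This is only an illustration under stated assumptions, not Xen code: the struct, the status values, the function names, and the simulated clock are all hypothetical, and real hardware would update the status word itself. Only the 1 second budget (DMAR_OPERATION_TIMEOUT in the quoted text) and the 1 minute ATS timeout come from the discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; none of these are Xen identifiers. */
#define STATUS_PENDING 0u
#define STATUS_DONE    1u
#define SPIN_BUDGET_US 1000000ULL /* 1 second, as in the thread */
#define POLL_STEP_US   10ULL      /* simulated delay between polls */

enum flush_result { FLUSH_DONE, FLUSH_TIMEOUT };

/* Simulated wait descriptor: "hardware" sets status to STATUS_DONE
 * once the simulated clock reaches completes_at_us. */
struct sim_wait_desc {
    uint32_t status;
    uint64_t completes_at_us;
};

/* Stand-in for the device writing the status word. */
static void sim_advance(struct sim_wait_desc *d, uint64_t now_us)
{
    if (now_us >= d->completes_at_us)
        d->status = STATUS_DONE;
}

/* Poll the status word until it reads done or the budget is used up. */
enum flush_result wait_for_flush(struct sim_wait_desc *d, uint64_t budget_us)
{
    assert(d != NULL);
    for (uint64_t now = 0; now <= budget_us; now += POLL_STEP_US) {
        sim_advance(d, now);
        if (d->status == STATUS_DONE)
            return FLUSH_DONE;
    }
    return FLUSH_TIMEOUT;
}

/* Policy from the mail: a device whose flush does not complete within
 * the 1 second budget is reported as unsupported. The caller would then
 * refuse assignment or stop, as proposed above. */
bool device_supported(uint64_t completion_us)
{
    struct sim_wait_desc d = { STATUS_PENDING, completion_us };
    return wait_for_flush(&d, SPIN_BUDGET_US) == FLUSH_DONE;
}
```

With this model, a device that completes its flush in 500 microseconds is accepted, while one that takes the full 60 seconds permitted by the ATS timeout is rejected after the 1 second budget expires. The sketch does not address the freed-page issue raised in the quoted text.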