
Re: [Xen-devel] [Patch RFC 00/13] VT-d Asynchronous Device-TLB Flush for ATS Device



>>> On 15.10.15 at 10:52, <yang.z.zhang@xxxxxxxxx> wrote:
> Jan Beulich wrote on 2015-10-15:
>>>>> On 15.10.15 at 09:28, <yang.z.zhang@xxxxxxxxx> wrote:
>>> The premise for a misbehaving guest to impact the system is that the
>>> IOMMU is buggy and takes a long time to complete the invalidation.
>>> In other words, if all invalidations complete within several us,
>>> why does the spin time matter?
>> 
>> The risk of exploits of such poorly behaving IOMMUs. I.e. if properly
> 
> But this is not a software flaw. A guest has no way to know that the
> underlying IOMMU is faulty, so it cannot exploit it.

A guest doesn't need to know which IOMMU is present in order to attempt
an exploit. Plus, based on other information, it may be able to make
an educated guess.

>> operating IOMMUs only require several us, why spin for several ms?
> 
> 10ms is just my suggestion. I don't know whether future hardware will need
> more time to complete the invalidation, so I think we need a large enough
> timeout here. Meanwhile, it doesn't impact scheduling.

It does, as explained further down in my previous reply.
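For readers less familiar with the synchronous model being debated, here is a minimal sketch (not Xen's actual code) of a bounded spin-wait: the caller polls for completion until a deadline, and the physical CPU does nothing else in the meantime. `spin_wait`, `struct fake_inv`, and `fake_inv_done` are hypothetical names; the fake completion check merely stands in for whatever hardware completion status real code would poll.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical callback: returns true once the invalidation has completed. */
typedef bool (*inv_done_fn)(void *ctx);

/*
 * Busy-wait until done(ctx) reports completion or timeout_ns elapses.
 * Returns true on completion, false on timeout. The CPU does no other
 * work while looping, so a large timeout keeps every other vCPU off
 * this physical CPU for up to that long.
 */
static bool spin_wait(inv_done_fn done, void *ctx, uint64_t timeout_ns)
{
    assert(done != NULL);
    uint64_t deadline = now_ns() + timeout_ns;

    while (!done(ctx)) {
        if (now_ns() >= deadline)
            return false; /* caller decides: retry, fail, or kill the guest */
    }
    return true;
}

/* Hypothetical stand-in for polling a hardware completion status. */
struct fake_inv {
    unsigned polls_left; /* polls before "hardware" reports completion */
    bool stuck;          /* models a device that never completes */
};

static bool fake_inv_done(void *ctx)
{
    struct fake_inv *f = ctx;

    if (f->stuck)
        return false;
    if (f->polls_left == 0)
        return true;
    f->polls_left--;
    return false;
}
```

The value passed as `timeout_ns` is what the thread is arguing about: with a well-behaved device the loop exits after a few microseconds, but with a slow or stuck one the full timeout is burned on the CPU.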

>>>>> I remember the original motivation for handling the ATS problem was:
>>>>> 1. the ATS spec allows a 60s timeout to complete the flush, while Xen
>>>>> only allows 1s, and 2. spinning for 1s is not reasonable since it
>>>>> hurts the scheduler. For the former, as we discussed before, either
>>>>> disabling ATS support or supporting only specific ATS devices (those
>>>>> completing the flush in less than 10ms or 1ms) is acceptable.
>>>>> For the latter, if spinning for 1s is not acceptable, we can
>>>>> reduce the timeout to 10ms or 1ms to eliminate the performance
>>>>> impact.
>>>> 
>>>> If we really can, why has it been chosen to be 1s in the first place?
>>> 
>>> What I can tell you is that 1s is just the value the original author
>>> chose; it has no special meaning. I have double-checked with our
>>> hardware expert, and he suggests using a value as small as possible.
>>> According to his comment, 10ms is sufficiently large.
>> 
>> So here you talk about milliseconds again, while above you talked
>> about microseconds. Can we at least settle on the order of magnitude
>> that is required? 10ms is 10 times the minimum time slice credit1
>> allows, i.e. awfully long.
> 
> We can use a value you consider reasonable that covers most
> invalidation cases. For the remaining cases, the vCPU can yield the CPU
> to others until a timer fires. In the callback function, the hypervisor
> can check whether the invalidation has completed. If so, schedule the
> vCPU back in; otherwise, kill the guest because of the unpredictable
> invalidation timeout.
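The timer-based fallback proposed just above can be modelled as a small state machine. This is a minimal sketch with hypothetical names (`struct flush_wait`, `flush_spin_budget_exhausted`, `flush_timer_fn`); it only tracks state transitions and does not arm a real timer or interact with any scheduler. Note that it blocks only the vCPU that issued the flush, which is the gap raised in the reply that follows.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states of the vCPU that issued the Device-TLB flush. */
enum vcpu_state {
    VCPU_RUNNING,
    VCPU_BLOCKED_ON_FLUSH,
    VCPU_GUEST_KILLED,
};

struct flush_wait {
    enum vcpu_state state;
    bool flush_done; /* set when the hardware reports completion */
};

/*
 * Called when the short spin budget ran out without completion:
 * stop spinning and give the physical CPU to others. A real
 * implementation would arm a timer and deschedule the vCPU here.
 */
static void flush_spin_budget_exhausted(struct flush_wait *w)
{
    assert(w->state == VCPU_RUNNING);
    w->state = VCPU_BLOCKED_ON_FLUSH;
}

/*
 * Timer callback: if the invalidation finished, make the vCPU runnable
 * again; otherwise treat the timeout as fatal for the guest.
 */
static void flush_timer_fn(struct flush_wait *w)
{
    assert(w->state == VCPU_BLOCKED_ON_FLUSH);
    w->state = w->flush_done ? VCPU_RUNNING : VCPU_GUEST_KILLED;
}
```

Nothing in this model stops the domain's other vCPUs, or its assigned devices, from continuing to use translations that are still being invalidated while the issuing vCPU sits in `VCPU_BLOCKED_ON_FLUSH`.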

Using a timer implies you are again thinking about pausing the vCPU until
the invalidation completes. That, as discussed before, has its own
problems and, even worse, won't cover the domain's other vCPUs or
devices that may still be doing work involving the entries being
invalidated. Or did you have something else in mind?

IOW, as soon as the spinning time reaches the order of the scheduler
time slice, I think the only sane model is asynchronous operation with
proper refcounting.
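The asynchronous model with refcounting suggested above can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not Xen's design: `struct dma_page`, `page_ref_get`, `page_ref_put`, `unmap_async`, and `flush_complete` are all hypothetical. It shows only the core idea: unmapping queues the flush and hands a reference to it instead of spinning, so the page cannot be freed and reused while a device may still hold a stale translation for it.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical page descriptor: refs counts users of the page,
 * including one reference held by each in-flight Device-TLB flush.
 */
struct dma_page {
    unsigned refs;
    bool freed;
};

static void page_ref_get(struct dma_page *pg)
{
    pg->refs++;
}

static void page_ref_put(struct dma_page *pg)
{
    assert(pg->refs > 0);
    if (--pg->refs == 0)
        pg->freed = true; /* only now may the page be reused */
}

/*
 * Unmap: (conceptually) clear the IOMMU entry and queue the flush,
 * keeping a reference on behalf of the pending flush rather than
 * spinning until the device acknowledges it.
 */
static void unmap_async(struct dma_page *pg)
{
    page_ref_get(pg); /* reference owned by the pending flush */
    page_ref_put(pg); /* drop the mapping's own reference */
}

/* Completion path (e.g. an interrupt or a later poll). */
static void flush_complete(struct dma_page *pg)
{
    page_ref_put(pg); /* drop the flush's reference */
}
```

With this arrangement a slow flush delays only the reuse of the page rather than occupying a CPU for the duration of the invalidation.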

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

