[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [Patch RFC 00/13] VT-d Asynchronous Device-TLB Flush for ATS Device



Jan Beulich wrote on 2015-10-14:
>>>> On 14.10.15 at 07:12, <yang.z.zhang@xxxxxxxxx> wrote:
>> Jan Beulich wrote on 2015-10-13:
>>>>>> On 13.10.15 at 07:27, <yang.z.zhang@xxxxxxxxx> wrote:
>>>> Jan Beulich wrote on 2015-10-12:
>>>>>>>> On 12.10.15 at 03:42, <yang.z.zhang@xxxxxxxxx> wrote:
>>>>>> So, my suggestion is that we can rely on user to not assign the
>>>>>> ATS device if hypervisor says it cannot support such device. For
>>>>>> example, if hypervisor find the invalidation isn't completed in
>>>>>> 1 second, then hypervisor can crash itself and tell the user
>>>>>> this ATS device needs more than 1 second invalidation time which
>>>>>> is not support by
>>>> Xen.
>>>>> 
>>>>> Crashing the hypervisor in such a case is a security issue, i.e.
>>>>> is not
>>>> 
>>>> Indeed. Crashing the guest is more reasonable.
>>>> 
>>>>> an acceptable thing (and the fact that we panic() on timeout expiry
>>>>> right now isn't really acceptable either). If crashing the offending
>>>>> guest was sufficient to contain the issue, that might be an option.
>>>>> Else
>>>> 
>>>> I think it should be sufficient (any concern from you?).
>>> 
>>> Having looked at the code, it wasn't immediately clear whether that
>>> would work. After all there one would think there would be a reason
>>> for the code panic()ing right now instead.
>> 
>> What the panic()ing refer to here?
> 
> E.g. what we have in queue_invalidate_wait():
> 
>         while ( poll_slot != QINVAL_STAT_DONE )
>         {
>             if ( NOW() > (start_time + DMAR_OPERATION_TIMEOUT) )
>             {
>                 print_qi_regs(iommu);
>                 panic("queue invalidate wait descriptor was not
> executed");
>             }
>             cpu_relax();
>         }
>>>> But if solution 1 is acceptable, I prefer it since most of ATS
>>>> devices are still able to play with Xen.
>>> 
>>> With a multi-millisecond spin, solution 1 would imo be acceptable
>>> only as a transitional measure.
>> 
>> What does the transitional measure mean? Do you mean we still need
>> the async flush for ATS issue or we can adapt solution 1 before ATS spec 
>> changed?
> 
> As long as the multi-millisecond spins aren't going to go away by
> other means, I think conversion to async mode is ultimately unavoidable.

I am not fully agreed. I think the time to spin is important. To me, less than 
1 ms is acceptable and if the hardware can guarantee it, then sync mode also is 
ok. 
I remember the origin motivation to handle ATS problem is due to: 1. ATS spec 
allow 60s timeout to completed the flush which Xen only allows 1s, and 2. spin 
loop for 1s is not reasonable since it will hurt the scheduler. For the former, 
as we discussed before, either disable ATS support or only support some 
specific ATS devices(complete the flush less than 10ms or 1ms) is acceptable. 
For the latter, if spin loop for 1s is not acceptable, we can reduce the 
timeout to 10ms or 1ms to eliminate the performance impaction. 
Yes, I'd agree it would be best solution if Xen has the async mode. But spin 
loop is used widely in iommu code: not only for invalidations, lots of DMAR 
operations are using spin to sync hardware's status. For those operations, it 
is hard to use async mode. Or, even it is possible to use async mode, I don't 
see the benefit considering the cost and complexity which means we either need 
a timer or a softirq to do the check.

Best regards,
Yang



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.