
Re: [Xen-devel] [PATCH] IOMMU: don't disable bus mastering on faults for devices used by Xen or Dom0



>>> On 04.12.12 at 01:55, "Zhang, Xiantao" <xiantao.zhang@xxxxxxxxx> wrote:
>>  
>> >>> On 03.12.12 at 07:08, "Zhang, Xiantao" <xiantao.zhang@xxxxxxxxx> wrote:
>> > If phantom device support for the IOMMU is upstream, is this patch
>> > still needed?
>> 
>> Phantom function support is unrelated to the behavioral adjustment here.
>> 
>> > Basically, I can't figure out why several faults should be allowed
>> > before disabling bus mastering. Did you run into any real issues? Thanks!
>> 
>> I observed quite a different driver failure pattern with and without this
>> adjustment, but in a contrived environment only. From the customer data
>> for the problem that prompted the phantom function work, I could also
>> conclude the same (comparing the driver failure under native Linux with
>> IOMMU turned on and the one under Xen).
>> 
>> But in any case, I am of the opinion that an occasional fault shouldn't give
>> reason to disable the device altogether - what we're aiming at is solely to
>> keep Xen and other domains functional (which doesn't require as drastic an
>> action as was carried out prior to this adjustment). Also, afaict native
>> Linux doesn't have any such disabling behavior at all.
> 
> Okay, maybe we should align this with the native Linux side and just keep the
> fault reporting instead of disabling the device. And if the number of
> faults reaches a limit, the hypervisor can choose to suppress its output.

Just suppressing the output is not enough (and I'm sure you're
aware that, unlike native Linux and for whatever obscure
reason, at least the VT-d code doesn't produce _any_ indication
of a fault in the hypervisor log): the problem that was addressed
with the original change was that, with a high enough fault rate,
a CPU could be kept busy doing nothing but handling these faults.

Moving the actual handling of the fault into a tasklet only
reduced the impact, so I continue to think that shutting off
at least bus mastering on the device is the right thing to do.
However, this shutting off is based on the source ID of the request,
so if the source ID doesn't refer to the device at fault (as would
currently be the case, e.g., for the Marvell controllers that
we're adding phantom function support for), we would
continue to be in trouble. Then again, such buggy hardware can certainly be
expected not to be passed through to guests in the first place.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

