[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3 0/2] VT-d flush issue

>>> On 22.12.15 at 09:10, <feng.wu@xxxxxxxxx> wrote:

>> -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
>> Sent: Tuesday, December 22, 2015 4:01 PM
>> To: Wu, Feng <feng.wu@xxxxxxxxx>; Xu, Quan <quan.xu@xxxxxxxxx>
>> Cc: 'andrew.cooper3@xxxxxxxxxx' <andrew.cooper3@xxxxxxxxxx>;
>> 'george.dunlap@xxxxxxxxxxxxx' <george.dunlap@xxxxxxxxxxxxx>; Nakajima, Jun
>> <jun.nakajima@xxxxxxxxx>; Tian, Kevin <kevin.tian@xxxxxxxxx>; 'xen-
>> devel@xxxxxxxxxxxxx' <xen-devel@xxxxxxxxxxxxx>; 'keir@xxxxxxx' 
>> <keir@xxxxxxx>;
>> 'tim@xxxxxxx' <tim@xxxxxxx>
>> Subject: RE: [Xen-devel] [PATCH v3 0/2] VT-d flush issue
>> >>> On 22.12.15 at 08:40, <feng.wu@xxxxxxxxx> wrote:
>> > Maybe, there are still some misunderstanding about your expectation.
>> > Let me summarize it here.
>> >
>> > After Quan's patch-set, there are two types of error code:
>> > Now we only support and use software way to synchronize the invalidation,
>> > if someone calls queue_invalidate_wait() and passes sw with 0, then
>> > -EOPNOTSUPP is returned (Though this cannot happen in real world, since
>> > queue_invalidate_wait() is called only in one place and 1 is passed in to 
> 'sw').
>> > So I am not sure what should we do for this return value, if we really get
>> > that return value, it means the flush is not actually executed, so the 
> iommu
>> > state is incorrect, the data is inconsistent. Do you think what should we 
> do
>> > for this case?
>> Since seeing this error would be a software bug, BUG() or ASSERT()
>> are fine to handle this specific case, if need be.
>> > For this case, Quan has elaborate a lot, IIUIC, the main gap between you
>> > and Quan is you think the error code should be propagated to the up caller,
>> > while in Quan's implementation, he deals with this error in
>> > invalidate_timeout()
>> > and device_tlb_invalidate_timeout(), hence no need to propagated it to
>> > up called, since the handling policy will crash the domain, so we don't 
> think
>> > propagated error code is needed. Even we propagate it, the up caller still
>> > doesn't need to do anything for it.
>> "Handling" an error by e.g. domain_crash() doesn't mean you don't
>> need to also modify (or at the very least inspect) callers: They may
>> continue doing things _assuming_ success. Of course you don't
>> need to domain_crash() at all layers. But errors from lower layers
>> should, at least in most ordinary cases, lead to higher layers bailing
>> instead of continuing.
> So there are two questions:
> 1. Just confirmed with Quan, for IEC/context/iotlb flush timeout, the policy
> will be Xen panic. Do you agree with this policy? If you do, we don't need 
> to
> pass the error code to the caller, right? If you don't, we may need more
> discussion about how the handle this case first before anything else.

Just to repeat several earlier statements: BUG() and panic() ought
to be avoided wherever possible (i.e. including here).

> 2. For Device iotlb flush, the current policy is to crash the domain, in 
> this case
> we need to check case by case whether the caller need to handle something
> if timeout is encountered, right? If needed, we need do something there, if
> not needed, we can just let it be.

"Just let it be" is wrong (and is what led to the bad state the code is
in right now in this regard). I gave advice for the direction before:
Add __must_check. This will make the compiler point out to you
where changes are needed.


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.