[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3 0/2] VT-d flush issue




> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> Sent: Tuesday, December 22, 2015 4:01 PM
> To: Wu, Feng <feng.wu@xxxxxxxxx>; Xu, Quan <quan.xu@xxxxxxxxx>
> Cc: 'andrew.cooper3@xxxxxxxxxx' <andrew.cooper3@xxxxxxxxxx>;
> 'george.dunlap@xxxxxxxxxxxxx' <george.dunlap@xxxxxxxxxxxxx>; Nakajima, Jun
> <jun.nakajima@xxxxxxxxx>; Tian, Kevin <kevin.tian@xxxxxxxxx>; 'xen-
> devel@xxxxxxxxxxxxx' <xen-devel@xxxxxxxxxxxxx>; 'keir@xxxxxxx' <keir@xxxxxxx>;
> 'tim@xxxxxxx' <tim@xxxxxxx>
> Subject: RE: [Xen-devel] [PATCH v3 0/2] VT-d flush issue
> 
> >>> On 22.12.15 at 08:40, <feng.wu@xxxxxxxxx> wrote:
> > Maybe, there are still some misunderstanding about your expectation.
> > Let me summarize it here.
> >
> > After Quan's patch-set, there are two types of error code:
> > - -EOPNOTSUPP
> > Now we only support and use software way to synchronize the invalidation,
> > if someone calls queue_invalidate_wait() and passes sw with 0, then
> > -EOPNOTSUPP is returned (Though this cannot happen in real world, since
> > queue_invalidate_wait() is called only in one place and 1 is passed in to 
> > 'sw').
> > So I am not sure what should we do for this return value, if we really get
> > that return value, it means the flush is not actually executed, so the iommu
> > state is incorrect, the data is inconsistent. Do you think what should we do
> > for this case?
> 
> Since seeing this error would be a software bug, BUG() or ASSERT()
> are fine to handle this specific case, if need be.
> 
> > - -ETIMEDOUT
> > For this case, Quan has elaborate a lot, IIUIC, the main gap between you
> > and Quan is you think the error code should be propagated to the up caller,
> > while in Quan's implementation, he deals with this error in
> > invalidate_timeout()
> > and device_tlb_invalidate_timeout(), hence no need to propagated it to
> > up called, since the handling policy will crash the domain, so we don't 
> > think
> > propagated error code is needed. Even we propagate it, the up caller still
> > doesn't need to do anything for it.
> 
> "Handling" an error by e.g. domain_crash() doesn't mean you don't
> need to also modify (or at the very least inspect) callers: They may
> continue doing things _assuming_ success. Of course you don't
> need to domain_crash() at all layers. But errors from lower layers
> should, at least in most ordinary cases, lead to higher layers bailing
> instead of continuing.

So there are two questions:
1. Just confirmed with Quan, for IEC/context/iotlb flush timeout, the policy
will be Xen panic. Do you agree with this policy? If you do, we don't need to
pass the error code to the caller, right? If you don't, we may need more
discussion about how the handle this case first before anything else.
2. For Device iotlb flush, the current policy is to crash the domain, in this 
case
we need to check case by case whether the caller need to handle something
if timeout is encountered, right? If needed, we need do something there, if
not needed, we can just let it be.

Thanks,
Feng

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.