[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3 0/2] VT-d flush issue



>On 22.12.2015 at 4:01pm <JBeulich@xxxxxxxx> wrote:
> >>> On 22.12.15 at 08:40, <feng.wu@xxxxxxxxx> wrote:
> > Maybe, there are still some misunderstanding about your expectation.
> > Let me summarize it here.
> >
> > After Quan's patch-set, there are two types of error code:
> > - -EOPNOTSUPP
> > Now we only support and use software way to synchronize the
> > invalidation, if someone calls queue_invalidate_wait() and passes sw
> > with 0, then -EOPNOTSUPP is returned (Though this cannot happen in
> > real world, since
> > queue_invalidate_wait() is called only in one place and 1 is passed in to 
> > 'sw').
> > So I am not sure what should we do for this return value, if we really
> > get that return value, it means the flush is not actually executed, so
> > the iommu state is incorrect, the data is inconsistent. Do you think
> > what should we do for this case?
> 
> Since seeing this error would be a software bug, BUG() or ASSERT() are fine to
> handle this specific case, if need be.
> 
> > - -ETIMEDOUT
> > For this case, Quan has elaborate a lot, IIUIC, the main gap between
> > you and Quan is you think the error code should be propagated to the
> > up caller, while in Quan's implementation, he deals with this error in
> > invalidate_timeout()
> > and device_tlb_invalidate_timeout(), hence no need to propagated it to
> > up called, since the handling policy will crash the domain, so we
> > don't think propagated error code is needed. Even we propagate it, the
> > up caller still doesn't need to do anything for it.
> 
> "Handling" an error by e.g. domain_crash() doesn't mean you don't need to also
> modify (or at the very least inspect) callers: They may continue doing things
> _assuming_ success. Of course you don't need to domain_crash() at all layers.
> But errors from lower layers should, at least in most ordinary cases, lead to
> higher layers bailing instead of continuing.
> 

For Device-TLB flush error, I think we need to propagated error code.
For IEC/iotlb/context flush error, if panic is acceptable, we can ignore the 
propagated error code. BTW, it is very challenge / tricky to handle all
Of error, and some error is unrecoverable. As mentioned, it looks like 
rewriting Xen hypervisor.

For -EOPNOTSUPP, we can print warning message. If it supports interrupt method, 
we can return 0 in queue_invalidate_wait().

Feng, thanks for your update!

-Quan











_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.