Re: [Xen-devel] [Patch RFC 00/13] VT-d Asynchronous Device-TLB Flush for ATS Device



>>> On 15.10.15 at 09:28, <yang.z.zhang@xxxxxxxxx> wrote:
> Jan Beulich wrote on 2015-10-15:
>>>>> On 15.10.15 at 03:03, <yang.z.zhang@xxxxxxxxx> wrote:
>>> Jan Beulich wrote on 2015-10-14:
>>>> As long as the multi-millisecond spins aren't going to go away by
>>>> other means, I think conversion to async mode is ultimately unavoidable.
>>> 
>>> I don't fully agree. I think the time spent spinning is what matters.
>>> To me, less than 1ms is acceptable, and if the hardware can guarantee
>>> that, then sync mode is also OK.
>> 
>> Okay, let me put the condition slightly differently - any spin on the
>> order of what a WBINVD might take ought to be okay, provided both are
> 
> From the data we collected, the invalidation completes within several us.
> IMO, the time for WBINVD varies with cache size and cache hierarchy, and
> it may take more than several us in the worst case.

Understood - hence taking the worst-case latency of WBINVD as an upper
bound for other (kind of similar) software operations.
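
To put that pattern in code - a minimal sketch of the bounded synchronous
wait being discussed, where iommu_flush_pending(), the struct name, and the
50us budget are purely illustrative stand-ins, not Xen's actual code:

static int sync_flush_wait(struct iommu *iommu)
{
    /* Budget on the order of a worst-case WBINVD, per the above. */
    s_time_t deadline = NOW() + MICROSECS(50);

    while ( iommu_flush_pending(iommu) )    /* hypothetical status check */
    {
        if ( NOW() > deadline )
            return -ETIMEDOUT;              /* bail instead of spinning on */
        cpu_relax();
    }

    return 0;
}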

>> equally (in)accessible to guests. The whole discussion is really about
>> limiting the impact misbehaving guests can have on the whole system.
>> (Obviously any spin time reaching the order of a scheduling time slice
>> is a problem.)
> 
> The premise for a misbehaving guest impacting the system is a buggy IOMMU
> that takes a long time to complete the invalidation. In other words, if
> all invalidations complete within several us, why does the spin time
> matter at all?

The risk of exploits of such poorly behaving IOMMUs. I.e. if properly
operating IOMMUs only require several us, why spin for several ms?

>>> I remember the original motivation for handling the ATS problem was:
>>> 1. the ATS spec allows a 60s timeout to complete the flush while Xen
>>> only allows 1s, and 2. spinning for 1s is not reasonable since it will
>>> hurt the scheduler. For the former, as we discussed before, either
>>> disabling ATS support or only supporting specific ATS devices (those
>>> completing the flush in less than 10ms or 1ms) is acceptable. For the
>>> latter, if spinning for 1s is not acceptable, we can reduce the timeout
>>> to 10ms or 1ms to eliminate the performance impact.
>> 
>> If we really can, why has it been chosen to be 1s in the first place?
> 
> What I can tell is that 1s is just the value the original author chose; it
> has no special meaning. I have double-checked with our hardware expert and
> he suggests using as small a value as possible. According to his comment,
> 10ms is sufficiently large.

So here you talk about milliseconds again, while above you talked about
microseconds. Can we at least settle on the order of magnitude that is
required? 10ms is 10 times the minimum time slice credit1 allows, i.e.
awfully long.
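
For scale, the values under discussion side by side (illustrative
constants, not Xen's actual definitions):

/* Illustrative only - the timeouts being debated. */
#define ATS_FLUSH_TIMEOUT_CUR   MILLISECS(1000)  /* today's 1s bound */
#define ATS_FLUSH_TIMEOUT_NEW   MILLISECS(10)    /* hardware expert's bound */
#define CREDIT1_MIN_TIMESLICE   MILLISECS(1)     /* minimum credit1 slice */
/* Even the reduced 10ms spans ten minimum timeslices, so a vCPU spinning
 * that long still monopolizes a pCPU across multiple scheduling points. */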

>>> Yes, I'd agree async mode would be the best solution if Xen had it.
>>> But spin loops are used widely in the IOMMU code: not only for
>>> invalidations, lots of DMAR operations use spinning to sync the
>>> hardware's status. For those operations it is hard to use async mode.
>>> And even where async mode is possible, I don't see the benefit
>>> considering the cost and complexity: we would need either a timer or
>>> a softirq to do the check.
>> 
>> Even if the cost is high, avoiding overall throughput being limited by
>> undue spinning is worth it imo, even outside of misbehaving-guest
>> considerations. I'm surprised you're not getting similar pressure on
>> this from the KVM folks (assuming the use of spinning is similar there).
> 
> Because no one has observed such an invalidation timeout issue so far.
> What we have discussed is only theoretical.

As is the case with many security-related things. We shouldn't wait
until someone exploits them.
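
To illustrate the async shape mentioned further up - a rough sketch only,
assuming a hypothetical per-flush tracking structure and completion check;
a real series would need proper per-device state and a timeout policy:

static void flush_poll_fn(void *data)
{
    struct flush_track *ft = data;        /* hypothetical tracking state */

    if ( flush_done(ft) )                 /* hardware signals completion */
        complete_pending_op(ft);          /* resume the waiting operation */
    else if ( NOW() > ft->deadline )
        handle_flush_timeout(ft);         /* e.g. quarantine the device */
    else
        set_timer(&ft->timer, NOW() + MICROSECS(100));  /* re-check later */
}

/* In the flush path, arm the first check instead of spinning: */
init_timer(&ft->timer, flush_poll_fn, ft, smp_processor_id());
set_timer(&ft->timer, NOW() + MICROSECS(100));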

> btw, I have raised the issue with the Linux IOMMU maintainer but he
> didn't say anything about it.

Interesting - that may speak for itself (depending on how long this
has been pending), but otoh it is in line with my experience with
many (but not all) Linux maintainers.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

