
Re: [Xen-devel] [Patch RFC 00/13] VT-d Asynchronous Device-TLB Flush for ATS Device



Jan Beulich wrote on 2015-10-15:
>>>> On 15.10.15 at 09:28, <yang.z.zhang@xxxxxxxxx> wrote:
>> Jan Beulich wrote on 2015-10-15:
>>>>>> On 15.10.15 at 03:03, <yang.z.zhang@xxxxxxxxx> wrote:
>>>> Jan Beulich wrote on 2015-10-14:
>>>>> As long as the multi-millisecond spins aren't going to go away by
>>>>> other means, I think conversion to async mode is ultimately unavoidable.
>>>> 
>>>> I do not fully agree. I think the spin time is what matters. To
>>>> me, less than 1 ms is acceptable, and if the hardware can guarantee
>>>> that, then sync mode is also ok.
>>> 
>>> Okay, let me put the condition slightly differently - any spin on
>>> the order of what a WBINVD might take ought to be okay, provided
>>> both are
>> 
>> From the data we collected, the invalidation completes within several us.
>> IMO, the time for WBINVD varies with the cache size and the cache
>> hierarchy, and it may take more than several us in the worst case.
> 
> Understood - hence the setting of the worst case latency of WBINVD as
> an upper bound for other (kind of similar) software operation.
> 
>>> equally (in)accessible to guests. The whole discussion is really about
>>> limiting the impact misbehaving guests can have on the whole system.
>>> (Obviously any spin time reaching the order of a scheduling time slice
>>> is a problem.)
>> 
>> A misbehaving guest can only impact the system if the IOMMU is buggy
>> and takes a long time to complete the invalidation.
>> In other words, if all invalidations complete within several us,
>> why does the spin time matter?
> 
> The risk of exploits of such poorly behaving IOMMUs. I.e. if properly

But this is not a software flaw. A guest has no way of knowing that the
underlying IOMMU is misbehaving, so it cannot exploit it.

> operating IOMMUs only require several us, why spin for several ms?

10ms is just my suggestion. I don't know whether future hardware will need
more time to complete the invalidation, so I think we need a timeout that is
large enough here while still not impacting scheduling.

> 
>>>> I remember the original motivation for handling the ATS problem was:
>>>> 1. the ATS spec allows a 60s timeout to complete the flush, while Xen
>>>> only allows 1s, and 2. a spin loop of 1s is not reasonable since it
>>>> will hurt the scheduler. For the former, as we discussed before,
>>>> either disabling ATS support or only supporting some specific ATS
>>>> devices (those that complete the flush in less than 10ms or 1ms) is
>>>> acceptable. For the latter, if a spin loop of 1s is not acceptable,
>>>> we can reduce the timeout to 10ms or 1ms to eliminate the
>>>> performance impact.
>>> 
>>> If we really can, why has it been chosen to be 1s in the first place?
>> 
>> What I can tell is that 1s is just the value the original author chose.
>> It has no special meaning. I have double-checked with our hardware
>> expert, and he suggests using a value that is as small as possible.
>> According to his comment, 10ms is sufficiently large.
> 
> So here you talk about milliseconds again, while above you talked
> about microseconds. Can we at least settle on an order of what is
> required? 10ms is 10 times the minimum time slice credit1 allows,
> i.e. awfully long.

We can use whatever value you consider reasonable, as long as it covers most
invalidation cases. For the remaining cases, the vcpu can yield the CPU to
others until a timer fires. In the timer callback, the hypervisor can check
whether the invalidation has completed. If it has, schedule the vcpu back in;
otherwise, kill the guest because of the unpredictable invalidation timeout.
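
As a rough illustration of the scheme above (a sketch only, not code from the
patch series): spin for a short budget that covers the common case, then yield
the vcpu and let a timer callback decide between resuming and killing the
guest. Every name here (wait_for_flush, flush_completed, enum flush_result) is
hypothetical, the IOMMU is simulated by a fixed completion latency, and the
100us / 10ms figures used afterwards are placeholders, not agreed values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of waiting for one Device-TLB invalidation. */
enum flush_result {
    FLUSH_DONE_SPIN,   /* completed within the short spin budget */
    FLUSH_DONE_TIMER,  /* completed while the vcpu was descheduled */
    FLUSH_KILL_GUEST   /* still pending when the timer fired */
};

/*
 * Simulated hardware: the invalidation completes latency_us after it was
 * issued. Real code would poll a status word written by the IOMMU instead.
 */
static bool flush_completed(uint64_t latency_us, uint64_t now_us)
{
    return now_us >= latency_us;
}

/* Time is simulated; nothing here actually sleeps or busy-waits. */
static enum flush_result wait_for_flush(uint64_t latency_us,
                                        uint64_t spin_budget_us,
                                        uint64_t timer_delay_us)
{
    uint64_t now_us;

    assert(timer_delay_us > 0); /* the deferred check must lie in the future */

    /* Phase 1: bounded busy-wait, meant to cover the common fast case. */
    for (now_us = 0; now_us <= spin_budget_us; now_us++)
        if (flush_completed(latency_us, now_us))
            return FLUSH_DONE_SPIN;

    /* Phase 2: the vcpu yields; the timer callback checks once, later. */
    now_us = spin_budget_us + timer_delay_us;
    if (flush_completed(latency_us, now_us))
        return FLUSH_DONE_TIMER;    /* schedule the vcpu back in */

    return FLUSH_KILL_GUEST;        /* unpredictable invalidation timeout */
}
```

For example, with a 100us spin budget and a 10ms timer,
wait_for_flush(5, 100, 10000) returns FLUSH_DONE_SPIN,
wait_for_flush(500, 100, 10000) returns FLUSH_DONE_TIMER, and
wait_for_flush(20000, 100, 10000) returns FLUSH_KILL_GUEST.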

Best regards,
Yang





 

