
Re: [Xen-devel] VT-d async invalidation for Device-TLB.



>>> On 12.06.15 at 04:40, <quan.xu@xxxxxxxxx> wrote:
>> > >>> On 10.06.15 at 16:05, <JBeulich@xxxxxxxx> wrote:
>> >>> On 03.06.15 at 09:49, <quan.xu@xxxxxxxxx> wrote:
>> >     For Context Invalidation and IOTLB invalidation without Device-TLB
>> > invalidation, the invalidation queue still flushes synchronously, as
>> > before. (This is a tradeoff: for these cases the completion interrupt
>> > would be pure overhead.)
>> 
>> DMAR_OPERATION_TIMEOUT being 1s, are you saying that you're not intending
>> to replace the current spinning for the non-ATS case?
> 
> Yes, we are not intending to replace the current spinning for the non-ATS 
> case.

I'm not really happy about that.
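
For concreteness, the pattern in question looks roughly like the below -
a minimal sketch only, with illustrative names rather than the actual
qinval code: the wait descriptor's status write gets polled in a tight
loop, and expiry of DMAR_OPERATION_TIMEOUT ends in panic(), taking down
the whole host for what may be a single misbehaving device.

/*
 * Sketch of the current synchronous completion model (illustrative
 * names, not the actual Xen implementation): queue an invalidation
 * wait descriptor that performs a status write, then spin until
 * hardware writes the slot or the timeout expires.
 */
static int queue_invalidate_wait_sync_sketch(struct iommu *iommu)
{
    volatile u32 poll_slot = QINVAL_STAT_INIT;          /* assumed constant */
    s_time_t deadline = NOW() + DMAR_OPERATION_TIMEOUT; /* 1s */

    /* Hypothetical helper: append a wait descriptor whose completion
     * makes hardware write QINVAL_STAT_DONE to &poll_slot. */
    queue_wait_descriptor(iommu, &poll_slot, QINVAL_STAT_DONE);

    while ( poll_slot != QINVAL_STAT_DONE )
    {
        if ( NOW() > deadline )
            panic("queue invalidate wait descriptor timed out\n");
        cpu_relax();
    }

    return 0;
}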

>> Considering that expiring these loops results in panic()s, I would
>> expect these to become asynchronous _and_ contained to the affected VM,
>> alongside the ATS-induced change in behavior. You talk of overhead -
>> can you quantify that?
>>
> 
> I tested with a Myri-10G Dual-Protocol NIC, which is an ATS device.
> For a single invalidation:
>  - the synchronous way takes about 1.4 ms;
>  - the asynchronous way takes about 4.3 ms.

What's the theory on why this is? After all, it shouldn't matter how
the completion of the invalidation gets signaled.

Apart from that, measuring the ATS case (where we're set to use async
mode anyway) is kind of pointless here - we'd need to know the overhead
of non-ATS async compared to non-ATS sync.
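
For reference, the difference between the two completion models is just
flag bits in the trailing Invalidation Wait Descriptor: a Status Write
(SW) that software can poll, and/or an Interrupt Flag (IF) that raises
the invalidation completion interrupt. A sketch of building such a
descriptor - the bit layout follows my reading of the VT-d spec, and the
helper name is made up:

#define QINVAL_WAIT_TYPE  0x5         /* invalidation wait descriptor */
#define QINVAL_WAIT_IF    (1u << 4)   /* raise completion interrupt */
#define QINVAL_WAIT_SW    (1u << 5)   /* write status data to memory */

/* Illustrative helper: fill in a 128-bit wait descriptor.  A status
 * write is requested in both modes (the async proposal polls the slot
 * from the soft-irq); async mode additionally asks for the interrupt. */
static void build_wait_descriptor(u64 desc[2], bool async,
                                  u32 status_data, u64 status_addr)
{
    desc[0] = QINVAL_WAIT_TYPE | QINVAL_WAIT_SW |
              (async ? QINVAL_WAIT_IF : 0) |
              ((u64)status_data << 32);
    desc[1] = status_addr & ~3ULL;    /* 4-byte aligned status address */
}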

>> > More details:
>> >
>> > 1. Invalidation table. We define an iommu_invl structure in the domain:
>> >
>> > struct iommu_invl {
>> >     volatile u64 iommu_invl_poll_slot:62;
>> >     domid_t dom_id;
>> >     u64 iommu_invl_status_data:32;
>> > } __attribute__((aligned(64)));
>> >
>> >    iommu_invl_poll_slot: set to the status address of the wait
>> > descriptor when the invalidation queue contains a Device-TLB
>> > invalidation.
>> >    dom_id: the id of the owning domain.
>> >    iommu_invl_status_data: the count of in-flight Device-TLB
>> > invalidations.
>> 
>> Without further explanation above/below I don't think I really understand 
> the
>> purpose of this structure, nor its organization: Is this something imposed 
> by the
>> VT-d specification? If so, a reference to the respective section in the spec 
> would
>> be useful. If not, I can't see why the structure is laid out the (odd) way 
> it is.
>> 
> 
> Refer to the explanation above. If it is still not clear, I will
> continue the explanation in the next email.

The explanation above helped with what I asked, but still didn't make
clear to me what the structure here is, how it relates to
hardware-defined structures, and hence (as said) why it is laid out the
way it is.
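
To pin down what I'm asking: in code form, my best reading of the
proposal is the below - purely software bookkeeping around the
struct iommu_invl quoted above, with a made-up helper for illustration.
Please correct whatever is wrong here.

/* Illustrative helper (not proposed code): record a newly queued
 * Device-TLB flush in the domain's invalidation table.  The 62-bit
 * poll slot is assumed to hold the 4-byte-aligned status address of
 * the wait descriptor, shifted right by two. */
static void track_devtlb_flush(struct iommu_invl *invl, domid_t dom,
                               u64 status_addr)
{
    invl->dom_id = dom;
    invl->iommu_invl_poll_slot = status_addr >> 2;
    invl->iommu_invl_status_data++;    /* one more flush in flight */
}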

>> > 4. New interrupt handler for invalidation completion:
>> >     - When hardware completes the invalidations with Device-TLB, it
>> > generates an interrupt to notify the hypervisor.
>> >     - In the interrupt handler, we schedule a soft-irq to handle the
>> > finished invalidations.
>> >     - The soft-irq handles finished invalidations as follows:
>> >         scan the pending-flush list;
>> >         for each entry in the list, check the values of
>> > iommu_invl_poll_slot and iommu_invl_status_data in the domain's
>> > invalidation table;
>> >         if the invalidation has completed, clear iommu_pending_flush
>> > and the invalidation table entry, then wake up the domain.
>> 
>> Did you put some consideration into how long this list may get, and
>> hence how long it may take you to iterate through the entire list?
>> 
> 
> Only domains which have an ATS device assigned will be tracked in this
> list, so the list shouldn't get very long.

Okay, if this is a list of domains (or of devices), that would hopefully
be acceptable (albeit on a huge system this could still be dozens). If
this was a list of pending flush requests, it might be worse.
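
If it is indeed a list of domains, then in code form the scan from
step 4 would look roughly like the below - a sketch assuming one list
entry per such domain; the list head, the status-slot accessor, and the
completion helper are all made up for illustration:

/* One list node per domain with an ATS device and flushes in flight
 * (illustrative wrapper around the iommu_invl entry from earlier). */
struct invl_entry {
    struct list_head list;
    struct iommu_invl invl;
};

static LIST_HEAD(pending_flush_list);       /* hypothetical global list */

/* Hypothetical accessor: map the 62-bit poll slot back to the virtual
 * address of the wait descriptor's status word. */
static volatile u32 *status_slot(struct iommu_invl *invl)
{
    return maddr_to_virt(invl->iommu_invl_poll_slot << 2);
}

static void devtlb_flush_softirq(void)
{
    struct invl_entry *e, *tmp;

    list_for_each_entry_safe ( e, tmp, &pending_flush_list, list )
    {
        /* Hardware hasn't performed the status write yet: keep waiting. */
        if ( *status_slot(&e->invl) != QINVAL_STAT_DONE )
            continue;

        *status_slot(&e->invl) = QINVAL_STAT_INIT;  /* consume the write */

        /* More flushes still in flight for this domain: keep waiting. */
        if ( --e->invl.iommu_invl_status_data != 0 )
            continue;

        list_del(&e->list);

        /* Hypothetical helper: clears iommu_pending_flush, resets the
         * table entry, and wakes the blocked domain. */
        complete_domain_flush(e->invl.dom_id);
    }
}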

> Besides, Device-TLB invalidations don't happen frequently, so the cost
> should be acceptable.

That's not a valid consideration: At no time must any processing
inside the hypervisor take arbitrarily long. This requirement is
entirely independent of how frequently such cases may occur.

Jan
