
Re: [Xen-devel] VT-d async invalidation for Device-TLB.



>> >>> On 10.06.15 at 16:05, <JBeulich@xxxxxxxx> wrote:
> >>> On 03.06.15 at 09:49, <quan.xu@xxxxxxxxx> wrote:

Jan, thanks for your review!!

> > Design Overview
> > =============
> > This design implements a non-spinning model for Device-TLB
> > invalidation - using an interrupt based mechanism. Each domain
> > maintains a invalidation table, and the hypervisor has an entry of
> > invalidation tables. The invalidation table
> 
> entry? Do you mean array or table?
> 
It is a table (or list) that tracks the domains that have pending invalidation
requests. In the invalidation completion event handler, the hypervisor walks
this table/list to find which domain's invalidation has completed.
I now have a new way to do the scan: we can scan iommu->domid_bitmap[] to find
the domains that have an assigned device, and then check each such domain's
invalidation status.
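
A rough sketch of that bitmap scan, for illustration only and not the actual
patch. The helper name collect_set_bits() is hypothetical, and the sketch
assumes a plain array of unsigned long rather than Xen's own bitmap helpers.
Translating a bitmap index into a domain ID is left out.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical helper: collect the indices of all set bits in 'bitmap'
 * (nbits wide) into 'out'. Returns the number of indices stored, which
 * is at most out_max.
 */
static size_t collect_set_bits(const unsigned long *bitmap, size_t nbits,
                               unsigned int *out, size_t out_max)
{
    size_t n = 0;

    for (size_t bit = 0; bit < nbits && n < out_max; bit++) {
        unsigned long word = bitmap[bit / BITS_PER_WORD];

        /* A set bit means this slot is in use by a domain. */
        if (word & (1UL << (bit % BITS_PER_WORD)))
            out[n++] = (unsigned int)bit;
    }
    return n;
}
```

Each index returned would then be used to look up the domain and check its
invalidation status.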


> > keeps the count of in-flight Device-TLB invalidation queues, and also
> > provides the same polling parameter for multiple in-flight Device-TLB
> > invalidation queues of each domain.
> 
> Which "same polling parameter"? I.e. I'm not sure what this is about in the
> first place.
> 
It is similar to poll_slot in current Xen. In the VT-d spec it is also called
Status Data.
To explain it in detail, we need to look at the Invalidation Wait Descriptor.
For more information, please refer to section 6.5.2.8, "Invalidation Wait
Descriptor", of the VT-d spec:
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html

When we set the SW bit of the Invalidation Wait Descriptor, hardware signals
completion of the wait descriptor by performing a coherent DWORD write of the
value in the Status Data field to the address specified in the Status Address
field.

In the sync case, Xen uses a local polling parameter: its address is placed in
the Status Address field of the Invalidation Wait Descriptor, and Xen polls it
for the invalidation result for up to 1 second.
In the async case, we provide one global polling parameter per domain; this is
the "same polling parameter". Its address is placed in the Status Address field
of every Invalidation Wait Descriptor submitted when the domain issues
invalidation requests.

When a domain issues a request to the Device-TLB invalidation queue, we update
the invalidation table's count of in-flight Device-TLB invalidation queues and
assign that count to the Status Data field of the queue's wait descriptor.
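
To make the fields concrete, here is a minimal sketch of how such a wait
descriptor could be encoded. It is only an illustration: the macro names, the
struct and make_inv_wait() are made up for this mail, and the bit positions
are my reading of the Invalidation Wait Descriptor layout in the VT-d spec
(please check them against section 6.5.2.8).

```c
#include <stdint.h>

/* Assumed layout of the low 64 bits of an Invalidation Wait Descriptor. */
#define INV_WAIT_TYPE  0x5ULL       /* descriptor type, bits 3:0 */
#define INV_WAIT_IF    (1ULL << 4)  /* raise completion interrupt */
#define INV_WAIT_SW    (1ULL << 5)  /* write Status Data on completion */
#define INV_WAIT_FN    (1ULL << 6)  /* fence later descriptors */

/* Hypothetical 128-bit descriptor, split into two 64-bit halves. */
struct inv_wait_desc {
    uint64_t lo;
    uint64_t hi;
};

/*
 * Build a wait descriptor whose completion writes 'status_data' to the
 * machine address 'status_maddr' (the per-domain polling parameter).
 */
static struct inv_wait_desc make_inv_wait(uint32_t status_data,
                                          uint64_t status_maddr)
{
    struct inv_wait_desc d;

    d.lo = INV_WAIT_TYPE | INV_WAIT_IF | INV_WAIT_SW | INV_WAIT_FN |
           ((uint64_t)status_data << 32);   /* Status Data, bits 63:32 */
    d.hi = status_maddr & ~0x3ULL;          /* Status Address is DWORD aligned */
    return d;
}
```

In this sketch status_maddr would be the result of
virt_to_maddr(&_a_global_polling_parameter_per_domain_), and status_data the
domain's current in-flight count.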

For example:
   .
   .
 ------
| invl |  Status Data = 1 (the count of in-flight Device-TLB invalidation queues)
| wait |  Status Address = virt_to_maddr(&_a_global_polling_parameter_per_domain_)
| desc |
 ------
   .
   .
 ------
| invl |  Status Data = 2 (the count of in-flight Device-TLB invalidation queues)
| wait |  Status Address = virt_to_maddr(&_a_global_polling_parameter_per_domain_)
| desc |
 ------
   .
   .
 ------
| invl |  Status Data = 3 (the count of in-flight Device-TLB invalidation queues)
| wait |  Status Address = virt_to_maddr(&_a_global_polling_parameter_per_domain_)
| desc |
 ------
   .
   .

In the interrupt handler:
    if the count of in-flight Device-TLB invalidation queues ==
       the global polling parameter of the domain:
           This domain has no in-flight invalidation requests.
    else
           This domain has in-flight invalidation requests.
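
The same check written as a small C sketch. The struct and function names are
hypothetical, and the sketch assumes the wait descriptors of one domain
complete in submission order, so the last value hardware writes is the highest
count that has finished.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-domain tracking state. */
struct dom_invl {
    volatile uint32_t poll_slot;  /* written by hardware (Status Data) */
    uint32_t in_flight;           /* count of submitted invalidation queues */
};

/* Called at submission: returns the Status Data for the new wait descriptor. */
static uint32_t dom_invl_submit(struct dom_invl *s)
{
    return ++s->in_flight;
}

/* Called from the completion handler: true if nothing is still in flight. */
static bool dom_invl_idle(const struct dom_invl *s)
{
    return s->poll_slot == s->in_flight;
}
```

With three submissions outstanding, the domain is reported idle only once the
polling parameter reads 3.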

BTW, we also set the FN bit of the Invalidation Wait Descriptor. The FN bit
indicates that descriptors following the Invalidation Wait Descriptor must be
processed by hardware only after the Invalidation Wait Descriptor completes.


> > When a domain issues a request to Device-TLB invalidation queue,
> > update invalidation table's count of in-flight Device-TLB invalidation
> > queue and assign the Status Data of wait descriptor of the
> > invalidation queue. An interrupt is sent out to the hypervisor once a
> > Device-TLB invalidation request is done. In interrupt handler, we will
> > schedule a soft-irq to do the following
> > check:
> >     if invalidation table's count of in-flight Device-TLB invalidation
> > queues == polling parameter:
> >        This domain has no in-flight invalidation requests.
> >     else
> >        This domain has in-flight invalidation requests.
> > The domain is put into the "blocked" status if it has in-flight
> > Device-TLB invalidation requests, and awoken when all the requests are
> > done. A fault event will be generated if an invalidation failed. We
> > can either crash the domain or crash Xen.
> 
> Crashing Xen can't really be considered an option except when you can't
> contain the failed invalidation to a particular VM (which, from what was
> written above, should never happen).
> 

Makes sense.

> >     For Context Invalidation and IOTLB invalidation without Device-TLB
> > invalidation, Invalidation Queue flushes synchronous invalidation as
> > before(This is a tradeoff and the cost of interrupt is overhead).
> 
> DMAR_OPERATION_TIMEOUT being 1s, are you saying that you're not intending
> to replace the current spinning for the non-ATS case?

Yes, we are not intending to replace the current spinning for the non-ATS case.


> Considering that expiring these loops results in panic()s, I would expect
> these to become asynchronous _and_ contained to the affected VM alongside the
> ATS induced changed behavior. You talking of overhead - can you quantify that?
>

I tested it with a Myri-10G Dual-Protocol NIC, which is an ATS device.
For one invalidation:
 The sync way takes about 1.4 ms.
 The async way takes about 4.3 ms.


> > More details:
> >
> > 1. invalidation table. We define iommu _invl structure in domain.
> > Struct iommu _invl {
> >     volatile u64 iommu _invl _poll_slot :62;
> >     domid_t dom_id;
> >     u64 iommu _invl _status_data :32;
> > }__attribute__ ((aligned (64)));
> >
> >    iommu _invl _poll_slot: Set it equal to the status address of wait
> > descriptor when the invalidation queue is with Device-TLB.
> >    dom_id: Keep the id of the domain.
> >    iommu _invl _status_data: Keep the count of in-flight queue with
> > Device-TLB invalidation.
> 
> Without further explanation above/below I don't think I really understand the
> purpose of this structure, nor its organization: Is this something imposed by
> the VT-d specification? If so, a reference to the respective section in the
> spec would be useful. If not, I can't see why the structure is laid out the
> (odd) way it is.
> 

Please refer to the explanation above. If it is still not clear, I will explain
further in my next email.

> > 2. Modification to Device IOTLB invalidation:
> >     - Enabled interrupt notification when hardware completes the
> > invalidations:
> >         Set FN, IF and SW bits in Invalidation Wait Descriptor. The
> > reason
> 
> A good design document would either give a (short) explanation of these bits,
> or at the very least a precise reference to where in the spec they're being
> defined. The way the VT-d spec is organized I generally find it quite hard to
> locate the definition of specific fields when I have only a vague reference in
> hand. Yet reading the doc here shouldn't require the reader to spend
> meaningful extra amounts of time hunting down the corresponding pieces of the
> spec.
> 

Agreed. I will improve this when I send out the code.
For more information about the Invalidation Wait Descriptor, please refer to
section 6.5.2.8, "Invalidation Wait Descriptor", of the VT-d spec:
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html

SW: Indicates that completion of the Invalidation Wait Descriptor is signalled
by performing a coherent DWORD write of the value in the Status Data field to
the address specified in the Status Address field.
FN: Indicates that descriptors following the Invalidation Wait Descriptor must
be processed by hardware only after the Invalidation Wait Descriptor completes.
IF: Indicates that completion of the Invalidation Wait Descriptor is signalled
by generating an invalidation completion event, per the programming of the
Invalidation Completion Event Registers.


> > why also set SW bit is that the interrupt for notification is global
> > not per domain. So we still need to poll the status address to know
> > which domain's flush request is
> >         completed in interrupt handler.
> 
> With the above taken care of, I would then hope to also be able to understand
> this (kind of an) explanation.
> 
> >     - A new per-domain flag (iommu_pending_flush) is used to track the
> > flush status of IOTLB invalidation with Device-TLB invalidation:
> >         iommu_pending_flush will be set before flushing the Device-TLB
> > invalidation.
> 
> What is "flushing an invalidation" supposed to mean? I think there's some
> problem with the wording here...
> 

Yes, it should be 'submit invalidation requests'.


> > 4. New interrupt handler for invalidation completion:
> >     - when hardware completes the invalidations with Device IOTLB, it
> > generates an interrupt to notify hypervisor.
> >     - In interrupt handler, we will schedule a soft-irq to handle the
> > finished invalidations.
> >     - soft-irq to handle finished invalidation:
> >         Scan the pending flush list
> >         for each entry in list
> >             check the values of iommu _invl _poll_slot and iommu _invl
> > _status_data in each domain's invalidation table.
> >             if yes, clear iommu_pending_flush and invalidation table,
> > then wakeup the domain.
> 
> Did you put some consideration into how long this list may get, and hence how
> long it may take you to iterate through the entire list?
> 

Only domains that have an ATS device assigned will be tracked in this list, so
the list shouldn't be very long. Besides, Device-IOTLB invalidation doesn't
happen frequently, so the cost should be acceptable.

Thanks.

> Jan

Quan

