
[Xen-devel] VT-d async invalidation for Device-TLB.



Hi All,
     This email proposes a design for asynchronous VT-d invalidation of the
Device-TLB.

Background
=========

As Jan Beulich mentioned
(http://lists.xenproject.org/archives/html/xen-devel/2014-06/msg03351.html),
the VT-d code currently has a number of cases where completion of certain
operations is waited for by spinning. The majority of instances use
DMAR_OPERATION_TIMEOUT indirectly through the IOMMU_WAIT_OP() macro, allowing
for loops of up to 1 second. While in many of the cases this may be acceptable,
the invalidation case seems particularly problematic. Currently the hypervisor
polls the status address of the wait descriptor for up to 1 second to get the
invalidation flush result. That 1 second timeout is related to response times
of the IOMMU engine itself. When the invalidation queue includes a Device-TLB
invalidation, with PCI-e Address Translation Services (ATS) in use, 1 second is
the wrong timeout for the invalidation sync: the ATS specification mandates a
timeout of 1 _minute_ for the cache flush. The ATS case needs to be taken into
consideration when doing invalidations.
Obviously we can't spin for a minute, so invalidation absolutely needs to be
converted to a non-spinning model.

Design Overview
=============
This design implements a non-spinning model for Device-TLB invalidation, using
an interrupt based mechanism. Each domain maintains an invalidation table, and
the hypervisor keeps track of these invalidation tables. The invalidation table
keeps the count of in-flight Device-TLB invalidation queues, and also provides
a single polling parameter shared by the multiple in-flight Device-TLB
invalidation queues of the domain.
When a domain issues a request to the Device-TLB invalidation queue, we update
the invalidation table's count of in-flight Device-TLB invalidation queues and
assign the Status Data of the wait descriptor of the invalidation queue. An
interrupt is sent to the hypervisor once a Device-TLB invalidation request
is done. In the interrupt handler, we schedule a soft-irq to do the following
check:
    if invalidation table's count of in-flight Device-TLB invalidation queues
       == polling parameter:
           This domain has no in-flight invalidation requests.
    else
           This domain has in-flight invalidation requests.
The domain is put into the "blocked" state if it has in-flight Device-TLB
invalidation requests, and is woken up when all the requests are done. A fault
event will be generated if an invalidation fails. We can then either crash the
domain or crash Xen.
    For Context invalidation and IOTLB invalidation without Device-TLB
invalidation, the Invalidation Queue is flushed synchronously as before (this
is a tradeoff: the interrupt itself has an overhead).

More details:

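1. Invalidation table. We define the iommu_invl structure in the domain.
struct iommu_invl {
    /* status address of the wait descriptor */
    volatile u64 iommu_invl_poll_slot :62;
    /* id of the owning domain */
    domid_t dom_id;
    /* count of in-flight queues with Device-TLB invalidation */
    u64 iommu_invl_status_data :32;
} __attribute__ ((aligned (64)));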

   iommu_invl_poll_slot: Set to the status address of the wait descriptor
when the invalidation queue contains a Device-TLB invalidation.
   dom_id: The id of the domain.
   iommu_invl_status_data: The count of in-flight queues with Device-TLB
invalidation.
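
To make the role of the two fields more concrete, here is a minimal sketch in C
of how the table might be armed and checked. This is only my reading of the
design, not proposed Xen code: it assumes the hardware writes the Status Data
of each wait descriptor to the shared status address, and that wait descriptors
complete in submission order. The simplified struct invl_table (no bit-fields)
and the helpers invl_table_arm() and invl_table_done() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct iommu_invl (illustration only). */
struct invl_table {
    volatile uint32_t *poll_slot; /* status address shared by all queues */
    uint16_t dom_id;              /* id of the owning domain */
    uint32_t status_data;         /* count of in-flight Device-TLB queues */
};

/*
 * Account for one more in-flight queue. The returned value is what the
 * caller would place in the Status Data field of the wait descriptor.
 */
static uint32_t invl_table_arm(struct invl_table *t,
                               volatile uint32_t *status_addr)
{
    assert(t != NULL && status_addr != NULL);
    t->poll_slot = status_addr;
    t->status_data++;
    return t->status_data;
}

/*
 * True when the last value written to the status address equals the
 * in-flight count, i.e. the most recently queued wait descriptor is done.
 * A table that was never armed has nothing in flight.
 */
static bool invl_table_done(const struct invl_table *t)
{
    if (t->poll_slot == NULL)
        return true;
    return *t->poll_slot == t->status_data;
}
```

With two queues armed, the table reports "done" only after the status address
holds 2, which is the "count == polling parameter" check from the overview.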

2. Modification to Device IOTLB invalidation:
    - Enable interrupt notification when hardware completes the invalidations:
        Set the FN, IF and SW bits in the Invalidation Wait Descriptor. The SW
        bit is also set because the notification interrupt is global, not per
        domain, so in the interrupt handler we still need to poll the status
        address to know which domain's flush request has completed.
    - A new per-domain flag (iommu_pending_flush) is used to track the flush
      status of IOTLB invalidation with Device-TLB invalidation:
        iommu_pending_flush is set before flushing the Device-TLB
        invalidation.
    - New logic for synchronization:
        if no Device-TLB invalidation:
            Fall back to the current invalidation logic.
        else
            Set the IF, SW and FN bits in the wait descriptor and prepare the
            Status Data.
            Set iommu_pending_flush
            Put the domain in the pending flush list
            Return
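
The synchronization logic above could be sketched roughly as follows. This is a
standalone illustration under stated assumptions, not the actual patch: the bit
positions (type 0x5 in bits 3:0, IF bit 4, SW bit 5, FN bit 6, Status Data in
bits 63:32, Status Address in the high qword) follow my reading of the
Invalidation Wait Descriptor layout in the VT-d specification and should be
checked against it. struct wait_desc, struct dom_flush_state,
make_wait_desc() and invalidate_sync() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INV_WAIT_TYPE 0x5ULL        /* descriptor type, bits 3:0 */
#define INV_WAIT_IF   (1ULL << 4)   /* interrupt on completion */
#define INV_WAIT_SW   (1ULL << 5)   /* write Status Data on completion */
#define INV_WAIT_FN   (1ULL << 6)   /* fence against earlier descriptors */

struct wait_desc {
    uint64_t lo;
    uint64_t hi;
};

enum sync_result { SYNC_USE_SPIN_PATH, SYNC_PENDING };

/* Hypothetical per-domain bookkeeping for this sketch. */
struct dom_flush_state {
    bool iommu_pending_flush;
    bool on_pending_list;
    uint32_t in_flight;
};

/* Build a wait descriptor with IF, SW and FN set. */
static struct wait_desc make_wait_desc(uint32_t status_data,
                                       uint64_t status_addr)
{
    struct wait_desc d;

    d.lo = INV_WAIT_TYPE | INV_WAIT_IF | INV_WAIT_SW | INV_WAIT_FN |
           ((uint64_t)status_data << 32);
    d.hi = status_addr & ~0x3ULL;   /* status address is 4-byte aligned */
    return d;
}

/*
 * Without a Device-TLB invalidation the caller keeps using the existing
 * spinning path. Otherwise prepare the descriptor, mark the domain as
 * having a pending flush and return without waiting.
 */
static enum sync_result invalidate_sync(struct dom_flush_state *d,
                                        bool has_dev_tlb,
                                        uint64_t status_addr,
                                        struct wait_desc *out)
{
    if (!has_dev_tlb)
        return SYNC_USE_SPIN_PATH;

    d->in_flight++;
    *out = make_wait_desc(d->in_flight, status_addr);
    d->iommu_pending_flush = true;
    d->on_pending_list = true;      /* stands in for a real list insert */
    return SYNC_PENDING;
}
```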

3. Modification to domain running lifecycle:
    - When iommu_pending_flush is set, the domain is not allowed to enter
      non-root mode: pause the domain before VM entry.

4. New interrupt handler for invalidation completion:
    - When hardware completes the invalidations with Device IOTLB, it generates
      an interrupt to notify the hypervisor.
    - In the interrupt handler, we schedule a soft-irq to handle the finished
      invalidations.
    - Soft-irq to handle finished invalidations:
        Scan the pending flush list
            for each entry in the list
                check the values of iommu_invl_poll_slot and
                iommu_invl_status_data in the domain's invalidation table.
                if they show that all in-flight requests are done, clear
                iommu_pending_flush and the invalidation table, then wake up
                the domain.
     (We can leverage the IM bit of the Invalidation Event Control Register to
     optimize the interrupt).
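
A rough sketch of the soft-irq scan, again in standalone C rather than real Xen
code. It assumes a singly linked pending flush list and that a domain is
finished once the value written to its status word equals its in-flight count.
struct pending_dom and invl_softirq_scan() are hypothetical, and clearing the
paused flag stands in for actually waking the domain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-domain entry on the pending flush list. */
struct pending_dom {
    volatile uint32_t status_word;  /* target of the status write */
    uint32_t in_flight;             /* count of in-flight queues */
    bool iommu_pending_flush;
    bool paused;
    struct pending_dom *next;
};

/*
 * Walk the pending flush list. Every domain whose status word matches its
 * in-flight count is cleaned up, unlinked and "woken" (here: unpaused).
 * Returns the number of domains woken.
 */
static unsigned int invl_softirq_scan(struct pending_dom **head)
{
    unsigned int woken = 0;
    struct pending_dom **pp = head;

    while (*pp != NULL) {
        struct pending_dom *d = *pp;

        if (d->status_word == d->in_flight) {
            d->iommu_pending_flush = false;
            d->status_word = 0;     /* clear the invalidation table */
            d->in_flight = 0;
            d->paused = false;
            *pp = d->next;          /* unlink from the pending list */
            d->next = NULL;
            woken++;
        } else {
            pp = &d->next;          /* still in flight, keep it listed */
        }
    }
    return woken;
}
```

Domains that still have requests in flight stay on the list and are looked at
again the next time the completion interrupt schedules the soft-irq.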

5. Invalidation failure:
    - A fault event will be generated if an invalidation fails. We can either
      crash the domain or crash Xen when we receive an invalidation fault
      event.



Intel OTC
Quan Xu

