[Xen-devel] [VTD] Intel iommu IOTLB flush really slow

Hi,

Some IOMMU DMA remapping engines take a long time to flush their IOTLBs.
For instance, on Ibex Peak a single iommu_map_page call can take on the order
of milliseconds.

The Intel IOMMU spec says that you don't need to flush if the PTE was not
previously present, so all is well when we are creating a domain because we
don't need to flush anything. The problems start when we try to move memory
around.
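
To make the flush rule concrete, here is a minimal sketch of how I read it:
a flush is only needed when the entry being replaced was already present.
The PTE layout and the function names are simplified assumptions for
illustration, not the actual Xen code, and hardware that caches not-present
entries is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified, assumed PTE layout: an entry counts as "present" when
 * either the read or the write permission bit is set. */
#define PTE_READ  (1ull << 0)
#define PTE_WRITE (1ull << 1)

static bool pte_present(uint64_t pte)
{
    return (pte & (PTE_READ | PTE_WRITE)) != 0;
}

/* Hypothetical helper: install new_pte in *slot and report whether the
 * caller has to flush the IOTLB afterwards. */
static bool update_pte_needs_flush(uint64_t *slot, uint64_t new_pte)
{
    bool was_present = pte_present(*slot);

    *slot = new_pte;
    /* A stale translation can only be cached for a present entry. */
    return was_present;
}
```

Populating an empty slot (domain creation) returns false; overwriting a live
mapping (moving memory) returns true, which is the case that hurts.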

Here is some code from hvmloader, pci.c:190 on xen-unstable:

while ( (pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend )
{
    struct xen_add_to_physmap xatp;
    if ( hvm_info->high_mem_pgend == 0 )
        hvm_info->high_mem_pgend = 1ull << (32 - PAGE_SHIFT);
    xatp.domid = DOMID_SELF;
    xatp.space = XENMAPSPACE_gmfn;
    xatp.idx   = --hvm_info->low_mem_pgend;
    xatp.gpfn  = hvm_info->high_mem_pgend++;
    if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
        BUG();
}

This code is triggered when the PCI hole has grown so much that it overlaps
with the allocated RAM, so we have to relocate the overlapping section to the
top of memory, above 4G.

If we follow the code down into Xen we find that add_to_physmap calls
set_p2m_entry, which uses either p2m_set_entry or ept_set_entry with an order
of 0. Yes, we only move one page at a time.

Both implementations update the IOMMU page table with iommu_map_page.
So in the end we do one iommu_map_page call, and therefore one flush, per
page, driven by the loop in hvmloader.

The IOMMU DMA remapping engine of the Intel GPU is really slow to flush.
So when we create a domain that does Intel GPU passthrough, with enough
memory to force a relocation of the top of RAM below 4G, the domain can take
minutes to start!
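
A back-of-the-envelope estimate shows where the minutes come from. The
figures are assumptions for illustration only (1 ms per flush, 1 GiB to
relocate), not measurements, and relocation_flush_ms is just a name I made up.

```c
#include <stdint.h>

#define PAGE_SHIFT 12   /* 4K pages */

/* Estimated time, in milliseconds, spent flushing when `bytes` of RAM are
 * relocated one page at a time with one IOTLB flush per page. */
static uint64_t relocation_flush_ms(uint64_t bytes, uint64_t flush_ms)
{
    uint64_t pages = bytes >> PAGE_SHIFT;

    return pages * flush_ms;
}
```

With those assumed numbers, relocation_flush_ms(1ull << 30, 1) gives 262144 ms
for 262144 pages, which is about four and a half minutes.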

There are multiple approaches we could use to fix this problem, but before I
start working on a patch I would like to get the list's point of view.

Plan A:
  - Add a new XENMEM add_to_physmap_range that would relocate a gfn range to
    a new gfn.
  - Add a flag in the IOMMU API to delay the IOTLB flush.
  - Add a new API call to flush the IOTLB manually once we have relocated the
    whole range.
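
A rough sketch of what Plan A could look like, as a toy model that counts
flushes instead of touching hardware. Every name here (toy_iommu,
TOY_IOMMU_DEFER_FLUSH, toy_relocate_range, ...) is hypothetical; none of this
is existing Xen API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for an IOMMU: it only tracks flush bookkeeping. */
struct toy_iommu {
    unsigned long flushes;   /* IOTLB flushes issued so far */
    bool flush_pending;      /* a deferred flush is still owed */
};

#define TOY_IOMMU_DEFER_FLUSH 0x1u   /* hypothetical "delay the flush" flag */

/* The explicit flush call from the plan. */
static void toy_iommu_flush(struct toy_iommu *iommu)
{
    iommu->flushes++;
    iommu->flush_pending = false;
}

/* Map one page; flush immediately unless the caller asks to defer. */
static void toy_iommu_map_page(struct toy_iommu *iommu, uint64_t gfn,
                               unsigned int flags)
{
    (void)gfn;   /* the page-table update itself is omitted in this model */

    if (flags & TOY_IOMMU_DEFER_FLUSH)
        iommu->flush_pending = true;
    else
        toy_iommu_flush(iommu);
}

/* Relocate nr pages to dst_gfn, paying for a single flush at the end. */
static void toy_relocate_range(struct toy_iommu *iommu, uint64_t dst_gfn,
                               size_t nr)
{
    for (size_t i = 0; i < nr; i++)
        toy_iommu_map_page(iommu, dst_gfn + i, TOY_IOMMU_DEFER_FLUSH);

    if (iommu->flush_pending)
        toy_iommu_flush(iommu);
}
```

The per-page path keeps its current behaviour when the flag is not passed,
which is why this variant touches little code.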

Plan B:
  - Add a new XENMEM add_to_physmap_range that would relocate a gfn range to
    a new gfn.
  - Add a new set_p2m_entry function that understands batches of gfns and
    mfns.
  - Implement batch operations for shadow and HAP.
  - Add a new IOMMU API to support batch operations.
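
On the hvmloader side, a range call would let the whole loop collapse into
one request. The sketch below only computes the range that would be passed
down; struct physmap_range and overlap_range are hypothetical names, not an
existing Xen ABI.

```c
#include <stdint.h>

#define PAGE_SHIFT 12   /* 4K pages */

/* Hypothetical argument for a range-based add_to_physmap. */
struct physmap_range {
    uint64_t src_gfn;   /* first gfn of the RAM overlapping the PCI hole */
    uint64_t dst_gfn;   /* first gfn of the destination above 4G */
    uint64_t nr;        /* number of pages to move; 0 if no overlap */
};

/* Work out, in one step, what the per-page loop relocates. */
static struct physmap_range overlap_range(uint64_t pci_mem_start,
                                          uint64_t low_mem_pgend,
                                          uint64_t high_mem_pgend)
{
    struct physmap_range r = { 0, 0, 0 };
    uint64_t hole_start = pci_mem_start >> PAGE_SHIFT;

    /* Same default as the loop: high memory starts at 4G if unset. */
    if (high_mem_pgend == 0)
        high_mem_pgend = 1ull << (32 - PAGE_SHIFT);

    if (hole_start < low_mem_pgend) {
        r.src_gfn = hole_start;
        r.dst_gfn = high_mem_pgend;
        r.nr = low_mem_pgend - hole_start;
    }
    return r;
}
```

One difference to keep in mind: this describes the move in ascending order,
whereas the current loop pairs the last low page with the first high page. I
am assuming the ordering does not matter at this stage of the boot.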

(A) isn't very nice but has the benefit of not modifying too much code. (B)
would be the right thing to do but would be quite disruptive in terms of code
and API changes.

Let me know what you think,
Jean
