
Re: [Xen-devel] Linux grant map/unmap improvement proposal (Draft B)



On 13/10/14 14:41, David Vrabel wrote:
> 
> Design
> ======

Jennifer has put together most of the initial implementation of this, so
expect a full series some time next year.

It didn't quite end up as described here.

> Userspace address to page translation
> -------------------------------------
> 
> The m2p_override table shall be removed.
> 
> Each VMA (struct vm_area_struct) shall contain an additional pointer to an
> optional array of pages.  This array shall be sized to cover the full
> extent of the VMA.
> 
> The gntdev driver populates this array with the relevant pages for the
> foreign mappings as they are mapped.  It shall also clear them when
> unmapping.  The gntdev driver must ensure it properly splits the page
> array when the VMA itself is split.
> 
> Since the m2p lookup will not return a local PFN, the native
> get_user_pages_fast() call will fail.  Prior to attempting to fault in
> the pages, get_user_pages() can simply look up the pages in the VMA's
> page array.

This was not true.  Instead, we mark the userspace PTEs as special
(_PAGE_SPECIAL set) which causes the generic x86 code to skip the fast path.

We also changed vm_normal_page() to look in vma->pages which puts the
extra code outside of any common use case (i.e., away from any handling
of non-special mappings), further reducing the impact on existing use cases.

For the curious, the 3-line change to include/linux/mm_types.h and
mm/memory.c is below.  It does not handle VMA splitting yet, but that
should be straightforward.
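
To illustrate the lookup in vma->pages and what a split would have to do
to the array, here is a minimal userspace sketch.  It is not kernel code:
the sketch_* names, the fixed 4 KiB page size and the idea of sharing one
array between both halves by pointer offset are all assumptions made for
illustration, and ownership/freeing of the array is ignored.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-ins for struct page and struct vm_area_struct. */
struct sketch_page {
	unsigned long pfn;
};

struct sketch_vma {
	unsigned long vm_start;		/* inclusive, page aligned */
	unsigned long vm_end;		/* exclusive, page aligned */
	struct sketch_page **pages;	/* optional; one entry per page */
};

/* Same index arithmetic as the vm_normal_page() change. */
static struct sketch_page *sketch_lookup(const struct sketch_vma *vma,
					 unsigned long addr)
{
	assert(addr >= vma->vm_start && addr < vma->vm_end);
	if (!vma->pages)
		return NULL;
	return vma->pages[(addr - vma->vm_start) >> SKETCH_PAGE_SHIFT];
}

/*
 * Split at addr: 'lower' keeps [vm_start, addr) and 'upper' takes
 * [addr, vm_end).  The upper half's array pointer is advanced by the
 * number of pages that stay in the lower half.
 */
static void sketch_split(struct sketch_vma *lower, struct sketch_vma *upper,
			 unsigned long addr)
{
	assert(addr > lower->vm_start && addr < lower->vm_end);
	assert((addr & ((1UL << SKETCH_PAGE_SHIFT) - 1)) == 0);

	upper->vm_start = addr;
	upper->vm_end = lower->vm_end;
	upper->pages = lower->pages
		? lower->pages + ((addr - lower->vm_start) >> SKETCH_PAGE_SHIFT)
		: NULL;
	lower->vm_end = addr;
}
```

After a split, a lookup in either half returns the same page as a lookup
at the same address in the original VMA would have.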

> Userspace grant performance
> ---------------------------
> 
> - Lazily map grants into userspace on faults.  For applications that
>   do not access the foreign frames by the userspace mappings (such as
>   block backends using direct I/O) this would avoid a set of maps and
>   unmaps. This lazy mode would have to be requested by the userspace
>   program (since faulting many pages would be much more expensive than
>   a single batched map).

This does not look possible without more invasive changes to core MM
code.  Although we can lazily fault in the mappings we still need PTEs
to allow get_user_pages() to work, so map-on-fault isn't useful.

David

--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -289,6 +289,7 @@ struct vm_area_struct {
 #ifdef CONFIG_NUMA
        struct mempolicy *vm_policy;    /* NUMA policy for the VMA */
 #endif
+       struct page     **pages;
 };

 struct core_thread {
diff --git a/mm/memory.c b/mm/memory.c
index 4b60011..3ca13bb 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -774,6 +774,8 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
        if (HAVE_PTE_SPECIAL) {
                if (likely(!pte_special(pte)))
                        goto check_pfn;
+               if (vma->pages)
+                       return vma->pages[(addr - vma->vm_start) >> PAGE_SHIFT];
                if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
                        return NULL;
                if (!is_zero_pfn(pfn))
-- 
1.7.10.4

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

