
Re: [Xen-devel] [RFC PATCH] xen: free_domheap_pages: delay page scrub to tasklet



On May 18, 2014 10:57:56 PM EDT, Bob Liu <lliubbo@xxxxxxxxx> wrote:
>Because of page scrubbing, destroying a domain with a large amount of
>memory is very slow: it took around 10 minutes to destroy a guest with
>nearly 1 TB of memory.
>
>[root@ca-test111 ~]# time xm des 5
>real    10m51.582s
>user    0m0.115s
>sys     0m0.039s
>[root@ca-test111 ~]#
>
>Using perf we can see what is happening; thanks to Boris for his help
>and for providing this useful tool for Xen.
>[root@x4-4 bob]# perf report
>    22.32%       xl  [xen.syms]            [k] page_get_owner_and_reference
>    20.82%       xl  [xen.syms]            [k] relinquish_memory
>    20.63%       xl  [xen.syms]            [k] put_page
>    17.10%       xl  [xen.syms]            [k] scrub_one_page
>     4.74%       xl  [xen.syms]            [k] unmap_domain_page
>     2.24%       xl  [xen.syms]            [k] get_page
>     1.49%       xl  [xen.syms]            [k] free_heap_pages
>     1.06%       xl  [xen.syms]            [k] _spin_lock
>     0.78%       xl  [xen.syms]            [k] __put_page_type
>     0.75%       xl  [xen.syms]            [k] map_domain_page
>     0.57%       xl  [xen.syms]            [k] free_page_type
>     0.52%       xl  [xen.syms]            [k] is_iomem_page
>     0.42%       xl  [xen.syms]            [k] free_domheap_pages
>     0.31%       xl  [xen.syms]            [k] put_page_from_l1e
>     0.27%       xl  [xen.syms]            [k] check_lock
>     0.27%       xl  [xen.syms]            [k] __mfn_valid
>
>This patch tries to delay scrub_one_page() to a tasklet which will be
>scheduled on all online physical CPUs, so that 'xl/xm destroy xxx'
>returns much faster.

Thank you for digging into this. However, tasklets do not run in parallel;
a given tasklet is only executed on one CPU at a time.

>
>Tested on a guest with 30G memory.
>Before this patch:
>[root@x4-4 bob]# time xl des PV-30G
>
>real 0m16.014s
>user 0m0.010s
>sys  0m13.976s
>[root@x4-4 bob]#
>
>After:
>[root@x4-4 bob]# time xl des PV-30G
>
>real 0m3.581s
>user 0m0.003s
>sys  0m1.554s
>[root@x4-4 bob]#
>
>The destroy time reduced from 16s to 3s.

Right. By moving the scrubbing from this function to a tasklet.
>
>Signed-off-by: Bob Liu <bob.liu@xxxxxxxxxx>
>---
> xen/common/page_alloc.c |   39 ++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 38 insertions(+), 1 deletion(-)
>
>diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
>index 601319c..2ca59a1 100644
>--- a/xen/common/page_alloc.c
>+++ b/xen/common/page_alloc.c
>@@ -79,6 +79,10 @@ PAGE_LIST_HEAD(page_offlined_list);
> /* Broken page list, protected by heap_lock. */
> PAGE_LIST_HEAD(page_broken_list);
> 
>+PAGE_LIST_HEAD(page_scrub_list);
>+static DEFINE_SPINLOCK(scrub_list_spinlock);
>+static struct tasklet scrub_page_tasklet;
>+
> /*************************
>  * BOOT-TIME ALLOCATOR
>  */
>@@ -1417,6 +1421,25 @@ void free_xenheap_pages(void *v, unsigned int
>order)
> #endif
> 
> 
>+static void scrub_free_pages(unsigned long unuse)
>+{
>+    struct page_info *pg;
>+
>+    for ( ; ; )
>+    {
>+        while ( page_list_empty(&page_scrub_list) )
>+            cpu_relax();
>+
>+        spin_lock(&scrub_list_spinlock);
>+        pg = page_list_remove_head(&page_scrub_list);
>+        spin_unlock(&scrub_list_spinlock);
>+        if (pg)
>+        {
>+            scrub_one_page(pg);
>+            free_heap_pages(pg, 0);
>+        }
>+    }

I fear that means you have added a work item that can run for a very long
time and cause security issues (a DoS to guests). The VMEXIT code, for
example, checks whether a softirq is pending and will run any tasklets,
which means this scrubbing could now run in another guest's context and
delay that guest significantly.


A couple of ideas:
 - have per-CPU tasklets, one for each online CPU; each does a batch of
   work and, if anything is left, reschedules itself.
 - if a worker detects that it is not running within the idle domain
   context, it schedules itself for later.
 - perhaps also look at having a per-CPU scrubbing list, fed from a
   per-node list?

Thanks!
>+}
> 
> /*************************
>  * DOMAIN-HEAP SUB-ALLOCATOR
>@@ -1425,6 +1448,7 @@ void free_xenheap_pages(void *v, unsigned int
>order)
> void init_domheap_pages(paddr_t ps, paddr_t pe)
> {
>     unsigned long smfn, emfn;
>+    unsigned int cpu;
> 
>     ASSERT(!in_irq());
> 
>@@ -1435,6 +1459,9 @@ void init_domheap_pages(paddr_t ps, paddr_t pe)
>         return;
> 
>     init_heap_pages(mfn_to_page(smfn), emfn - smfn);
>+    tasklet_init(&scrub_page_tasklet, scrub_free_pages, 0);
>+    for_each_online_cpu(cpu)
>+        tasklet_schedule_on_cpu(&scrub_page_tasklet, cpu);
> }
> 
> 
>@@ -1564,8 +1591,17 @@ void free_domheap_pages(struct page_info *pg,
>unsigned int order)
>          * domain has died we assume responsibility for erasure.
>          */
>         if ( unlikely(d->is_dying) )
>+        {
>+            /*
>+             * Add page to page_scrub_list to speed up domain destroy;
>+             * those pages will be zeroed later by scrub_page_tasklet.
>+             */
>+            spin_lock(&scrub_list_spinlock);
>             for ( i = 0; i < (1 << order); i++ )
>-                scrub_one_page(&pg[i]);
>+                page_list_add_tail(&pg[i], &page_scrub_list);
>+            spin_unlock(&scrub_list_spinlock);
>+            goto out;
>+        }
> 
>         free_heap_pages(pg, order);
>     }
>@@ -1583,6 +1619,7 @@ void free_domheap_pages(struct page_info *pg,
>unsigned int order)
>         drop_dom_ref = 0;
>     }
> 
>+out:
>     if ( drop_dom_ref )
>         put_domain(d);
> }



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

