
Re: [Xen-devel] [RFC PATCH] xen: free_domheap_pages: delay page scrub to tasklet



On May 18, 2014 10:57:56 PM EDT, Bob Liu <lliubbo@xxxxxxxxx> wrote:
>Because of page scrubbing, destroying a domain with a large amount of
>memory is very slow: it took around 10 minutes to destroy a guest with
>nearly 1 TB of memory.
>
>[root@ca-test111 ~]# time xm des 5
>real    10m51.582s
>user    0m0.115s
>sys     0m0.039s
>[root@ca-test111 ~]#
>
>Using perf we can see what is happening; thanks to Boris for his help
>and for providing this useful tool for Xen.
>[root@x4-4 bob]# perf report
>    22.32%       xl  [xen.syms]            [k] page_get_owner_and_reference
>    20.82%       xl  [xen.syms]            [k] relinquish_memory
>    20.63%       xl  [xen.syms]            [k] put_page
>    17.10%       xl  [xen.syms]            [k] scrub_one_page
>     4.74%       xl  [xen.syms]            [k] unmap_domain_page
>     2.24%       xl  [xen.syms]            [k] get_page
>     1.49%       xl  [xen.syms]            [k] free_heap_pages
>     1.06%       xl  [xen.syms]            [k] _spin_lock
>     0.78%       xl  [xen.syms]            [k] __put_page_type
>     0.75%       xl  [xen.syms]            [k] map_domain_page
>     0.57%       xl  [xen.syms]            [k] free_page_type
>     0.52%       xl  [xen.syms]            [k] is_iomem_page
>     0.42%       xl  [xen.syms]            [k] free_domheap_pages
>     0.31%       xl  [xen.syms]            [k] put_page_from_l1e
>     0.27%       xl  [xen.syms]            [k] check_lock
>     0.27%       xl  [xen.syms]            [k] __mfn_valid
>
>This patch tries to delay scrub_one_page() to a tasklet which will be
>scheduled on all online physical CPUs, so that 'xl/xm destroy xxx'
>returns much faster.

Thank you for digging into this. However, tasklets do not run in parallel;
a given tasklet is only executed on one CPU at a time.

>
>Tested on a guest with 30G memory.
>Before this patch:
>[root@x4-4 bob]# time xl des PV-30G
>
>real 0m16.014s
>user 0m0.010s
>sys  0m13.976s
>[root@x4-4 bob]#
>
>After:
>[root@x4-4 bob]# time xl des PV-30G
>
>real 0m3.581s
>user 0m0.003s
>sys  0m1.554s
>[root@x4-4 bob]#
>
>The destroy time reduced from 16s to 3s.

Right. By moving the scrubbing from this function to a tasklet.
>
>Signed-off-by: Bob Liu <bob.liu@xxxxxxxxxx>
>---
> xen/common/page_alloc.c |   39 ++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 38 insertions(+), 1 deletion(-)
>
>diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
>index 601319c..2ca59a1 100644
>--- a/xen/common/page_alloc.c
>+++ b/xen/common/page_alloc.c
>@@ -79,6 +79,10 @@ PAGE_LIST_HEAD(page_offlined_list);
> /* Broken page list, protected by heap_lock. */
> PAGE_LIST_HEAD(page_broken_list);
> 
>+PAGE_LIST_HEAD(page_scrub_list);
>+static DEFINE_SPINLOCK(scrub_list_spinlock);
>+static struct tasklet scrub_page_tasklet;
>+
> /*************************
>  * BOOT-TIME ALLOCATOR
>  */
>@@ -1417,6 +1421,25 @@ void free_xenheap_pages(void *v, unsigned int
>order)
> #endif
> 
> 
>+static void scrub_free_pages(unsigned long unuse)
>+{
>+    struct page_info *pg;
>+
>+    for ( ; ; )
>+    {
>+        while ( page_list_empty(&page_scrub_list) )
>+            cpu_relax();
>+
>+        spin_lock(&scrub_list_spinlock);
>+        pg = page_list_remove_head(&page_scrub_list);
>+        spin_unlock(&scrub_list_spinlock);
>+        if (pg)
>+        {
>+            scrub_one_page(pg);
>+            free_heap_pages(pg, 0);
>+        }
>+    }

I fear that means you have added a work item that can run for a very long
time and cause security issues (a DoS to guests). The VMEXIT code, for
example, checks whether a softirq is pending and will run any tasklets,
which means this scrubbing could now run in another guest's context and
delay that guest significantly.


A couple of ideas:
 - have per-CPU tasklets, one for each online CPU; each does a batch of
   work and, if anything is left, reschedules itself.
 - if a worker detects that it is not running within the idle domain
   context, it schedules itself for later.
 - perhaps also look at having a per-CPU scrubbing list, fed from a
   per-node list?

Thanks!
>+}
> 
> /*************************
>  * DOMAIN-HEAP SUB-ALLOCATOR
>@@ -1425,6 +1448,7 @@ void free_xenheap_pages(void *v, unsigned int
>order)
> void init_domheap_pages(paddr_t ps, paddr_t pe)
> {
>     unsigned long smfn, emfn;
>+    unsigned int cpu;
> 
>     ASSERT(!in_irq());
> 
>@@ -1435,6 +1459,9 @@ void init_domheap_pages(paddr_t ps, paddr_t pe)
>         return;
> 
>     init_heap_pages(mfn_to_page(smfn), emfn - smfn);
>+    tasklet_init(&scrub_page_tasklet, scrub_free_pages, 0);
>+    for_each_online_cpu(cpu)
>+        tasklet_schedule_on_cpu(&scrub_page_tasklet, cpu);
> }
> 
> 
>@@ -1564,8 +1591,17 @@ void free_domheap_pages(struct page_info *pg,
>unsigned int order)
>          * domain has died we assume responsibility for erasure.
>          */
>         if ( unlikely(d->is_dying) )
>+        {
>+            /*
>+             * Add page to page_scrub_list to speed up domain destroy;
>+             * those pages will be zeroed later by scrub_page_tasklet.
>+             */
>+            spin_lock(&scrub_list_spinlock);
>             for ( i = 0; i < (1 << order); i++ )
>-                scrub_one_page(&pg[i]);
>+                page_list_add_tail(&pg[i], &page_scrub_list);
>+            spin_unlock(&scrub_list_spinlock);
>+            goto out;
>+        }
> 
>         free_heap_pages(pg, order);
>     }
>@@ -1583,6 +1619,7 @@ void free_domheap_pages(struct page_info *pg,
>unsigned int order)
>         drop_dom_ref = 0;
>     }
> 
>+out:
>     if ( drop_dom_ref )
>         put_domain(d);
> }



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

