
Re: [Xen-devel] alloc_heap_pages is low efficient with more CPUs



What if you replace tlbflush_filter() call with cpus_clear(&extra_cpus_mask)?
: You mean just clear the mask? That seems a little drastic. Perhaps you would rather do it somewhere else.
 
I assume you see lots of looping in one of those two functions, but only single-page-at-a-time calls into alloc_domheap_pages()->alloc_heap_pages()?
: In populate_physmap(), every extent is 2M in size:
static void populate_physmap(struct memop_args *a)
{
    for ( i = a->nr_done; i < a->nr_extents; i++ )
    {
        /* Calls down into alloc_heap_pages(); a->extent_order is always 9, i.e. 2M. */
        page = alloc_domheap_pages(d, a->extent_order, a->memflags);
    }
    /* Do you mean moving that block and the TLB flush here, out of the for loop? */
}

tupeng212
 
Date: 2012-10-13 14:30
Subject: Re: [Xen-devel] alloc_heap_pages is low efficient with more CPUs
What if you replace tlbflush_filter() call with cpus_clear(&extra_cpus_mask)? Seems a bit silly to do, but I'd like to know how much a few cpumask operations per page is costing (most are of course much quicker than tlbflush_filter as they operate on 64 bits per iteration, rather than one bit per iteration).
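
The cost difference described above can be illustrated with a toy model. This is only a sketch, not Xen's code: toy_cpumask_t, toy_mask_or(), toy_filter() and the 256-CPU size are hypothetical names and values chosen for illustration, and the timestamp comparison ignores clock wraparound. Both functions return the number of loop iterations they performed.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_CPUS 256                    /* hypothetical CPU count */
#define TOY_WORDS   (TOY_NR_CPUS / 64)     /* 64 CPUs per mask word */

typedef struct { uint64_t bits[TOY_WORDS]; } toy_cpumask_t;

/* Word-wise OR: each loop iteration handles 64 CPUs at once. */
static unsigned toy_mask_or(toy_cpumask_t *dst, const toy_cpumask_t *src)
{
    unsigned iters = 0;

    assert(dst != 0 && src != 0);
    for (unsigned w = 0; w < TOY_WORDS; w++, iters++)
        dst->bits[w] |= src->bits[w];
    return iters;
}

/*
 * Per-bit filter: each loop iteration looks at a single CPU and drops it
 * from the mask if that CPU has flushed its TLB since page_stamp.
 */
static unsigned toy_filter(toy_cpumask_t *mask, const uint32_t *last_flush,
                           uint32_t page_stamp)
{
    unsigned iters = 0;

    assert(mask != 0 && last_flush != 0);
    for (unsigned cpu = 0; cpu < TOY_NR_CPUS; cpu++, iters++) {
        uint64_t bit = 1ULL << (cpu % 64);

        if ((mask->bits[cpu / 64] & bit) && last_flush[cpu] > page_stamp)
            mask->bits[cpu / 64] &= ~bit;
    }
    return iters;
}
```

With 256 CPUs in this model, the OR takes 4 iterations while the filter takes 256, and both run once per allocated page.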

If that is suitably fast, I think we can have a go at fixing this by pulling the TLB-flush logic out of alloc_heap_pages() and into the loops in increase_reservation() and populate_physmap() in common/memory.c. I assume you see lots of looping in one of those two functions, but only single-page-at-a-time calls into alloc_domheap_pages()->alloc_heap_pages()?
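
A rough sketch of what hoisting the flush out of the per-allocation path could look like, again as a toy model rather than actual Xen code. The names toy_mask_t, toy_flush_tlb_mask(), toy_populate_per_extent() and toy_populate_batched() are hypothetical, the stale[] array stands in for the set of CPUs that might still hold stale TLB entries for each allocated extent, and the sketch assumes it is safe to delay the flush until the loop has finished.

```c
#include <stdint.h>

typedef uint64_t toy_mask_t;        /* one bit per CPU, up to 64 CPUs */

static unsigned toy_flush_count;    /* number of flushes actually issued */

/* Stand-in for a cross-CPU TLB flush; an empty mask needs no flush. */
static void toy_flush_tlb_mask(toy_mask_t mask)
{
    if (mask)
        toy_flush_count++;
}

/* Current shape: every allocation flushes on its own. */
static void toy_populate_per_extent(const toy_mask_t *stale, unsigned nr)
{
    for (unsigned i = 0; i < nr; i++)
        toy_flush_tlb_mask(stale[i]);
}

/* Proposed shape: accumulate the masks in the loop, flush once after it. */
static void toy_populate_batched(const toy_mask_t *stale, unsigned nr)
{
    toy_mask_t pending = 0;

    for (unsigned i = 0; i < nr; i++)
        pending |= stale[i];
    toy_flush_tlb_mask(pending);
}
```

In this model the batched variant issues at most one flush per call, however many extents the loop allocates.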

  -- Keir

On 13/10/2012 07:21, "tupeng212" <tupeng212@xxxxxxxxx> wrote:

If the tlbflush_filter() and cpumask_or() lines are commented out from the
if(need_tlbflush) block in alloc_heap_pages(), what are the domain creation
times like then?
: You mean removing this code from alloc_heap_pages() and then trying it?
I didn't do exactly what you said, but I measured the total time spent in the if(need_tlbflush) block
using time1=NOW() ... block ... time2=NOW(), time=time2-time1. The unit is ns, and 1 s = 10^9 ns.
The block accounts for most of the total time: the whole start takes about 30s, of which this block may be 29s.
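
The measurement described above could be accumulated along these lines. This is a minimal sketch with hypothetical names (struct toy_timer, toy_timer_add(), toy_ns_to_s(), toy_share()); in the hypervisor the two timestamps would come from NOW(), whereas here they are passed in as plain nanosecond values.

```c
#include <stdint.h>

/* Running total of time spent in one instrumented block. */
struct toy_timer {
    int64_t  total_ns;   /* accumulated time2 - time1, in nanoseconds */
    unsigned calls;      /* how many times the block ran */
};

/* Record one pass through the block, given its start and end timestamps. */
static void toy_timer_add(struct toy_timer *t, int64_t time1, int64_t time2)
{
    t->total_ns += time2 - time1;
    t->calls++;
}

/* Convert nanoseconds to seconds: 1 s = 10^9 ns. */
static double toy_ns_to_s(int64_t ns)
{
    return (double)ns / 1e9;
}

/* Fraction of the whole run that was spent inside the block. */
static double toy_share(const struct toy_timer *t, int64_t whole_ns)
{
    return (double)t->total_ns / (double)whole_ns;
}
```

With the figures quoted above, 29s inside the block out of a 30s start gives a share of roughly 0.97.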
 
By the way it looks like you are not using xen-unstable or
xen-4.2, can you try with one of these later versions of Xen?
: Fortunately, other groups have to use xen-4.2, so we repeated this experiment on
that source code too. It changed nothing: the second start still takes a very long time.
 

tupeng
 
 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

