Hi Alex.
Thank you very much for the measurements. They are interesting.
First of all, I found a big bug in the deferred page freeing patch yesterday:
it flushes unnecessarily. That patch is still under development,
so the bug is probably what causes the degradation.
When I checked a kernel compile with the per vcpu vhpt patch and
the tlb tracking patch (without the deferred page freeing patch),
I saw an improvement.
I should have explained the patches.
- per vcpu vhpt
What is this patch for?
It focuses on vcpu migration between physical cpus.
With the credit scheduler, vcpus are migrated heavily.
This patch tries to reduce vTLB flushes when a vcpu is migrated.
Expected effect
When vcpu migration occurs frequently, performance should increase.
- tlb tracking
What is this patch for?
It focuses on grant table mapping.
When such a page is unmapped, a full vTLB flush is necessary.
By tracking tlb inserts on grant mapped pages, the full vTLB flush
can be avoided.
In particular, vbd does only DMA, so dom0 doesn't insert a tlb entry
for the grant mapped page. In that case no vTLB flush is needed at all.
Expected effect
vbd performance increase.
vnif packet sending performance increase.
- deferred page freeing
What is this patch for?
When a page whose tlb inserts aren't tracked is unmapped/zapped from
a domain, a full vTLB flush is again necessary.
The balloon driver and grant table page transfer are such cases.
This patch focuses on them.
It tries to batch the freeing/zapping of pages from a domain in order
to reduce the number of full vTLB flushes.
Expected effect
vnif packet receiving performance increase
balloon driver performance increase
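As a rough illustration of the tlb tracking and deferred page freeing ideas above, here is a toy Python model that only counts full vTLB flushes. It is a sketch, not the hypervisor code: the class, its methods and the batch size are made up for illustration, and a tracked page that did get a tlb insert is simply charged one full flush.

```python
class ToyDomain:
    """Toy model (hypothetical): counts full vTLB flushes, nothing more."""

    def __init__(self, tracking=False, batch_size=1):
        self.tracking = tracking      # models the tlb tracking patch
        self.batch_size = batch_size  # > 1 models deferred page freeing
        self.inserted = set()         # tracked pages that got a tlb insert
        self.pending = []             # pages queued for a batched free
        self.full_flushes = 0

    def tlb_insert(self, page):
        # Remember the insert only when tracking is enabled.
        if self.tracking:
            self.inserted.add(page)

    def unmap_grant_page(self, page):
        # With tracking, a page that was never inserted needs no flush
        # (the vbd case: DMA only). Otherwise charge one full flush.
        if self.tracking and page not in self.inserted:
            return
        self.inserted.discard(page)
        self.full_flushes += 1

    def free_untracked_page(self, page):
        # Balloon / page transfer case: queue the page, flush per batch.
        self.pending.append(page)
        if len(self.pending) >= self.batch_size:
            self.flush_pending()

    def flush_pending(self):
        # One full flush covers every page queued so far.
        if self.pending:
            self.full_flushes += 1
            self.pending.clear()


# vbd-like workload: 100 grant mapped pages, DMA only, no tlb inserts.
plain = ToyDomain(tracking=False)
tracked = ToyDomain(tracking=True)
for dom in (plain, tracked):
    for page in range(100):
        dom.unmap_grant_page(page)

# balloon-like workload: 100 untracked pages freed, batches of 32.
batched = ToyDomain(batch_size=32)
for page in range(100):
    batched.free_untracked_page(page)
batched.flush_pending()

print(plain.full_flushes, tracked.full_flushes, batched.full_flushes)
```

In this toy model the untracked domain flushes once per unmap, the tracked one never flushes for the DMA-only pages, and batching replaces 100 flushes with one per batch.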
On Mon, Aug 07, 2006 at 01:54:38PM -0600, Alex Williamson wrote:
> On Fri, 2006-08-04 at 21:27 +0900, Isaku Yamahata wrote:
> > Hi all
> > These patches are for performance tuning.
> > They are for comment, review and evaluation.
> >
> > - per vcpu vhpt
> > - tlb tracking
> > - deferred page freeing
> > NEW: This patch is incomplete yet. It must be polished more.
>
> Here are my performance numbers:
>
> System: 2 Cell HP Superdome, 8-way 1.5GHz/6M, 12GB RAM
>
> The test: UP dom0 (2GB, single user mode), 7-way domU (3GB, single user
> mode, no network), kernel build time w/ make -j8 (4 runs, 1st run thrown
> out, average of other 3 runs)
>
> Stock (cset 10931):
>
> real: 282.643s
> user: 1733.523s
> sys: 132.650s
>
> Patches applied, TLB tracking NOT enabled (fixed domain.c build):
>
> real: 282.209s (0.998)
> user: 1734.533s (1.001)
> sys: 130.253s (0.982)
>
> Patches applied, TLB tracking enabled:
>
> real: 288.591s (1.021)
> user: 1770.453s (1.021)
> sys: 143.297s (1.080)
>
> So it looks like w/o TLB tracking enabled, the patch is probably within
> the noise of my test. With TLB tracking enabled, there is a small, but
> noticeable performance degradation. Under what conditions might we see
> a performance improvement? Thanks,
The per vcpu vhpt patch focuses on the cost of vcpu migration.
Given your setup, where # of vcpus = # of physical cpus,
there was probably no vcpu migration.
This can be checked by running "xm vcpu-list" periodically.
If no vcpu migration occurred, your result means that
the per vcpu vhpt patch doesn't introduce overhead, so it's a good result.
To see the effect of per vcpu vhpt, vcpu migration is necessary.
So make # of vcpus > # of pcpus by creating more domUs and
compiling on the domUs simultaneously with the credit scheduler.
In that case I expect to see a difference.
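To check for migration without watching the output by hand, successive "xm vcpu-list" snapshots can be compared. A minimal Python sketch, assuming the output starts with a header line containing Name, VCPU and CPU columns and that domain names contain no spaces (the column layout and the function names are my assumptions, not taken from the xm source):

```python
def parse_vcpu_list(text):
    """Return {(domain, vcpu): cpu} for one snapshot of `xm vcpu-list` output.

    Assumes a whitespace-separated header with "Name", "VCPU" and "CPU".
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    name_col = header.index("Name")
    vcpu_col = header.index("VCPU")
    cpu_col = header.index("CPU")  # first "CPU" field, not "CPU Affinity"
    placement = {}
    for ln in lines[1:]:
        fields = ln.split()
        placement[(fields[name_col], fields[vcpu_col])] = fields[cpu_col]
    return placement


def count_migrations(snapshots):
    """Count vcpus seen on a different physical cpu than in the previous snapshot."""
    migrations = 0
    prev = {}
    for text in snapshots:
        cur = parse_vcpu_list(text)
        for key, cpu in cur.items():
            old = prev.get(key)
            # "-" is treated as "not currently placed" and ignored.
            if old is not None and "-" not in (old, cpu) and old != cpu:
                migrations += 1
        prev = cur
    return migrations
```

The snapshots could be collected by running "xm vcpu-list" every few seconds during the build. The count is only a lower bound, since a vcpu can move away and back between two samples.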
For TLB tracking, I'm somewhat shocked.
I expected a much larger performance increase.
I hope it was the deferred page freeing patch that spoiled it;
the benchmark will show.
I tested the deferred page freeing patch very roughly with wget.
Although network performance is still horrible, it showed an improvement.
However, your result suggests that it causes overhead.
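The ratios in parentheses in the quoted numbers can be rechecked from the raw times. A small Python sketch using only the figures quoted above (the variable and function names are just for illustration):

```python
# Times in seconds, copied from the quoted measurements.
stock = {"real": 282.643, "user": 1733.523, "sys": 132.650}
no_tracking = {"real": 282.209, "user": 1734.533, "sys": 130.253}
tracking = {"real": 288.591, "user": 1770.453, "sys": 143.297}


def ratios(run, base=stock):
    """Ratio of each time to the stock run, rounded to three decimals."""
    return {key: round(run[key] / base[key], 3) for key in base}


print(ratios(no_tracking))  # {'real': 0.998, 'user': 1.001, 'sys': 0.982}
print(ratios(tracking))     # {'real': 1.021, 'user': 1.021, 'sys': 1.08}
```

So with TLB tracking enabled the build takes about 2% more real time and about 8% more sys time than stock.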
Thanks.
--
yamahata