If you really want to explore mem/page copy for XenPPC then you have
to understand that we run with the MMU off (real mode), where every
access is treated as guarded (G=1, I=0). Profiling code with the MMU
on, _including_ RMA, is therefore not helpful. For more information
see 970FX UM Sections:
6.3.8.4 Loads in Real Mode
6.3.9.4 Stores in Real Mode
You will probably find that grouping (as Hollis suggests) by cache
line will be much better, but you should also prefetch the next line
somehow.
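A rough sketch of what "group by cache line and prefetch the next
line" could look like, in portable C rather than PowerPC asm. The
helper name copy_page_by_line() and the two size constants are
assumptions made for illustration, not existing Xen code;
__builtin_prefetch is the GCC/Clang builtin. On real hardware the
inner loop would be the run of stdu stores, with a dcbz of the
destination line in front of it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES  4096u  /* assumption: 4 KiB pages */
#define CACHE_LINE_BYTES 128u   /* assumption: 128-byte lines, as on the 970FX */
#define DWORDS_PER_LINE  (CACHE_LINE_BYTES / sizeof(uint64_t))
#define LINES_PER_PAGE   (PAGE_SIZE_BYTES / CACHE_LINE_BYTES)

/* Hypothetical helper (not the Xen copy_page): copies one page a cache
 * line at a time. Both pointers must be cache-line aligned. */
static void copy_page_by_line(void *dp, const void *sp)
{
    uint64_t *d = dp;
    const uint64_t *s = sp;
    size_t line, i;

    assert((uintptr_t)dp % CACHE_LINE_BYTES == 0);
    assert((uintptr_t)sp % CACHE_LINE_BYTES == 0);

    for (line = 0; line < LINES_PER_PAGE; line++) {
        /* Hint the next source line while the current one is copied. */
        if (line + 1 < LINES_PER_PAGE)
            __builtin_prefetch(s + DWORDS_PER_LINE, 0, 0);

        /* A PowerPC version would dcbz the destination line here, then
         * issue one doubleword store per slot in the line. */
        for (i = 0; i < DWORDS_PER_LINE; i++)
            d[i] = s[i];

        d += DWORDS_PER_LINE;
        s += DWORDS_PER_LINE;
    }
}
```

Whether the prefetch hint helps at all in real mode is exactly the
kind of thing that has to be measured on the hardware.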
Please run your experiments _in_ Xen, and use timebase (ticks) or
NOW() (nanosecs) to model it.
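For the shape of such a measurement, here is a minimal sketch. It is a
userspace stand-in only: now_ns() is a hypothetical wrapper around the
POSIX clock_gettime() that plays the role of NOW(), and
time_page_copy() is an illustrative name, not a Xen function. Numbers
taken this way say nothing about real mode; inside Xen the same loop
would read NOW() or the timebase instead.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <string.h>
#include <time.h>

#define PAGE_SIZE_BYTES 4096u   /* assumption: 4 KiB pages */

/* Hypothetical stand-in for NOW(): nanoseconds from a monotonic clock. */
static int64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Runs `iters` page copies and returns the mean cost in ns per copy
 * (0 when iters is 0). Swap memcpy for the routine under test. */
static int64_t time_page_copy(void *dp, const void *sp, unsigned iters)
{
    int64_t start, end;
    unsigned i;

    if (iters == 0)
        return 0;

    start = now_ns();
    for (i = 0; i < iters; i++) {
        memcpy(dp, sp, PAGE_SIZE_BYTES);
        /* Compiler barrier so the copies are not optimized away. */
        __asm__ __volatile__("" : : "r" (dp) : "memory");
    }
    end = now_ns();

    return (end - start) / (int64_t)iters;
}
```

Averaging over many iterations keeps the clock-read overhead out of
the per-copy figure.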
On Dec 15, 2006, at 6:31 PM, Hollis Blanchard wrote:
On Fri, 2006-12-15 at 17:50 -0500, poff wrote:
3) Useful when PPC must do page copies in place of 'page flipping'.
So you're saying we should worry about it later?
For the future, copy_page using dcbz:
diff -r 7669fca80bfc xen/arch/powerpc/mm.c
--- a/xen/arch/powerpc/mm.c Mon Dec 04 11:46:53 2006 -0500
+++ b/xen/arch/powerpc/mm.c Fri Dec 15 17:52:58 2006 -0500
@@ -280,7 +280,8 @@ extern void copy_page(void *dp, void *sp
if (on_systemsim()) {
systemsim_memcpy(dp, sp, PAGE_SIZE);
} else {
- memcpy(dp, sp, PAGE_SIZE);
+ clear_page(dp);
+ __copy_page(dp, sp);
}
}
diff -r 7669fca80bfc xen/include/asm-powerpc/page.h
--- a/xen/include/asm-powerpc/page.h Mon Dec 04 11:46:53 2006 -0500
+++ b/xen/include/asm-powerpc/page.h Fri Dec 15 17:52:58 2006 -0500
@@ -90,6 +90,25 @@ 1: dcbz 0,%0\n\
extern void copy_page(void *dp, void *sp);
+static __inline__ void __copy_page(void *dp, void *sp)
+{
+ ulong dwords, dword_size;
+
+ dword_size = 8;
+ dwords = (PAGE_SIZE / dword_size) - 1;
+
+ __asm__ __volatile__(
+ "mtctr %2 # copy_page\n\
+ ld %2,0(%1)\n\
+ std %2,0(%0)\n\
+1: ldu %2,8(%1)\n\
+ stdu %2,8(%0)\n\
+ bdnz 1b"
+ : "+b" (dp), "+b" (sp), "+r" (dwords) /* all three are modified */
+ : /* no inputs */
+ : "%ctr", "memory");
+}
+
I'd rather have copy_page() dcbz; stdu; stdu; stdu; ... stdu; in each
loop iteration.
It would also be nice to improve memcpy, though that one is certainly
more difficult due to alignment, varying lengths, etc.
Our current memcpy() comes from memcpy.S, which is straight from
Linux. It's not the best, but probably good enough.
Perhaps we can borrow code from
http://penguinppc.org/dev/glibc/glibc-powerpc-cpu-addon.html
This tunes for usermode. I don't think its performance is relevant.
-JX