[XenPPC] Re: copy_4K_page() doesn't use dcbtst?

> A stronger argument would be for using dcbz, but IIRC it actually made
> things slower (on POWER4 at least).  I suspect the hardware is
> gathering the stores for the whole of each cache line automatically,
> so using dcbz doesn't provide any benefit.


On the 970, at least, it still seems to be a nice win.  Do you have any
good benchmarks I could run?

> I did a lot of measurements of memory copy speed on POWER4 (using
> different copy loops, copy sizes, alignments, cache hot/cold cases)
> and the copy_4K_page loop is the fastest I could come up with for
> POWER4.


Yeah, POWER4 is quite a different beast (its memory subsystem,
anyway).  I'm surprised dcbz hurt though; did you schedule it
early enough before the actual data copy?
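
To make "scheduling it early" concrete, here is a minimal sketch of a page
copy that issues dcbz on a destination line some distance ahead of the line
currently being copied.  This is not the copy_4K_page() loop from the tree:
copy_page_dcbz(), zero_line(), the 128-byte line size and the one-line lead
distance are all assumptions for illustration, and the right lead distance
can only be found by measuring on the target CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BYTES  4096u
#define LINE_BYTES  128u  /* assumption: 128-byte cache lines (970, POWER4) */
#define DCBZ_AHEAD  1u    /* assumption: lead distance in lines; tune by measurement */

/* Establish a zeroed destination line in the cache without reading it
 * from memory.  On non-PowerPC builds this is a no-op, so the copy below
 * stays correct but gains nothing. */
static inline void zero_line(void *p)
{
#if defined(__powerpc__) || defined(__powerpc64__) || defined(__PPC__)
    __asm__ volatile ("dcbz 0,%0" : : "r"(p) : "memory");
#else
    (void)p;
#endif
}

/* Hypothetical helper, not Xen's copy_4K_page().
 * dst must be cache-line aligned, cacheable, and must not overlap src,
 * since dcbz destroys whatever the destination line held. */
void copy_page_dcbz(void *dst, const void *src)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t off;

    assert(((uintptr_t)dst % LINE_BYTES) == 0);

    /* Prime the pipeline: zero the first DCBZ_AHEAD destination lines. */
    for (off = 0; off < DCBZ_AHEAD * LINE_BYTES; off += LINE_BYTES)
        zero_line(d + off);

    for (off = 0; off < PAGE_BYTES; off += LINE_BYTES) {
        size_t ahead = off + DCBZ_AHEAD * LINE_BYTES;

        /* Zero a line we will reach later, then copy the current one. */
        if (ahead < PAGE_BYTES)
            zero_line(d + ahead);
        memcpy(d + off, s + off, LINE_BYTES);
    }
}
```

Every line is fully overwritten after its dcbz, so the zeroing is never
visible in the result.  A real loop would use explicit loads and stores
rather than memcpy() so the dcbz can be interleaved at a known point in
the instruction stream.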


Segher


