|  |  | 
  
    |  |  | 
 
  |   |  | 
  
    |  |  | 
  
    |  |  | 
  
    |   xen-ppc-devel
[XenPPC] Re: copy_4K_page() doesn't use dcbtst? 
| Hollis Blanchard writes:
> Hi Paul, some Xen people were just noticing that copy_4K_page
> (arch/powerpc/lib/copypage_64.S) doesn't use the dcbtst instruction. Why
> doesn't it help there?
Why would we want to read the cache lines for the destination from
memory when we're only going to overwrite them completely anyway?
A stronger argument would be for using dcbz, but IIRC it actually made
things slower (on POWER4 at least).  I suspect the hardware is
gathering the stores for the whole of each cache line automatically,
so using dcbz doesn't provide any benefit.
I did a lot of measurements of memory copy speed on POWER4 (using
different copy loops, copy sizes, alignments, cache hot/cold cases)
and the copy_4K_page loop is the fastest I could come up with for
POWER4.  If anyone can come up with a routine that is measurably
faster on current machines, I'm happy to look at it, of course.
Paul.
_______________________________________________
Xen-ppc-devel mailing list
Xen-ppc-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ppc-devel
 | 
 |  | 
  
    |  |  |