I would expect to see dcbtst in here, no?
Nah, dcbtst is expensive (it causes some non-cheap bus
transactions) and not needed at all; dcbz is much better
(but it can only be used if you overwrite the whole cache
line, which is the case here).
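
To illustrate the point about dcbz, here is a minimal sketch of a page clear that zeroes one cache line at a time. It is not the code from the patch under discussion. The name clear_page_sketch, the 128-byte line size (what the 970 uses) and the 4 KiB page size are assumptions made for the example; on a CPU with a different dcbz block size the constant would have to change. On non-PowerPC builds a memset stands in for the instruction so the sketch still compiles and runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumptions for this sketch: 128-byte cache lines, 4 KiB pages. */
#define CACHE_LINE_SIZE 128
#define PAGE_SIZE_BYTES 4096

/* Zero one whole cache line. On PowerPC, dcbz establishes the line in
 * the cache as zeros without first reading its old contents from
 * memory, which is why the whole line must be dead data. */
static inline void zero_cache_line(void *line)
{
#if defined(__powerpc__) || defined(__powerpc64__)
    __asm__ volatile ("dcbz 0,%0" : : "r" (line) : "memory");
#else
    memset(line, 0, CACHE_LINE_SIZE);   /* portable stand-in */
#endif
}

/* Hypothetical helper: clear one page, line by line. */
void clear_page_sketch(void *page)
{
    unsigned char *p = page;

    /* dcbz works on whole lines, so the page must be line-aligned. */
    assert(((uintptr_t)p % CACHE_LINE_SIZE) == 0);

    for (size_t off = 0; off < PAGE_SIZE_BYTES; off += CACHE_LINE_SIZE)
        zero_cache_line(p + off);
}
```

Note that dcbz must only be used on cacheable memory; that is assumed to hold for ordinary RAM pages here.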
Both functions (copy and clear) could stand a little loop unrolling.
ldu ; stdu ; bdnz is not the best loop possible, esp. not on
970/P4/P5. You guys have Macs, so use Shark (go to the code browser,
cmd-shift-M, select "show 970 dispatch groups" and "show 970
details drawer"). In most cases the time spent in the loop will
be dominated by memory (cache) speed, of course, but still.
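
As a rough C-level sketch of what "a little loop unrolling" could look like for the copy, the loop below moves four 64-bit words per iteration, issuing the loads before the stores so they do not depend on each other. The function name copy_page_unrolled, the unroll factor of four and the 4 KiB page size are assumptions for illustration; the real routines are written in assembly, and a compiler is free to reschedule this code, so the actual dispatch groups should still be checked in Shark.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages, copied as 64-bit words. */
#define PAGE_WORDS (4096 / sizeof(uint64_t))

/* Hypothetical helper: copy one page, four words per loop iteration. */
void copy_page_unrolled(uint64_t *dst, const uint64_t *src)
{
    assert(dst != NULL && src != NULL);

    for (size_t i = 0; i < PAGE_WORDS; i += 4) {
        /* Four independent loads first ... */
        uint64_t a = src[i];
        uint64_t b = src[i + 1];
        uint64_t c = src[i + 2];
        uint64_t d = src[i + 3];

        /* ... then four stores, with one loop branch per four words
         * instead of one per word. */
        dst[i]     = a;
        dst[i + 1] = b;
        dst[i + 2] = c;
        dst[i + 3] = d;
    }
}
```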
I can understand if you're not *really* trying to optimize these, but in
that case why do you want to add dcbz? Is there a noticeable performance
improvement?
Yes, dcbz is (should be) a huge improvement.
Segher
_______________________________________________
Xen-ppc-devel mailing list
Xen-ppc-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ppc-devel