09 Dec 2002, Andreas Landmark <andreas.landmark at noxtension.com> wrote:

 >>  >> That won't matter at all for dnetc; the fast cores are all
 >>  >> written in assembler.
 >>  S> In my tests the Intel Compiler generated a faster core for
 >>  S> single pipe than any of the included assembler versions (almost
 >>  S> as fast as SES 2-pipe).
 >> For which CPU?
 >> For the P4 that could be possible, because of its heavily cut-down
 >> design. For the P3, Athlon et al., IMHO not very likely.
 AL> It's kind of self-explanatory why it doesn't work magic for Athlon,
Read again... I said P3 and Athlon.

 AL> you can't really expect Intel software engineers to sit and optimize
 AL> their compiler (which is a de facto showcase for their CPU features)
 AL> for the core of their main (and almost only) rival?
The Athlon is a "P3 on steroids", so many optimizations for the P3
also help the Athlon. Naturally this bugs Intel, but that's how it is.

But what I meant is that I don't believe a compiled C program on sane
hardware (read: NOT 20 pipeline stages, a "trace cache", no barrel shifter,
etc.) is faster than optimized assembler.

But Slawek says he has measured it on a P3.
Maybe he will elaborate.

 AL> Andreas D Landmark / noXtension

