[RC5] core questions

Bruce Ford ford at aus3.com
Fri Mar 15 00:15:25 EST 2002


> Then was the AMD core optimized for the available instruction set? I'm
> sorry I missed the discussion on it previously.

The MMX core was developed for the P5-MMX, whose MMX pipeline is dual issue
(with the restriction that only one instruction of a pair may be a shift),
with 1 clock latency, 1 clock loads and 2 clock stores.
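
For anyone wondering why the shift restriction matters so much for RC5: MMX
has no packed rotate instruction, so every rotate has to be built from a left
shift, a right shift and an OR, and the two shifts cannot share an issue slot.
A scalar C sketch of that sequence follows. The name rotl32_shifts is made up
for this illustration and is not taken from the actual core.

```c
#include <stdint.h>

/* Rotate left built from two shifts and an OR, the form an MMX core has to
 * use (PSLLD / PSRLD / POR work the same way on packed 32-bit lanes). */
uint32_t rotl32_shifts(uint32_t x, uint32_t n)
{
    n &= 31;                              /* rotate count is taken mod 32 */
    uint32_t hi = x << n;                 /* shift 1: bits moving up */
    uint32_t lo = x >> ((32 - n) & 31);   /* shift 2: bits wrapping around */
    return hi | lo;                       /* combine the two halves */
}
```

The extra "& 31" on the right shift is only there because C leaves a shift by
32 undefined. With one shift allowed per pair, each rotate needs at least two
issue slots that have to be filled with non-shift work to keep both pipes busy.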

AMD K6 was single issue MMX.  The MMX core is not competitive with the ALU core
when single issue.

AMD K6-2 is dual issue MMX.  Significant effort was put into getting the MMX
core optimized for this CPU.  I could make it get the "correct" clocks for
one "cycle" of RC5.  When the block of code was repeated for the next cycle
there would be a significant increase in the clocks taken for no apparent
reason.  No amount of instruction shuffling seemed to help though it did
tend to make the point of increase move.  The increase was 2 clocks which,
extrapolated over 78 cycles, would have made the MMX core slower than the ALU
core.
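
To put numbers on that: 2 clocks over 78 cycles is 156 extra clocks per key.
The sketch below assumes the 78 "cycles" are the 3 * 26 mixing steps of the
RC5-32/12/8 key setup (12 rounds, 8 byte key), which is my reading of the
figure rather than something stated above. It is plain textbook RC5 in C, not
the MMX core, and the function names are invented for the example.

```c
#include <stdint.h>

#define RC5_R 12                 /* rounds (RC5-32/12/8, assumed) */
#define RC5_B 8                  /* key length in bytes */
#define RC5_T (2 * (RC5_R + 1))  /* 26 words in the expanded table S */
#define RC5_C (RC5_B / 4)        /* 2 key words in L */

uint32_t rotl32(uint32_t x, uint32_t n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}

/* Expands key into S and returns the number of mixing steps executed. */
int rc5_key_setup(const uint8_t key[RC5_B], uint32_t S[RC5_T])
{
    uint32_t L[RC5_C] = {0};
    uint32_t A = 0, B = 0;
    int i, j, k, steps = 0;

    for (i = RC5_B - 1; i >= 0; i--)          /* load key bytes, little-endian */
        L[i / 4] = (L[i / 4] << 8) + key[i];

    S[0] = 0xB7E15163u;                       /* magic constant P32 */
    for (i = 1; i < RC5_T; i++)
        S[i] = S[i - 1] + 0x9E3779B9u;        /* magic constant Q32 */

    for (i = j = k = 0; k < 3 * RC5_T; k++) { /* 3 * max(t, c) = 78 steps */
        A = S[i] = rotl32(S[i] + A + B, 3);       /* fixed rotate */
        B = L[j] = rotl32(L[j] + A + B, A + B);   /* data-dependent rotate */
        i = (i + 1) % RC5_T;
        j = (j + 1) % RC5_C;
        steps++;
    }
    return steps;
}

/* Total extra clocks if every step costs `penalty` more than planned. */
int extra_clocks(int steps, int penalty)
{
    return steps * penalty;
}
```

Each step contains two rotates, one of them data dependent, so a small
per-step penalty is multiplied through the whole key setup.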

With out of order processors it is hard to determine exactly when a clock
increase occurs.  There were times when adding an extra pair of instructions
decreased the clock count by 1 rather than increasing it by 1!

My only theory for the increase is that the instruction sequence requires 32
byte cache line alignment.  Other theories welcome.  Someone should really
run AMD CodeAnalyst on it and find the problem.  Sadly I do not have the
necessary tools.
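
If anyone wants to test the alignment theory by hand, one rough approach is to
work out how many 32 byte lines each repeated block straddles and how much
padding would push it onto a boundary. The helpers below are hypothetical
(names and the 32 byte line size are assumptions for the sketch) and only do
the address arithmetic; they do not measure anything.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 32u  /* line size assumed from the theory above */

/* Number of LINE_BYTES-sized lines touched by `len` bytes starting at `addr`. */
size_t lines_spanned(uintptr_t addr, size_t len)
{
    if (len == 0)
        return 0;
    uintptr_t first = addr / LINE_BYTES;
    uintptr_t last  = (addr + len - 1) / LINE_BYTES;
    return (size_t)(last - first + 1);
}

/* Padding bytes needed to move `addr` up to the next line boundary
 * (0 if it is already aligned). */
size_t pad_to_line(uintptr_t addr)
{
    return (size_t)((LINE_BYTES - addr % LINE_BYTES) % LINE_BYTES);
}
```

Feeding in the start offset and byte length of each unrolled block would show
whether the point where the clocks jump lines up with a block that crosses one
more line than its neighbours.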

AMD K7 is still being worked on.

Prior to the K7 (which has triple instruction decode and issue capability)
the AMD K5 was the fastest core in the x86 range in terms of clocks/key.
The joy of pairable, single clock, rotate instructions.

Bruce Ford

--
To unsubscribe, send 'unsubscribe rc5' to majordomo at lists.distributed.net
rc5-digest subscribers replace rc5 with rc5-digest


