AW: [RC5] Athlon XP/MP even faster? K5 faster than K6!

Bruce Ford at
Tue Oct 23 01:23:49 EDT 2001

> I believe this is because the K5 has a more powerful / effective (ie.
> higher IPC)
> integer unit than the K6. That was generally the K5's strength over the
> IIRC the K5 was based on NexGen's work, while the K6 was using the Chomper
> core.

The K5 had pairable single clock cycle rotates.

On the K6 the rotate instruction is vector decoded into RISC86 ops.  My
guess is that the rotate left is simulated with "copy reg, shift left,
negate mask, shift right, or", which would make it take a minimum of 4
clock cycles (with some pairing), and shifts are only allowed in ALUX.

I tried to improve the K6 core by using MMX code on the K6-2/K6-3.
Although it could be made to work for a single "cycle" of the RC5
algorithm, it seemed to pick up extra clocks whenever the instruction
sequence did not align on a 32-byte boundary. This made it nearly
impossible (I stopped trying) to extend to the 26 cycles by 3 rounds
required.
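
For readers who do not have the algorithm in front of them, the "26
cycles by 3 rounds" can be pictured with the plain C sketch below.  It
assumes the standard RC5-32/12/8 key schedule (26 subkeys, mixed in 3
passes, 78 steps in total) and that the 64-bit key is already held as two
32-bit words.  The names are invented for the example and this is not the
client's core, just the reference form of the loop; note that every step
contains two rotates, one of them by a variable count.

```c
#include <stdint.h>

#define RC5_T   26           /* subkeys: 2 * (12 rounds + 1)          */
#define RC5_C   2            /* key words: 64-bit key as 2 x 32 bits  */
#define RC5_P32 0xB7E15163u  /* RC5 magic constants for 32-bit words  */
#define RC5_Q32 0x9E3779B9u

static uint32_t rotl32(uint32_t x, uint32_t n)
{
    n &= 31u;
    return (x << n) | (x >> ((0u - n) & 31u));
}

/* Expand a 64-bit key into the 26-entry subkey table S. */
static void rc5_key_expand(const uint32_t key[RC5_C], uint32_t S[RC5_T])
{
    uint32_t L[RC5_C] = { key[0], key[1] };  /* working copy of the key */
    uint32_t A = 0, B = 0;
    int i, j, k;

    /* Fill S with the key-independent arithmetic progression. */
    for (i = 0; i < RC5_T; i++)
        S[i] = RC5_P32 + (uint32_t)i * RC5_Q32;

    /* Mix the key in: 3 passes over the 26 subkeys. */
    for (k = 0, i = 0, j = 0; k < 3 * RC5_T; k++) {
        A = S[i] = rotl32(S[i] + A + B, 3);      /* fixed rotate    */
        B = L[j] = rotl32(L[j] + A + B, A + B);  /* variable rotate */
        i = (i + 1) % RC5_T;
        j = (j + 1) % RC5_C;
    }
}
```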

FWIW the keys/s/MHz for the x86 cores are available at

There may be some advantage to mixing MMX code with the integer code for the
K7 core.  Basically this uses spare decode cycles (where we can't find 3
instructions to run in parallel due to the paucity of registers) to run MMX
code which does part of the first round of key expansion for the next pair
of keys while the current pair are being processed by the integer code.

Bruce Ford

