[RC5] Athlon core
b.ford at qut.edu.au
Thu Sep 30 18:02:21 EDT 1999
WARNING: Highly technical detail follows
As Dan Oetting has thrown down the challenge of the fastest RC5
core, I would like to investigate a faster core for the AMD Athlon/K7.
The PPC core currently takes about 304 clocks per key. I believe
that the Athlon could take ~250 clocks per key by using a mix of
x86 integer and MMX code.
My reasoning is this.
The PPro/PII/PIII core takes ~340 clocks per key. The MMX core
takes ~460 clocks/key.
The Athlon can decode 3 DirectPath instructions per clock and can
issue 3 ALU, 3 Address Generation and 3 MMX instructions per clock.
Due to the limitations of the x86 register set, it is difficult to use the
third ALU unit effectively. However, the MMX units can be used
because, unlike on the PII/PIII/K6-2/K6-3, they do not use the same
issue slots as the ALUs.
Without going into detail, there are 4 stages to the RC5 algorithm:
3 rounds of key expansion and a round where the plaintext is
mixed with the expanded key. My proposal is to have the MMX
units process part of the first round of the key expansion of the
next key pair while the ALUs do the rest on the current key pair.
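For readers who have not seen the algorithm, here is a minimal Python
sketch of the four stages. It assumes the RC5-32/12/8 parameters used
in the RC5-64 contest (32-bit words, 12 rounds, 8-byte key); the post
does not state them, and the names expand_key and encrypt_block are
purely illustrative. It is a plain reference version, not the
pipelined core proposed here.

```python
MASK = 0xFFFFFFFF
P32, Q32 = 0xB7E15163, 0x9E3779B9   # standard RC5 constants for 32-bit words
ROUNDS = 12                          # assumption: RC5-32/12/8
T = 2 * (ROUNDS + 1)                 # 26 words in the expanded key table S


def rotl(x, n):
    """Rotate a 32-bit word left by the low 5 bits of n."""
    n &= 31
    return ((x << n) | (x >> (32 - n))) & MASK


def expand_key(key):
    """Stages 1-3: three passes over S, mixing in the key words L."""
    L = [int.from_bytes(key[i:i + 4], "little") for i in range(0, len(key), 4)]
    S = [(P32 + i * Q32) & MASK for i in range(T)]
    A = B = 0
    i = j = 0
    for _ in range(3 * max(T, len(L))):      # 3 * 26 = 78 mixing steps
        A = S[i] = rotl((S[i] + A + B) & MASK, 3)
        B = L[j] = rotl((L[j] + A + B) & MASK, A + B)
        i = (i + 1) % T
        j = (j + 1) % len(L)
    return S


def encrypt_block(S, a, b):
    """Stage 4: mix one 64-bit plaintext block (two words) with S."""
    a = (a + S[0]) & MASK
    b = (b + S[1]) & MASK
    for r in range(1, ROUNDS + 1):
        a = (rotl(a ^ b, b) + S[2 * r]) & MASK
        b = (rotl(b ^ a, a) + S[2 * r + 1]) & MASK
    return a, b


# Usage: test one candidate 8-byte key against a plaintext block.
S = expand_key(bytes(8))
ciphertext = encrypt_block(S, 0x12345678, 0x9ABCDEF0)
```

Each step of the first pass reads S[i] at its initial value, which
depends only on the constants and not on the key, so that pass is a
natural candidate for handing to a separate set of execution units.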
The current MMX code processes 4 keys at a time, giving 3680
paired instructions per loop. The integer code does 2 keys in
parallel, giving 1360 paired instructions per loop. (I know this is not
exactly right and I know I could go off and count them but for now
just let it ride please.)
This has to be combined into a number of instruction triples, 2
integer and 1 MMX. Note that our limitation here is that the
decoders can only process 3 instructions per clock.
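In a loop processing 2 keys and containing n triples, the integer
slots supply 2n of the 1360 integer instructions, so the fraction we
complete in the integer code is n/680. The MMX slot supplies n of the
1840 MMX instructions that 2 keys would need (half of 3680), so the
fraction in the MMX code is n/1840.
So n/680 + n/1840 = 1 => n ~= 496
Since this is for 2 keys, that's about 248 clocks per key.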
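The estimate can be checked with a few lines of Python. This is only a
sketch of the arithmetic: it takes the clock figures quoted above at
face value and assumes both existing cores sustain 2 instructions per
clock, which is how the 1360 and 3680 counts appear to be derived. The
variable names are made up for illustration.

```python
from fractions import Fraction

# Figures quoted in the post (approximate by the author's own admission).
INT_CLOCKS_PER_KEY = 340   # PPro/PII/PIII integer core
MMX_CLOCKS_PER_KEY = 460   # MMX core
KEYS_PER_LOOP = 2          # the proposed loop handles one key pair

# Assumption: both existing cores retire 2 instructions per clock.
int_instr = INT_CLOCKS_PER_KEY * 2 * 2   # integer loop does 2 keys -> 1360
mmx_instr = MMX_CLOCKS_PER_KEY * 4 * 2   # MMX loop does 4 keys    -> 3680

# Each triple carries 2 integer instructions and 1 MMX instruction, so
# doing all the work for 2 keys on one side alone would take:
int_triples = Fraction(int_instr, 2)                   # 680 triples
mmx_triples = Fraction(mmx_instr * KEYS_PER_LOOP, 4)   # 1840 triples

# Solve n/680 + n/1840 = 1 for n, the triples per loop.
n = 1 / (1 / int_triples + 1 / mmx_triples)
clocks_per_key = n / KEYS_PER_LOOP       # 3 decoders => 1 triple per clock

print(float(n), float(clocks_per_key))
```

This gives roughly 496.5 triples per loop and a little over 248 clocks
per key, in line with the figure above. It is an upper bound on speed:
it assumes every decode slot is filled on every clock.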
Now why am I telling you all this and not going and doing it myself:
1. I don't have access to an Athlon processor.
2. My time for assembler programming is very limited right now.
3. Someone else might use these ideas or work with me to bring
them to fruition.
4. Someone may already be working on an Athlon core and have
better ideas or prefer not to reinvent wheels.
Comments and corrections welcome.
Bruce Ford b.ford at qut.edu.au
Teaching and Learning Support Services Ph: +61 7 3864 1178
Queensland University of Technology