> the AMD K6-2 at 350 MHz produces only 550 kilokeys/s. You say that
> all the code of the d.net client is integer code. I thought AMD was
> as fast as or faster than Intel in integer code.
> Or is it just normal that his AMD is so slow?

Normal.  Clock count on the K6/K6-2 is ~610 clocks per key.
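As a rough sanity check, throughput is just clock rate divided by clocks per key. This back-of-envelope sketch (the function name is only illustrative) uses the figures from this thread:

```python
# Throughput of a key-search core: clock rate / clocks per key.
def keys_per_second(clock_hz: float, clocks_per_key: float) -> float:
    return clock_hz / clocks_per_key


# Figures from this thread: K6-2 at 350 MHz, roughly 610 clocks per key.
k6_rate = keys_per_second(350e6, 610)
print(f"K6-2 @ 350 MHz: {k6_rate / 1e3:.0f} kkeys/s")  # K6-2 @ 350 MHz: 574 kkeys/s
```

About 574 kkeys/s, in the same ballpark as the 550 kkeys/s reported, so ~610 clocks/key (an approximate figure) is consistent with what is being seen.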

It is mainly due to the vector decode of the ROL instruction, and ROL 
is central to the RC5 algorithm, which rotates by data-dependent amounts.
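To see how much rotating a straightforward key search does, here is a sketch of RC5-32/12 key setup and encryption following the published RC5 description. The helper names and the ROTATES counters are my own, added for illustration. Every key tried pays for the full key schedule plus one encryption, and for an 8-byte key (the RC5-64 key size) that is 78 rotates by a constant 3 plus 102 rotates by a data-dependent amount, i.e. the ROL reg,cl case.

```python
MASK = 0xFFFFFFFF
P32, Q32 = 0xB7E15163, 0x9E3779B9   # RC5 magic constants for 32-bit words

# Illustrative counters only; not part of RC5 itself.
ROTATES = {"by_3": 0, "data_dependent": 0}


def rotl(x: int, n: int) -> int:
    """32-bit rotate left; as with x86 ROL, only the low 5 bits of n matter."""
    n &= 31
    return ((x << n) | (x >> (32 - n))) & MASK


def rotr(x: int, n: int) -> int:
    """32-bit rotate right (used only for the decryption check)."""
    n &= 31
    return ((x >> n) | (x << (32 - n))) & MASK


def rc5_setup(key: bytes, rounds: int = 12) -> list[int]:
    """RC5-32 key expansion: 3 * max(t, c) mixing steps, two rotates each."""
    c = max(1, (len(key) + 3) // 4)
    L = [0] * c
    for i in range(len(key) - 1, -1, -1):          # load key bytes little-endian
        L[i // 4] = ((L[i // 4] << 8) | key[i]) & MASK
    t = 2 * (rounds + 1)
    S = [(P32 + k * Q32) & MASK for k in range(t)]
    A = B = i = j = 0
    for _ in range(3 * max(t, c)):
        A = S[i] = rotl((S[i] + A + B) & MASK, 3)      # constant rotate
        B = L[j] = rotl((L[j] + A + B) & MASK, A + B)  # data-dependent rotate
        ROTATES["by_3"] += 1
        ROTATES["data_dependent"] += 1
        i, j = (i + 1) % t, (j + 1) % c
    return S


def rc5_encrypt(S: list[int], A: int, B: int, rounds: int = 12) -> tuple[int, int]:
    A = (A + S[0]) & MASK
    B = (B + S[1]) & MASK
    for r in range(1, rounds + 1):
        A = (rotl(A ^ B, B) + S[2 * r]) & MASK      # rotate count comes from data
        B = (rotl(B ^ A, A) + S[2 * r + 1]) & MASK
        ROTATES["data_dependent"] += 2
    return A, B


def rc5_decrypt(S: list[int], A: int, B: int, rounds: int = 12) -> tuple[int, int]:
    for r in range(rounds, 0, -1):
        B = rotr((B - S[2 * r + 1]) & MASK, A) ^ A
        A = rotr((A - S[2 * r]) & MASK, B) ^ B
    return (A - S[0]) & MASK, (B - S[1]) & MASK


S = rc5_setup(bytes(8))                  # all-zero 8-byte key, just an example
ct = rc5_encrypt(S, 0x01234567, 0x89ABCDEF)
assert rc5_decrypt(S, *ct) == (0x01234567, 0x89ABCDEF)
print(ROTATES)  # {'by_3': 78, 'data_dependent': 102}
```

Real cores are heavily hand-scheduled and may trim some of this work, but the rotates stay on the critical path, which is why the cost of ROL reg,cl dominates.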

Presumably ROL reg,cl is decoded into something like:

alu  mov temp1,reg
alu  mov temp2,ecx

alux shl reg,cl
alu  neg temp2

alux shr temp1,temp2

alu  or reg,temp1

i.e. it takes 4 clocks, though the last two clocks leave pairing 
opportunities.
(<RANT> The above is all guesswork.  I wish AMD would just tell 
us what the vector decodes are and whether they all end up in the 
X pipe or just the shifts. </RANT>)
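The guessed sequence is at least functionally sound. This small Python model (function names are made up; only the operation sequence comes from the listing above, and it says nothing about the K6's real microcode or timing) checks that mov/shl/neg/shr/or reproduces a 32-bit rotate. It works because x86 shifts use only the low 5 bits of the count, so shifting right by the negated count is a shift by (32 - n) mod 32.

```python
MASK32 = 0xFFFFFFFF


def rol_reference(x: int, n: int) -> int:
    """Plain 32-bit rotate left, for comparison."""
    n &= 31
    return ((x << n) | (x >> (32 - n))) & MASK32


def rol_via_shifts(reg: int, ecx: int) -> int:
    """Model of the guessed ROL reg,cl expansion (hypothetical, not AMD's microcode).

    x86 shift counts are masked to 5 bits; '& 31' models that here.
    """
    temp1 = reg                          # mov  temp1,reg
    temp2 = ecx                          # mov  temp2,ecx
    reg = (reg << (ecx & 31)) & MASK32   # shl  reg,cl
    temp2 = (-temp2) & MASK32            # neg  temp2
    temp1 = temp1 >> (temp2 & 31)        # shr  temp1,temp2 -> by (32 - n) mod 32
    return reg | temp1                   # or   reg,temp1


# Every count agrees with a true rotate. n = 0 still works: both shifts are
# by 0, and reg | reg == reg.
x = 0xDEADBEEF
assert all(rol_via_shifts(x, n) == rol_reference(x, n) for n in range(64))
```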

On Intel PPro/PII/PIII the ROL instruction is 1 clock pairable.

There is some hope that the K6-2 clock count can be 
brought down to roughly that of a P5-MMX (464.5 clocks/key) using a 
variant of the RC5-MMX code.  However, a recent contest between 
two programmers failed to achieve this.
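For scale, if a K6-2 core matched the P5-MMX clock count, the same 350 MHz chip would do roughly 31% more work per clock:

$$
\frac{610}{464.5} \approx 1.31, \qquad
\frac{350\times10^{6}}{464.5} \approx 753\ \text{kkeys/s}
\quad\text{vs.}\quad
\frac{350\times10^{6}}{610} \approx 574\ \text{kkeys/s}.
$$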

Feel free to play with the cores available at 

