[RC5] oddball DES vs. CPU speeds

Remi Guyomarch rguyom at mail.dotcom.fr
Mon Jan 19 21:39:16 EST 1998


Hi Svend,

> In my key search routine a key is tested in about 400 instructions.
> Out of this the code for round 3 to 11 takes 297 instructions. The
> code for these rounds is simple, easy to change and easy to test. It
> is basically 33 instructions (could be fewer) repeated 9 times.
>
> If the code should be optimized for different x86 processors, I guess
> this is the only code I would touch.
> 
> If you want I could make a version which calls external procedures for
> rounds 3 to 11 and convert it to aout format.

Yes, that would be a very good idea. But it could slow down your code
too... If the critical part is in one "block", i.e. it isn't repeated
in several places in the code, we can use jumps to go to and from the
processor-specific part.
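
A minimal sketch of the "external procedure" variant, written in C rather
than x86 assembly, assuming the rounds 3 to 11 block can be isolated behind
one entry point. All names here (cpu_kind, mid_rounds_fn, select_mid_rounds)
are hypothetical, and the round body is a placeholder mixing step, not real
DES. The indirect call through the pointer is exactly the per-key overhead
mentioned above; the jump-based approach would avoid it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CPU classes, for illustration only. */
enum cpu_kind { CPU_GENERIC = 0, CPU_PENTIUM, CPU_PPRO, CPU_KIND_COUNT };

/* Hypothetical signature for the "rounds 3 to 11" block: it updates the
   two 32-bit halves of the working state in place. */
typedef void (*mid_rounds_fn)(uint32_t *left, uint32_t *right);

/* Placeholder body: NOT real DES rounds, just a stand-in step
   repeated 9 times (rounds 3 through 11). */
static void mid_rounds_generic(uint32_t *left, uint32_t *right)
{
    for (int round = 3; round <= 11; round++) {
        uint32_t t = *right;
        *right = *left ^ ((t << 1) | (t >> 31));
        *left = t;
    }
}

/* Same computation unrolled by hand, standing in for a CPU-tuned
   variant. It must produce the same result as the generic version. */
static void mid_rounds_unrolled(uint32_t *left, uint32_t *right)
{
    uint32_t l = *left, r = *right, t;
#define STEP t = r; r = l ^ ((t << 1) | (t >> 31)); l = t;
    STEP STEP STEP STEP STEP STEP STEP STEP STEP
#undef STEP
    *left = l;
    *right = r;
}

/* Pick the implementation once at startup, not once per key. */
static mid_rounds_fn select_mid_rounds(enum cpu_kind kind)
{
    static const mid_rounds_fn table[CPU_KIND_COUNT] = {
        mid_rounds_generic,   /* CPU_GENERIC */
        mid_rounds_unrolled,  /* CPU_PENTIUM */
        mid_rounds_unrolled   /* CPU_PPRO    */
    };
    assert((int)kind < CPU_KIND_COUNT);
    return table[kind];
}
```

Because every variant computes the same function, a tuned version can be
checked against the generic one on the same inputs before it is trusted.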

> While I am here:
> 
> The bswap instruction which is illegal on 386 was removed in the
> January 10 library version, but I understand that one would not change
> library version just at the moment for release.

:-) I think we simply hadn't noticed the change.
I've just looked at the 1.01 version. It seems to perform significantly
better on a Pentium Pro. On a 200 MHz PPro, the 1997 version ran
at ~850 kkeys/s; the 1.01 version now gets 920 kkeys/s (tested under NT4
with the distributed.net client). I've also checked the low-memory
version on my personal DX4/100 under Linux: it runs at ~219 kkeys/s
instead of ~231 kkeys/s for the full version. But I haven't benchmarked
it under Win95...
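
To compare these figures across machines, it can help to convert them to
clock cycles per key. A small sketch in C, assuming only the approximate
clock speeds and key rates quoted above; the helper names are made up for
illustration.

```c
#include <assert.h>

/* Clock cycles spent per tested key, given the CPU clock in MHz and
   the measured rate in thousands of keys per second. */
static double cycles_per_key(double clock_mhz, double kkeys_per_sec)
{
    assert(kkeys_per_sec > 0.0);
    return (clock_mhz * 1e6) / (kkeys_per_sec * 1e3);
}

/* Relative change between two key rates, in percent. */
static double percent_change(double before, double after)
{
    assert(before > 0.0);
    return 100.0 * (after - before) / before;
}
```

With the numbers above, cycles_per_key(200.0, 920.0) is about 217 cycles
per key on the PPro, and cycles_per_key(100.0, 231.0) is about 433 on the
DX4/100. percent_change(850.0, 920.0) is roughly +8.2%, and
percent_change(231.0, 219.0) is roughly -5.2% for the low-memory version.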

> And then the Windows 95 speed puzzles me. On my 120 MHz Pentium the
> client runs at above 600 K keys/second in Windows NT, but only at about
> 550 K keys/second in Windows 95. I don't know the current plans, but I
> could consider looking into this. One possible solution could be to
> make a version, which has a slightly lower maximum speed, but runs in
> less memory and might be less sensitive to conditions in Windows 95.

There could be a significant difference between a 'standard' Pentium
with 2x8 KB cache and an MMX-enhanced Pentium with 2x16 KB cache.
Do you have an estimate of the code+data space your core uses in its
inner loop?

-- 
Rémi		Don't waste your computer's time. Distribute it!
			http://www.distributed.net/
	    RC5 cores source code : http://wwwperso.hol.fr/~guyom001/


--
To unsubscribe, send 'unsubscribe rc5' to majordomo at lists.distributed.net
rc5-digest subscribers replace rc5 with rc5-digest


