[RC5] RC5-72 Pentium 4 optimizations

Slawek sgp at telsatgp.com.pl
Mon Sep 8 00:22:56 EDT 2003


Hi list!



Has anybody tried optimizing the RC5 cores for the Pentium 4?


I ran some benchmarks on my Pentium 4 2.6 GHz
with Hyper-Threading, and the fastest core available
in the official d.net client is "DG 3-pipe".

I have wasted some time ;) optimizing it, and these are
the results:

-- start --
dnetc v2.9003-481-CTR-03030112 for Win32 (WindowsNT 5.1).
Using email address (distributed.net ID) 'xxx'

[Sep 07 20:57:55 UTC] Automatic processor type detection found
                      an Intel Pentium 4 (Northwood) processor.
[Sep 07 20:57:55 UTC] RC5-72: using core #9 (Deen 3-pipe).
[Sep 07 20:57:55 UTC] RC5-72: 32/32 Tests Passed (0.421875 seconds)

dnetc v2.9003-481-CTR-03030112 for Win32 (WindowsNT 5.1).
Using email address (distributed.net ID) 'xxx'

[Sep 07 20:57:56 UTC] Automatic processor type detection found
                      an Intel Pentium 4 (Northwood) processor.
[Sep 07 20:57:56 UTC] RC5-72: using core #9 (Deen 3-pipe).
[Sep 07 20:58:15 UTC] RC5-72: Benchmark for core #9 (Deen 3-pipe)
                      0.00:00:16.45 [4,100,031 keys/sec]

dnetc v2.9003-481-CTR-03030112 for Win32 (WindowsNT 5.1).
Using email address (distributed.net ID) 'xxx'

[Sep 07 20:59:31 UTC] Automatic processor type detection found
                      an Intel Pentium 4 (Northwood) processor.
[Sep 07 20:59:31 UTC] RC5-72: using core #6 (DG 3-pipe).
[Sep 07 20:59:50 UTC] RC5-72: Benchmark for core #6 (DG 3-pipe)
                      0.00:00:16.51 [3,546,809 keys/sec]
-- end --
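
For scale, the two benchmark lines above put core #9 at about 15.6%
more keys/sec than core #6. A small C sketch of that arithmetic follows;
the function names are made up for illustration and are not part of dnetc.

```c
#include <stdio.h>

/* Relative gain of `fast` over `base`, in percent.
   Hypothetical helper for illustration, not part of the dnetc client. */
double speedup_percent(double fast, double base)
{
    return (fast / base - 1.0) * 100.0;
}

/* Prints the gain for the two rates taken from the benchmark log. */
void print_benchmark_gain(void)
{
    const double core9 = 4100031.0;  /* keys/sec, core #9 (Deen 3-pipe) */
    const double core6 = 3546809.0;  /* keys/sec, core #6 (DG 3-pipe)   */
    printf("gain: %.1f%%\n", speedup_percent(core9, core6));
}
```

The same helper gives roughly 18% for the 4.2 Mkeys/sec target relative
to core #6.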

Currently I cannot donate much processor time
to the project, but I'd like to donate at least the core ;)


Who should I send it to?


I think I could reach 4.2 Mkeys/sec by also optimizing
the code outside the innermost loop, but I don't have more
time at the moment.


I have not tried using MMX / SSE2 opcodes
in my core so far.

MMX is surely slower because it doesn't have ROL,
and even the (SHL & SHR & OR) triple doesn't work
with multiple data sets.
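
To spell out the rotate problem: a rotate can be built from two shifts
and an OR, but the MMX packed shifts (PSLLD / PSRLD) apply one shift
count to every element of the register, while RC5 rotates by
data-dependent amounts, so two keys held in one register generally need
two different counts. That is presumably what "doesn't work with
multiple data sets" refers to. A minimal sketch in plain C, not actual
MMX code; `packed2x32` and the `packed_*` functions are hypothetical
names that only model a 64-bit register as two 32-bit lanes:

```c
#include <stdint.h>

/* 32-bit rotate left built from the SHL / SHR / OR triple.
   Only the low 5 bits of the count are used, as in RC5 with 32-bit words. */
uint32_t rotl32(uint32_t x, uint32_t n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}

/* Plain-C model of one 64-bit register holding two 32-bit words.
   Hypothetical type, for illustration only. */
typedef struct { uint32_t lane[2]; } packed2x32;

/* Rotate both lanes by the SAME count, which is what a packed
   shift pair plus OR provides with a single shift count. */
packed2x32 packed_rotl_shared(packed2x32 v, uint32_t n)
{
    packed2x32 r;
    r.lane[0] = rotl32(v.lane[0], n);
    r.lane[1] = rotl32(v.lane[1], n);
    return r;
}

/* What two keys processed side by side would need: one count per lane. */
packed2x32 packed_rotl_per_lane(packed2x32 v, uint32_t n0, uint32_t n1)
{
    packed2x32 r;
    r.lane[0] = rotl32(v.lane[0], n0);
    r.lane[1] = rotl32(v.lane[1], n1);
    return r;
}
```

With counts 8 and 4 for the two lanes, the shared-count version matches
the per-lane version only in lane 0; lane 1 comes out wrong unless it
is handled separately.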

It is possible, though, to run ALU and MMX instructions at the
same time, so there is some potential :)


Oh, and by the way - why do those cores run so much slower
with Hyper-Threading enabled?



-- 
Slawek Piotrowski



