[RC5] New Intel compilers...

Slawek sgp at telsatgp.com.pl
Mon Dec 9 20:28:21 EST 2002


> >  S> In my tests Intel Compiler generated a faster core for single pipe
> >  S> than any of the assembler versions included (almost as fast as SES
> >  S> 2-pipe).
> > For which CPU ??
>
> tested on dual P III 1 GHz


Following up on my own post...

Just in case somebody wanted "the real numbers" ;)


From the original client:

[Dec 09 18:52:32 UTC] Automatic processor type detection found
                      an Intel Pentium III processor.
[Dec 09 18:52:32 UTC] RC5-72: using core #0 (ANSI 4-pipe).
[Dec 09 18:52:51 UTC] RC5-72: Benchmark for core #0 (ANSI 4-pipe)
                      0.00:00:16.93 [679,355 keys/sec]
[Dec 09 18:52:51 UTC] RC5-72: using core #1 (ANSI 2-pipe).
[Dec 09 18:53:10 UTC] RC5-72: Benchmark for core #1 (ANSI 2-pipe)
                      0.00:00:16.32 [843,792 keys/sec]
[Dec 09 18:53:10 UTC] RC5-72: using core #2 (ANSI 1-pipe).
[Dec 09 18:53:30 UTC] RC5-72: Benchmark for core #2 (ANSI 1-pipe)
                      0.00:00:17.34 [603,084 keys/sec]
[Dec 09 18:53:30 UTC] RC5-72: using core #3 (SES 1-pipe).
[Dec 09 18:53:49 UTC] RC5-72: Benchmark for core #3 (SES 1-pipe)
                      0.00:00:17.42 [1,860,143 keys/sec]
[Dec 09 18:53:49 UTC] RC5-72: using core #4 (SES 2-pipe).
[Dec 09 18:54:08 UTC] RC5-72: Benchmark for core #4 (SES 2-pipe)
                      0.00:00:16.48 [2,123,302 keys/sec]
[Dec 09 18:54:08 UTC] RC5-72: using core #5 (DG 2-pipe).
[Dec 09 18:54:28 UTC] RC5-72: Benchmark for core #5 (DG 2-pipe)
                      0.00:00:17.35 [1,740,662 keys/sec]
[Dec 09 18:54:28 UTC] RC5-72: using core #6 (DG 3-pipe).
[Dec 09 18:54:48 UTC] RC5-72: Benchmark for core #6 (DG 3-pipe)
                      0.00:00:17.42 [1,861,825 keys/sec]
[Dec 09 18:54:48 UTC] RC5-72: using core #7 (DG 3-pipe alt).
[Dec 09 18:55:07 UTC] RC5-72: Benchmark for core #7 (DG 3-pipe alt)
                      0.00:00:16.70 [1,810,611 keys/sec]


Core compiled with the Intel Compiler (the only change is the "ROTL" macro,
which I redirected to "_rotl(x,s)" because that gives better code):

[Dec 09 19:09:17 UTC] Automatic processor type detection found
                      an Intel Pentium III processor.
[Dec 09 19:09:17 UTC] RC5-72: using core #2 (ANSI 1-pipe).
[Dec 09 19:09:36 UTC] RC5-72: Benchmark for core #2 (ANSI 1-pipe)
                      0.00:00:16.68 [1,935,685 keys/sec]
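
For anyone who wants to try the same change, here is a minimal sketch of what such a ROTL redirection might look like. The client's actual macro is not quoted in this post, so the helper names `rotl32` and `rotl_selfcheck` and the `_MSC_VER` check are assumptions for illustration; `_rotl` itself is the 32-bit rotate intrinsic that the Microsoft and Intel compilers provide.

```c
#include <assert.h>
#include <stdint.h>

/* Portable rotate-left; the count is reduced modulo 32, as the x86 ROL
   instruction does for 32-bit operands. */
static inline uint32_t rotl32(uint32_t x, unsigned s)
{
    s &= 31u;
    return (x << s) | (x >> ((32u - s) & 31u));
}

#if defined(_MSC_VER)            /* assumption: compiler declares _rotl in <stdlib.h> */
#include <stdlib.h>
#define ROTL(x, s) _rotl((x), (int)(s))
#else
#define ROTL(x, s) rotl32((x), (s))
#endif

/* Quick sanity check that whichever ROTL was selected is a 32-bit rotate. */
static inline void rotl_selfcheck(void)
{
    assert(ROTL(0x80000001u, 1) == 0x00000003u);
    assert(ROTL(0x12345678u, 0) == 0x12345678u);
}
```

Calling rotl_selfcheck() once at startup is a cheap way to confirm that the replacement macro behaves the same as the original before trusting any benchmark numbers.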


Note that I'm testing on a dual-processor system, but I have *disabled* hyper
threading in the Intel Compiler to avoid using two processors during the
benchmark.


I don't know why, but "dual pipe ANSI" gives worse results...

[Dec 09 19:10:24 UTC] Automatic processor type detection found
                      an Intel Pentium III processor.
[Dec 09 19:10:24 UTC] RC5-72: using core #1 (ANSI 2-pipe).
[Dec 09 19:10:43 UTC] RC5-72: Benchmark for core #1 (ANSI 2-pipe)
                      0.00:00:16.17 [1,855,950 keys/sec]

As you can see, SES 2-pipe is still the best, but the Intel-compiled code can
be analysed by hand to find out what's happening.
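
To put the logs above side by side, here is a small sketch that turns the keys/sec figures into ratios. The rates are copied from the benchmark output in this post; the macro names and the `speedup` helper are made up for illustration. Worked out by hand, the Intel-compiled ANSI 1-pipe core runs at roughly 3.2 times the original ANSI 1-pipe core, about 4% ahead of SES 1-pipe, and at about 91% of SES 2-pipe.

```c
#include <assert.h>

/* Rates in keys/sec, copied from the benchmark logs in this post. */
#define ORIG_ANSI_1PIPE   603084.0
#define ORIG_ANSI_2PIPE   843792.0
#define ORIG_SES_1PIPE   1860143.0
#define ORIG_SES_2PIPE   2123302.0
#define ICC_ANSI_1PIPE   1935685.0
#define ICC_ANSI_2PIPE   1855950.0

/* Ratio of two rates; a result above 1.0 means `rate` beats `baseline`. */
static double speedup(double rate, double baseline)
{
    assert(baseline > 0.0);
    return rate / baseline;
}
```

For example, speedup(ICC_ANSI_2PIPE, ORIG_ANSI_2PIPE) comes out at roughly 2.2, so the 2-pipe ANSI core also gains from the recompile, just less than the 1-pipe one.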


If a function uses its parameters inside the code, the Intel Compiler always
uses EBP or EBX to reach those parameters. Something like this would probably
be better:
push ebp
mov ebp,esp
add esp,-stackframe
and esp,-64          ; align stack to 64 bytes
mov eax,[param1]     ; load the parameters
mov ebx,[param2]
mov ecx,[param3]
mov [esp+aaa],ebp    ; save frame pointer so esp can be restored later
mov [esp+xxx],eax    ; copy the parameters into the aligned frame
mov [esp+yyy],ebx
mov [esp+zzz],ecx
; all 7 registers (eax, ebx, ecx, edx, esi, edi, ebp) are now free;
; the stack frame can be addressed using esp
........
mov ebp,[esp+aaa]
mov esp,ebp
pop ebp
retn

Obviously, it is also possible to copy a block whose address is given as a
parameter into the stack frame.
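
At the C level, copying the block into the stack frame could look roughly like the sketch below. The `work_block` layout, the `process_block` name and the placeholder loop body are all hypothetical, since the client's real data structure is not shown in this post; the point is only the pattern of one copy in, work on the local copy, and one copy out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical work unit; not the client's real structure. */
typedef struct {
    uint32_t key_hi, key_mid, key_lo;
    uint32_t iterations;
} work_block;

/* Copies the caller's block into a local, works on the copy, and writes
   the result back once at the end. */
static void process_block(work_block *blk, uint32_t count)
{
    work_block local;

    assert(blk != NULL);
    memcpy(&local, blk, sizeof local);   /* one copy into this frame */

    for (uint32_t i = 0; i < count; ++i) {
        /* placeholder work: increment key_lo, carrying into key_mid */
        if (++local.key_lo == 0)
            ++local.key_mid;
        ++local.iterations;
    }

    memcpy(blk, &local, sizeof local);   /* single write-back */
}
```

Inside the loop nothing is reached through the `blk` pointer any more, so the compiler does not need to keep a register tied up holding it.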


This could speed it up a little, especially since the Intel Compiler uses 3
registers + ECX in the 1-pipe version, so having 6 registers + ECX may allow a
similar speed increase in the 2-pipe version. And that is even without using
MMX and SSE2 (on P4), which can work at the same time as the main processing
unit....

Unfortunately, I don't have time to play with that.


--
Slawek Piotrowski
