[RC5] performance

Décio Luiz Gazzoni Filho decio at revistapcs.com.br
Thu Sep 18 10:47:30 EDT 2003

The problem is that you're applying simple clock scaling to estimate the throughput of your new machine, which won't work in this case because the two machines don't use the same processor model -- scaling linearly with the clock only works within the same processor family.
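To make the mismatch concrete, here is a minimal sketch of the naive clock-scaling estimate, using only the figures quoted in this thread (4.2 Mkeys/s on the dual P3 at 1 GHz, 7.2 Mkeys/s on the dual Xeon at 2.8 GHz). The variable names are mine, and the arithmetic is only an illustration of why the expectation was off, not a performance model.

```python
# Figures quoted in this thread (whole-machine rates, both boxes are dual-CPU).
p3_rate_mkeys = 4.2        # dual P3
p3_clock_ghz = 1.0
xeon_rate_mkeys = 7.2      # dual Xeon, as measured
xeon_clock_ghz = 2.8

# Naive assumption: throughput scales linearly with clock speed.
naive_xeon_estimate = p3_rate_mkeys / p3_clock_ghz * xeon_clock_ghz

# Throughput per GHz on each machine, and how the Xeon compares to the P3.
p3_per_ghz = p3_rate_mkeys / p3_clock_ghz
xeon_per_ghz = xeon_rate_mkeys / xeon_clock_ghz
per_clock_ratio = xeon_per_ghz / p3_per_ghz

print(f"naive estimate for the Xeon box: {naive_xeon_estimate:.2f} Mkeys/s")
print(f"measured on the Xeon box:        {xeon_rate_mkeys:.2f} Mkeys/s")
print(f"Xeon work per clock relative to P3: {per_clock_ratio:.2f}")
```

In other words, the naive estimate lands near 11.8 Mkeys/s, while the measured rate implies the Xeon does only about 60% as much RC5 work per clock as the P3.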

As for the P4 performing so poorly, realize that it is a completely overhauled core. While Intel actually raised the efficiency of the architecture in a few areas (e.g. L1 cache with 2-cycle load-use latency, quad-pumped FSB, trace cache, double-pumped ALUs, new vector instructions, etc.), quite a few instructions had their cycle counts increased, one of the design tradeoffs required for the P4 to reach such high clock rates. In particular, Intel had to strip from the P4's design a piece of hardware called the barrel shifter, which is responsible for bit shifts and rotates, and this leaves the P4 very slow at these operations (4 cycles vs. 1 cycle in every other x86 design out there). It so happens that bit rotates are at the heart of the RC5 algorithm, being executed hundreds of times for each key checked, and the poor cycle count for this instruction is what makes the P4 so slow at this algorithm.
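To show where the rotates come from, here is a rough sketch of RC5 with 32-bit words and 12 rounds, written from the published RC5 description and instrumented to count rotates. It is not the dnetc core, which is hand-tuned assembly; the 9-byte (72-bit) key length and the helper name rotates_per_key are my assumptions for illustration.

```python
MASK = 0xFFFFFFFF
P32, Q32 = 0xB7E15163, 0x9E3779B9   # RC5 magic constants for 32-bit words
rotate_count = 0                     # instrumentation only, not part of RC5

def rotl(x, n):
    """Rotate a 32-bit word left by n bits (only the low 5 bits of n matter)."""
    global rotate_count
    rotate_count += 1
    n &= 31
    return ((x << n) | (x >> (32 - n))) & MASK

def expand_key(key, rounds=12):
    """RC5 key schedule: mixes the key words L into the subkey table S."""
    c = max(1, (len(key) + 3) // 4)          # number of 32-bit key words
    L = [0] * c
    for i in range(len(key) - 1, -1, -1):    # little-endian key loading
        L[i // 4] = ((L[i // 4] << 8) + key[i]) & MASK
    t = 2 * (rounds + 1)
    S = [(P32 + i * Q32) & MASK for i in range(t)]
    A = B = i = j = 0
    for _ in range(3 * max(t, c)):
        A = S[i] = rotl((S[i] + A + B) & MASK, 3)        # fixed rotate
        B = L[j] = rotl((L[j] + A + B) & MASK, A + B)    # data-dependent rotate
        i = (i + 1) % t
        j = (j + 1) % c
    return S

def encrypt_block(S, A, B, rounds=12):
    """Encrypt one 64-bit block given as two 32-bit words."""
    A = (A + S[0]) & MASK
    B = (B + S[1]) & MASK
    for r in range(1, rounds + 1):
        A = (rotl(A ^ B, B) + S[2 * r]) & MASK
        B = (rotl(B ^ A, A) + S[2 * r + 1]) & MASK
    return A, B

def rotates_per_key(key_bytes=9, rounds=12):
    """Count the rotates needed to test one key against one block."""
    global rotate_count
    rotate_count = 0
    S = expand_key(bytes(key_bytes), rounds)
    encrypt_block(S, 0, 0, rounds)
    return rotate_count

print(rotates_per_key())
```

Under these assumptions each key costs 156 rotates in the key schedule plus 24 in the encryption, and most of them are data-dependent rotates that cannot be replaced by cheaper fixed shifts.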

The good news is that the current core is nowhere near optimized for the P4; someone recently contributed a core that is 30% faster, and surely this is not the end of the line. Intel is also releasing a design refresh for the P4 next month, called Prescott (the commercial name should be Pentium 5), and it'll include many architectural improvements, one of which could be a barrel shifter -- we can hope so!

Also, please do not disable HT on your machine. Just set dnetc to run 2 threads instead of 4, and if you have a fairly new operating system, it'll be smart enough to schedule these threads on the two physical CPUs instead of on two logical CPUs of the same physical CPU. Disabling HT will just cost you performance in other applications and bring no gains for dnetc.


On Wed, 17 Sep 2003 23:25:28 -0700, chandler sobel-sörenson <scar at fugazi.engr.arizona.edu> wrote:

> From: chandler sobel-sörenson	<scar at fugazi.engr.arizona.edu>
> Date: Wed, 17 Sep 2003 23:25:28 -0700
> To: "D.net Discussion" <rc5 at lists.distributed.net>
> Subject: Re: [RC5] performance
> a friend of mine with dual p3 1 GHz is getting around 4.2 Mkeys/s, which is
> the initial reason i would expect a lot more than 7.2 Mkeys/s with my dual
> xeon 2.8 GHz.  what do you guys think of that?
