[RC5] Cuda client
Décio Luiz Gazzoni Filho
decio at decpp.net
Thu Jan 31 13:18:03 EST 2013
On Jan 31, 2013, at 3:52 PM, Joseph Kaye <jkaye at isd.net> wrote:
> On 1/30/2013 2:18 PM, bert wrote:
>> The Cuda client is still at 3.1, while the latest Nvidia driver
>> (310.90) uses version 5. Could this be the reason my son's GTX 680
>> is slower than his old GTX 580 (which is in use on my other son's
>> computer)? I was kind of shocked and was expecting a much quicker pace
>> in going through 64 stats units (the GTX 580 is averaging about 3 minutes
>> or so vs. almost 8 minutes with the GTX 680!). Any help or suggestions
>> would be appreciated.
> It doesn't look like the 6xx series is working so well for dnetc.
> Apparently they changed the design of the GPU, and while they do have
> more cores now, they removed instructions (or something) that DNETC
> needs for faster computations. IIRC, it's kinda like when Intel removed
> the ROT function from the P4 CPUs. They had a higher clock speed, but
> because the ROTate function was not there, the keyrate was much slower.
Just to clear up a common misconception: Intel never *removed* from a new processor an instruction that was present in an earlier one, and that includes rol. Doing so would break backward compatibility, which is a main selling point of the x86 architecture (and by extension the Wintel platform).
What did happen involves a piece of hardware called a barrel shifter, which is used for efficient (usually single-cycle) implementation of variable-sized shifts and rotations (the shl, shr, sar, rol, ror, rcl and rcr instructions). It has historically been implemented on every Intel processor since the 80386 or so, but don't quote me on that. Certainly the classic 1993-era Pentium had it. The barrel shifter is what wasn't present on the Pentium 4, which is why the rol instruction executed more slowly there (I believe it took 4 cycles).
So the instruction has always existed, even on the Pentium 4 (it had to, for backward compatibility), but the P4 chips alone lacked the hardware for implementing it efficiently. Intel added that hardware back on the newer Core chips, which is why they perform better.
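For anyone wondering what a rotate actually does, here is a minimal sketch in C. The function name rotl32 is my own choice for illustration and is not taken from the dnetc sources. It builds a 32-bit left rotation out of two shifts and an OR, which is the pattern compilers commonly recognize and turn into a single rol on x86. RC5 rotates by a data-dependent amount in every round, so the cost of this one operation has a direct effect on the keyrate.

```c
#include <stdint.h>

/* Rotate x left by n bits. Hypothetical helper for illustration only. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;  /* only the low 5 bits of the count matter for a 32-bit rotate */

    /* Bits shifted out on the left re-enter on the right. The second mask
       keeps the right-shift count in range when n is 0. */
    return (x << n) | (x >> ((32 - n) & 31));
}
```

As a quick check, rotl32(0x80000001u, 1) gives 0x00000003: the top bit wraps around to the bottom instead of being lost, as it would be with a plain shift.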