[RC5] CUDA clients need recompiling for newer GPUs

Greg Childers jgchilders at mailaps.org
Sun Jan 19 17:36:46 EST 2014


Hi,

I did further testing on a Fermi GPU (GTX 480), and it turns out that on
this GPU the CC 1.0 optimized code is faster than the CC 2.0 code. Also, it's
better to include the PTX code so the client remains compatible with future
GPUs. So when compiling, use

NVCC = /usr/local/cuda/bin/nvcc \
  --generate-code arch=compute_10,code=compute_10 \
  --generate-code arch=compute_30,code=compute_30 \
  --generate-code arch=compute_35,code=compute_35

I don't have access to a CC 3.0 GPU, so I don't know if the binary will be
faster with or without the compute_30 section above.

Greg



On Sat, Jan 18, 2014 at 4:16 AM, Mike Reed <mikereedwt at gmail.com> wrote:

> Hi,
>
> This is excellent. :) Watch this space...
>
> Kind regards,
> Mike
>
> On 18 January 2014 10:53, Greg Childers <jgchilders at mailaps.org> wrote:
> > Hi,
> >
> > The newer CUDA sm_35 GPUs, such as the GTX Titan and Tesla K20/K40, have a
> > funnel shift instruction, which can be used as a 32-bit rotate. No code
> > changes are necessary: the compiler recognizes the two shifts and the OR
> > as a rotate, and does the right thing. Here are the results of my testing
> > on a Tesla K20.
> >
> > Current client:
> >
> > dnetc v2.9108-517-CTR-10070312 for CUDA 3.1 on Linux (Linux 2.6.32-279.14.1
> > ...
> >
> > Please provide the *entire* version descriptor when submitting bug reports.
> >
> > The distributed.net bug report pages are at http://bugs.distributed.net/
> >
> >
> > [Jan 18 10:47:30 UTC] RC5-72: using core #0 (CUDA 1-pipe 64-thd).
> > [Jan 18 10:47:37 UTC] RC5-72: Benchmark for core #0 (CUDA 1-pipe 64-thd)
> >                       0.00:00:04.63 [937,151,615 keys/sec]
> > [Jan 18 10:47:37 UTC] RC5-72: using core #1 (CUDA 1-pipe 128-thd).
> > [Jan 18 10:47:44 UTC] RC5-72: Benchmark for core #1 (CUDA 1-pipe 128-thd)
> >                       0.00:00:04.44 [976,631,543 keys/sec]
> > [Jan 18 10:47:44 UTC] RC5-72: using core #2 (CUDA 1-pipe 256-thd).
> > [Jan 18 10:47:50 UTC] RC5-72: Benchmark for core #2 (CUDA 1-pipe 256-thd)
> >                       0.00:00:04.42 [981,795,666 keys/sec]
> > [Jan 18 10:47:50 UTC] RC5-72: using core #3 (CUDA 2-pipe 64-thd).
> > [Jan 18 10:47:57 UTC] RC5-72: Benchmark for core #3 (CUDA 2-pipe 64-thd)
> >                       0.00:00:04.74 [914,549,002 keys/sec]
> > [Jan 18 10:47:57 UTC] RC5-72: using core #4 (CUDA 2-pipe 128-thd).
> > [Jan 18 10:48:04 UTC] RC5-72: Benchmark for core #4 (CUDA 2-pipe 128-thd)
> >                       0.00:00:04.58 [948,142,344 keys/sec]
> > [Jan 18 10:48:04 UTC] RC5-72: using core #5 (CUDA 2-pipe 256-thd).
> > [Jan 18 10:48:10 UTC] RC5-72: Benchmark for core #5 (CUDA 2-pipe 256-thd)
> >                       0.00:00:04.40 [985,609,646 keys/sec]
> > [Jan 18 10:48:10 UTC] RC5-72: using core #6 (CUDA 4-pipe 64-thd).
> > [Jan 18 10:48:17 UTC] RC5-72: Benchmark for core #6 (CUDA 4-pipe 64-thd)
> >                       0.00:00:04.51 [962,502,541 keys/sec]
> > [Jan 18 10:48:17 UTC] RC5-72: using core #7 (CUDA 4-pipe 128-thd).
> > [Jan 18 10:48:23 UTC] RC5-72: Benchmark for core #7 (CUDA 4-pipe 128-thd)
> >                       0.00:00:04.40 [986,996,612 keys/sec]
> > [Jan 18 10:48:23 UTC] RC5-72: using core #8 (CUDA 4-pipe 256-thd).
> > [Jan 18 10:48:30 UTC] RC5-72: Benchmark for core #8 (CUDA 4-pipe 256-thd)
> >                       0.00:00:04.36 [995,250,114 keys/sec]
> > [Jan 18 10:48:30 UTC] RC5-72: using core #9 (CUDA 1-pipe 64-thd busy wait).
> > [Jan 18 10:48:36 UTC] RC5-72: Benchmark for core #9 (CUDA 1-pipe 64-thd busy wait)
> >                       0.00:00:04.53 [957,125,645 keys/sec]
> > [Jan 18 10:48:36 UTC] RC5-72: using core #10 (CUDA 1-pipe 64-thd sleep 100us).
> > [Jan 18 10:48:43 UTC] RC5-72: Benchmark for core #10 (CUDA 1-pipe 64-thd sleep 100us)
> >                       0.00:00:04.58 [947,487,144 keys/sec]
> > [Jan 18 10:48:43 UTC] RC5-72: using core #11 (CUDA 1-pipe 64-thd sleep dynamic).
> > [Jan 18 10:48:50 UTC] RC5-72: Benchmark for core #11 (CUDA 1-pipe 64-thd sleep dynamic)
> >                       0.00:00:04.57 [949,532,041 keys/sec]
> > [Jan 18 10:48:50 UTC] RC5-72 benchmark summary :
> >                       Default core : #0 (CUDA 1-pipe 64-thd)
> >                       Fastest core : #8 (CUDA 4-pipe 256-thd)
> >
> >
> > Recompiled client:
> >
> > dnetc v2.9110-519-CTR-11072023 for CUDA on Linux (Linux 2.6.32-279.14.1.el6
> > ...
> >
> > Please provide the *entire* version descriptor when submitting bug reports.
> >
> > The distributed.net bug report pages are at http://bugs.distributed.net/
> >
> >
> > [Jan 18 10:50:19 UTC] RC5-72: using core #0 (CUDA 1-pipe 64-thd).
> > [Jan 18 10:50:24 UTC] RC5-72: Benchmark for core #0 (CUDA 1-pipe 64-thd)
> >                       0.00:00:03.19 [1,367,772,924 keys/sec]
> > [Jan 18 10:50:24 UTC] RC5-72: using core #1 (CUDA 1-pipe 128-thd).
> > [Jan 18 10:50:29 UTC] RC5-72: Benchmark for core #1 (CUDA 1-pipe 128-thd)
> >                       0.00:00:03.12 [1,395,317,272 keys/sec]
> > [Jan 18 10:50:29 UTC] RC5-72: using core #2 (CUDA 1-pipe 256-thd).
> > [Jan 18 10:50:35 UTC] RC5-72: Benchmark for core #2 (CUDA 1-pipe 256-thd)
> >                       0.00:00:03.09 [1,409,888,624 keys/sec]
> > [Jan 18 10:50:35 UTC] RC5-72: using core #3 (CUDA 2-pipe 64-thd).
> > [Jan 18 10:50:40 UTC] RC5-72: Benchmark for core #3 (CUDA 2-pipe 64-thd)
> >                       0.00:00:03.15 [1,383,760,581 keys/sec]
> > [Jan 18 10:50:40 UTC] RC5-72: using core #4 (CUDA 2-pipe 128-thd).
> > [Jan 18 10:50:45 UTC] RC5-72: Benchmark for core #4 (CUDA 2-pipe 128-thd)
> >                       0.00:00:03.04 [1,435,289,273 keys/sec]
> > [Jan 18 10:50:45 UTC] RC5-72: using core #5 (CUDA 2-pipe 256-thd).
> > [Jan 18 10:50:50 UTC] RC5-72: Benchmark for core #5 (CUDA 2-pipe 256-thd)
> >                       0.00:00:03.00 [1,454,768,816 keys/sec]
> > [Jan 18 10:50:50 UTC] RC5-72: using core #6 (CUDA 4-pipe 64-thd).
> > [Jan 18 10:50:55 UTC] RC5-72: Benchmark for core #6 (CUDA 4-pipe 64-thd)
> >                       0.00:00:03.11 [1,402,262,966 keys/sec]
> > [Jan 18 10:50:55 UTC] RC5-72: using core #7 (CUDA 4-pipe 128-thd).
> > [Jan 18 10:51:00 UTC] RC5-72: Benchmark for core #7 (CUDA 4-pipe 128-thd)
> >                       0.00:00:03.08 [1,416,234,138 keys/sec]
> > [Jan 18 10:51:00 UTC] RC5-72: using core #8 (CUDA 4-pipe 256-thd).
> > [Jan 18 10:51:05 UTC] RC5-72: Benchmark for core #8 (CUDA 4-pipe 256-thd)
> >                       0.00:00:03.05 [1,429,372,665 keys/sec]
> > [Jan 18 10:51:05 UTC] RC5-72: using core #9 (CUDA 1-pipe 64-thd busy wait).
> > [Jan 18 10:51:11 UTC] RC5-72: Benchmark for core #9 (CUDA 1-pipe 64-thd busy wait)
> >                       0.00:00:03.14 [1,386,448,528 keys/sec]
> > [Jan 18 10:51:11 UTC] RC5-72: using core #10 (CUDA 1-pipe 64-thd sleep 100us).
> > [Jan 18 10:51:16 UTC] RC5-72: Benchmark for core #10 (CUDA 1-pipe 64-thd sleep 100us)
> >                       0.00:00:03.29 [1,324,972,176 keys/sec]
> > [Jan 18 10:51:16 UTC] RC5-72: using core #11 (CUDA 1-pipe 64-thd sleep dynamic).
> > [Jan 18 10:51:21 UTC] RC5-72: Benchmark for core #11 (CUDA 1-pipe 64-thd sleep dynamic)
> >                       0.00:00:03.18 [1,371,027,270 keys/sec]
> > [Jan 18 10:51:21 UTC] RC5-72 benchmark summary :
> >                       Default core : #0 (CUDA 1-pipe 64-thd)
> >                       Fastest core : #5 (CUDA 2-pipe 256-thd)
> >
> >
> > You can easily compile for all current architectures by removing the PTX
> > and cubin lines and compiling with
> >
> > NVCC = /usr/local/cuda/bin/nvcc \
> >   --generate-code arch=compute_10,code=sm_10 \
> >   --generate-code arch=compute_20,code=sm_20 \
> >   --generate-code arch=compute_30,code=sm_30 \
> >   --generate-code arch=compute_35,code=sm_35
> >
> > Greg
> >
> >
> > _______________________________________________
> > rc5 mailing list
> > rc5 at lists.distributed.net
> > http://lists.distributed.net/mailman/listinfo/rc5
> >
>

