[Hardware] Updated Bit Serial design notes (corrections)

jbass at dmsd.com
Tue Sep 14 19:10:27 EDT 2004


1-bit serial design notes (updated with corrected assumptions):

The SBox, implemented as cascaded 26-bit LUT shifters, would require
(32*26)/16 = 52 LUTs. The S function requires 2 serial adder stages,
plus an F5 MUX to implement the ROTL3 function for XC2V devices.
XCV devices require one additional LUT.

The L function also requires two 2-bit serial adder stages, plus
6 LUTs to hold the L terms, which can also handle the ROTL3 function
using XC2V devices. XCV devices require one more LUT. 5 DFFs are
also required, which can be taken from the LUTs. One additional
LUT is required for XCV devices.

The E function requires 4 LUTs to handle the E terms, which can
also handle the ROTL function using XC2V devices. XCV devices require
one more LUT. 5 DFFs are also required, which can be taken from
the LUTs. One additional LUT is required for XCV devices.

To hide the latency of the rotate functions, it's necessary to
interleave two or three key operations concurrently, duplicating
the SBox and L terms. By offsetting the phase of the keys by 26 stages,
the E terms can be shared.

Thus a 1-bit serial RC5 engine shared over two keys is
52+52+2+2+6+6+4=124 LUTs for XC2V devices and 127 LUTs for XCV
devices, an average of 62 and 63.5 LUTs per key respectively. If
shared over three keys, 52+52+52+2+2+6+6+6+4=182 LUTs are required
for XC2V devices and 185 LUTs for XCV devices, an average of 60.7
and 61.7 LUTs per key. Each key is processed with an effective cycle
time of 2496 clocks.
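
The LUT budget above can be re-checked with a few lines of Python. This
is only a sketch that mirrors the sums as written in this post: the
function name is made up for illustration, the flat +3 LUTs for XCV is
taken from the difference between the quoted totals, and reading 2496
as 3 passes over 26 S words at 32 bits each is my assumption, not
something stated above.

```python
# LUT budget for the 1-bit serial engine, using the counts quoted in the post.
SBOX_LUTS = (32 * 26) // 16      # 26 words x 32 bits held in 16-bit LUT shifters

# Assumption: 2496 clocks = 3 passes x 26 S words x 32 bits, one bit per clock.
CLOCKS_PER_KEY = 3 * 26 * 32


def engine_luts(keys, xcv=False):
    """Total LUTs for one engine interleaving `keys` key operations.

    Mirrors the sums in the post: one SBox and one set of L terms (6 LUTs)
    per key, the 2+2 adder term as written, and one shared set of E terms.
    """
    total = keys * SBOX_LUTS + 2 + 2 + keys * 6 + 4
    # The quoted XCV totals are 3 LUTs above the XC2V totals in both cases.
    return total + (3 if xcv else 0)


for keys in (2, 3):
    for xcv in (False, True):
        luts = engine_luts(keys, xcv)
        family = "XCV" if xcv else "XC2V"
        print(f"{family:5s} {keys} keys: {luts} LUTs, {luts / keys:.1f} per key")
```

The per-key averages it prints are the figures used to size the devices
in the table further down.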

Performance is limited by the cascaded LUT shifters, to a clock
cycle time of between 3 and 7 ns depending on device and speed grade.
Ballpark numbers, +/- 30%, for performance are then:

Device            LUTs  RC5s       Keys/Sec     Blocks/Day

XCV50             1536    24      1,923,077        39
XCV100            2400    38      3,044,872        61
XCV200            4704    76      6,089,744       123
XCV300            6144    99      7,932,692       160
XCV400            9600   155     12,419,872       250
XCV600           13824   224     17,948,718       361
XCV800           18816   304     24,358,974       490
XCV1000          24576   398     31,891,026       642
XCV1600          31104   504     40,384,615       812

XC2V40             512     8        801,282        16
XC2V80            1024    16      1,602,564        32
XC2V250           3072    50      5,008,013       101
XC2V500           6144   101     10,116,186       204
XC2V1000         10240   168     16,826,923       338
XC2V1500         15360   253     25,340,545       510
XC2V2000         21504   354     35,456,731       713
XC2V3000         28672   472     47,275,641       951
XC2V4000         46080   759     76,021,635     1,529
XC2V6000         67584  1113    111,478,365     2,243
XC2V8000         93184  1535    153,745,994     3,093
XC2V10000       122880  2024    202,724,359     4,078
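
The table rows can be reproduced with the sketch below. Several inputs
are inferred by back-solving the table rather than stated in the post,
so treat them as assumptions: a 5 ns clock for XCV and 4 ns for XC2V,
the three-key averages (61.7 and 60.7 LUTs per key) with the engine
count rounded down, and a block size of 2^32 keys. The names are
illustrative only.

```python
SECONDS_PER_DAY = 86_400
CLOCKS_PER_KEY = 2496          # effective cycle time per key, from the post
KEYS_PER_BLOCK = 2 ** 32       # assumption: inferred from the Blocks/Day column

# family -> (LUTs per key, clock period in ns); clock periods are inferred
FAMILIES = {
    "XCV": (61.7, 5.0),
    "XC2V": (60.7, 4.0),
}


def estimate(family, device_luts):
    """Return (RC5 engines, keys/sec, blocks/day) for one device."""
    luts_per_key, clock_ns = FAMILIES[family]
    engines = int(device_luts / luts_per_key)            # round down
    keys_per_sec = engines * 1e9 / clock_ns / CLOCKS_PER_KEY
    blocks_per_day = keys_per_sec * SECONDS_PER_DAY / KEYS_PER_BLOCK
    return engines, round(keys_per_sec), round(blocks_per_day)


print(estimate("XCV", 1536))      # XCV50    -> (24, 1923077, 39)
print(estimate("XC2V", 122880))   # XC2V10000 -> (2024, 202724359, 4078)
```

Changing the clock period within the 3 to 7 ns range quoted above shows
how far the +/- 30% ballpark can move for a given speed grade.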

This appears to be about a factor of 3 below the parallel design
performance estimates, due to the lower function density caused by
the fixed cost of each SBox.

An ASIC design would probably fare much better, as the relative cost
of the shifters would be much lower.

There might be a sweet spot using a digit-serial design, which trades
the SBox overhead off against clock rate better.

John Bass

