[Hardware] The market of ASICs (One GigaKey / Second?)
dan_oetting at uswest.net
Sun Aug 8 02:41:19 EDT 2004
On Aug 7, 2004, at 6:26 PM, jbass at dmsd.com wrote:
>> But have you got the optimum design considering time, space, power and
>> maximum clock rate? The FIFOs for the S terms could be eliminated by
>> recomputing the previous round in parallel with the current round.
>> Instead of two 26 stage FIFOs you would add 12 adders and 3 barrel
> Fifo's seem much cheaper than duplicate adders and barrel shifters.
> Ok, clue me in here. I come up with a LOT more. How do you plan to
> recomputer the S and L terms for 52 stages with only 12 adders
> and 3 barrel rolls? I figured at min 3 adders (I=A+B, S+I, L+I) and
> two barrel rolls (S and L) per stage to recompute stage 1 S and L
> terms. Recomputing Stage 2 S and L terms is much more interesting.
> Plus regen of the key values.
> Maybe a quick ascii block chart for a round 2 and round 3 stage?
Here's the overview. S is passed to the right, A, B and L are
passed down. Each Round is a vertical stack of 26 stages. Each stage is
1 itteration consisting of 4 adds, 1 fixed rotate and 1 variable
Key -> Round1
Key+26 -> Round1 -> Round2 P0/P1
| | |
V V V
Key+52 -> Round1 -> Round2 -> Round3 -> Encrypt
Each stage in round 2 needs 1 repeated Round 1 stage to provided the
delayed S values. The Round3 stages need the delayed Round 2 stage
which in turn needs a twice delayed Round1 stage. The cost for each
stage of the 3 extra Rounds is 12 adds and 3 variable rotates (the
fixed rotates are free in hardware). This is compared to the cost of 2
FIFOs 26 stages long. If a barrel roll is equivalent to 5 FIFO stages
and an Add is equivalent to 2 FIFO stages that would be a total
equivalence of 39 verses 52. The actual costs may be higher in the
> I was playing with it sometime back in Xilinx FPGA's. The fifo's
> were relatively cheap as two 16 bit LUT shift registers cascaded
> per bit with a short latency ... so 64 LUT's per S term, a fraction
> of the LUT's for a 32 bit barrel shifter, and about the same as
> a three term 32 bit adder. So at least with FPGA's shifters are cheap.
> With VLSI a 26 bit serial shifter is dead cheap.
>> Another alternative, instead of 32 bits wide go 1 bit serial. The
>> complexity of the fast adder with carry lookahead almost vanishes. The
>> barrel rolls are not much more than a 32 bit fifo. And while the
>> rolls delay the processing of successive stages, another key can be
>> processed on the same circuitry in the gaps, so the entire pipeline
>> will process 1 key every 32 clocks. With the vastly simpler logic much
>> higher clock speeds can be obtained. And more parallel pipes can be
>> built in the same space.
> Hmm ... I'd like to see a rough prototype algorithm for that design.
The overall block diagram looks much like the parallel version. The
details for each iteration stage will look considerably different. The
adders are just a handful of gates and implement carry as a 1 bit time
delay feeding back into the adder. Rotates require a 32 bit time delay
and up to 64 bits of storage and 10 selector gates for the variable
rotate. To compensate for the rotate delays all the other terms carried
through the stage also need to be delayed. Each stage will contain
about 320 bits of fifo.
The delay between rounds for the S values would need to be 1664 bits
(64*26). So the two delays totaling 3228 bits can be replaced by
regenerating 3 rounds as shown above using 320 bits totaling only 960
The actual encryption add about 160 fifo bits per stage. The S0
constants can be generated inline with a 1 bit adder per stage which is
probably cheaper than trying to clock the constants in from a 32 bit
register. The whole shebang will be around 54K fifo bits plus a few
logic gates. Clock rates of over 4 gig could probably be expected
(assuming the heat can be dissipated).
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Size: 4427 bytes
Desc: not available
Url : http://lists.distributed.net/pipermail/hardware/attachments/20040808/bc5b9317/attachment.bin
More information about the Hardware