[Hardware] The market of ASICs (One GigaKey / Second?)
jbass at dmsd.com
Sat Aug 7 20:26:33 EDT 2004
"Dan Oetting" <dan_oetting at uswest.net> writes:
> All the unrolling and optimizations are simply mechanical operations
> that tend to obfuscate. It sounded like the original poster wanted to
> know what the algorithm was so he could wrap his head around it. The
> optimized source code apparently only confused him.
I assumed he had also looked at the RC5 spec, and was hoping to find the
optimized code a better starting point.
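For anyone following along, the unoptimized algorithm from the RC5 spec is short enough to write out. Below is a minimal Python sketch of RC5-32/12/b (32-bit words, 12 rounds), not the code from any optimized client; the names `expand_key` and `encrypt_words` are made up for illustration. The key schedule produces 26 S words and mixes them in three passes (78 steps), which is where the S terms discussed in this thread come from.

```python
MASK = 0xFFFFFFFF          # 32-bit word mask
P32 = 0xB7E15163           # RC5 magic constants for 32-bit words
Q32 = 0x9E3779B9
ROUNDS = 12
T = 2 * (ROUNDS + 1)       # 26 words in the expanded key table S


def rotl(x, n):
    """Rotate a 32-bit word left by the low 5 bits of n."""
    n &= 31
    return ((x << n) | (x >> (32 - n))) & MASK


def expand_key(key):
    """Expand a bytes key into the 26-word S table."""
    c = max(1, (len(key) + 3) // 4)        # number of key words in L
    L = [0] * c
    for i in range(len(key) - 1, -1, -1):  # little-endian load of key bytes
        L[i // 4] = ((L[i // 4] << 8) + key[i]) & MASK
    S = [(P32 + i * Q32) & MASK for i in range(T)]
    A = B = i = j = 0
    for _ in range(3 * max(T, c)):         # three passes over S
        A = S[i] = rotl((S[i] + A + B) & MASK, 3)
        B = L[j] = rotl((L[j] + A + B) & MASK, A + B)
        i = (i + 1) % T
        j = (j + 1) % c
    return S


def encrypt_words(S, A, B):
    """Encrypt one block given as two 32-bit words (A, B)."""
    A = (A + S[0]) & MASK
    B = (B + S[1]) & MASK
    for r in range(1, ROUNDS + 1):
        A = (rotl(A ^ B, B) + S[2 * r]) & MASK
        B = (rotl(B ^ A, A) + S[2 * r + 1]) & MASK
    return A, B
```

Each mixing step is two additions and a fixed rotate for the S term, then two additions and a data-dependent rotate for the L term, which is what a hardware stage has to implement.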
> But have you got the optimum design considering time, space, power and
> maximum clock rate? The FIFOs for the S terms could be eliminated by
> recomputing the previous round in parallel with the current round.
> Instead of two 26 stage FIFOs you would add 12 adders and 3 barrel
> rolls.
FIFOs seem much cheaper than duplicate adders and barrel shifters.
OK, clue me in here. I come up with a LOT more. How do you plan to
recompute the S and L terms for 52 stages with only 12 adders
and 3 barrel rolls? I figured a minimum of 3 adders (I=A+B, S+I, L+I) and
two barrel rolls (S and L) per stage to recompute the stage 1 S and L
terms. Recomputing the stage 2 S and L terms is much more interesting.
Plus regeneration of the key values.
Maybe a quick ASCII block chart for a round 2 and round 3 stage?
I was playing with it some time back in Xilinx FPGAs. The FIFOs
were relatively cheap as two 16-bit LUT shift registers cascaded
per bit with a short latency ... so 64 LUTs per S term, a fraction
of the LUTs for a 32-bit barrel shifter, and about the same as
a three-term 32-bit adder. So at least with FPGAs, shift registers are cheap.
With VLSI a 26-stage serial shift register is dead cheap.
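The LUT arithmetic behind that comparison can be sketched in a few lines of Python. Only the FIFO figure (two 16-deep LUT shift registers per bit, 64 LUTs per S term) comes from the numbers above; the barrel shifter and adder estimates are rough assumptions (one LUT per 2:1 mux, one LUT per adder bit on the carry chain) and real mappings will differ by device and tool.

```python
import math

SRL_DEPTH = 16    # one LUT configured as a 16-deep shift register
WORD_BITS = 32


def fifo_luts(depth, width=WORD_BITS):
    """LUTs for a width-bit FIFO built from cascaded LUT shift registers."""
    return math.ceil(depth / SRL_DEPTH) * width


def barrel_shifter_luts(width=WORD_BITS):
    """ASSUMPTION: log2(width) levels of 2:1 muxes, one LUT per mux."""
    levels = (width - 1).bit_length()
    return levels * width


def three_term_adder_luts(width=WORD_BITS):
    """ASSUMPTION: two cascaded 2-input adders, one LUT per bit each."""
    return 2 * width


if __name__ == "__main__":
    print("26-stage FIFO:      ", fifo_luts(26), "LUTs")
    print("32-bit barrel shift:", barrel_shifter_luts(), "LUTs (estimate)")
    print("3-term 32-bit adder:", three_term_adder_luts(), "LUTs (estimate)")
```

Under these assumptions the FIFO costs the same as the three-term adder and well under half of the barrel shifter.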
> Another alternative, instead of 32 bits wide go 1 bit serial. The
> complexity of the fast adder with carry lookahead almost vanishes. The
> barrel rolls are not much more than a 32 bit fifo. And while the barrel
> rolls delay the processing of successive stages, another key can be
> processed on the same circuitry in the gaps, so the entire pipeline
> will process 1 key every 32 clocks. With the vastly simpler logic much
> higher clock speeds can be obtained. And more parallel pipes can be
> built in the same space.
Hmm ... I'd like to see a rough prototype algorithm for that design.
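As a starting point for that discussion, here is a minimal software model of the two bit-serial primitives such a pipe would need. This is not Dan's design, only a sketch assuming words stream LSB first at one bit per clock; `serial_add` and `serial_rotl` are hypothetical names. The adder needs just a full adder and one carry flip-flop. The rotate is modeled as a 32-bit circular register read out at an offset, and since output bit 0 may depend on a high input bit, the whole word has to arrive first, which is the delay mentioned in the quoted text.

```python
def serial_add(a, b, width=32):
    """Add two words one bit per clock, LSB first, with a single carry bit."""
    carry = 0
    result = 0
    for clk in range(width):
        abit = (a >> clk) & 1
        bbit = (b >> clk) & 1
        sbit = abit ^ bbit ^ carry                      # full-adder sum
        carry = (abit & bbit) | (carry & (abit ^ bbit)) # full-adder carry out
        result |= sbit << clk
    return result                                       # final carry is dropped (mod 2**width)


def serial_rotl(x, n, width=32):
    """Rotate left by n, emitting one output bit per clock from a circular register."""
    n %= width
    out = 0
    for clk in range(width):
        bit = (x >> ((clk - n) % width)) & 1            # output bit clk comes from input bit clk-n
        out |= bit << clk
    return out
```

Both loops take 32 clocks per word, matching the quoted figure of one key every 32 clocks per pipe.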
Thanks,
John