[Hardware] "success"

John L. Bass jbass at dmsd.com
Sat Oct 21 02:51:24 EDT 2006


"Dan Oetting" <dan_oetting at qwest.net> on Fri, 20 Oct 2006 22:08:03 -0600 writes:
> You either need to compute the L[ ] values to feed into the first  
> round of each key schedule stage or save the S[ ] values for each  
> iteration between stages. You could generate the L[ ] values by  
> running your 3 stages through 3 passes with the same key to generate  
> and pass the required values. Alternatively, you could replicate the  
> early key schedule stages and feed them with the next 2 keys to be  
> processed. You would then have a total of 6 key schedule stages and 1  
> decrypt stage but only need 1 pass per key and no S[ ] storage. I  
> figure that's about a 40% savings.

From an FPGA/VLSI perspective, I don't see how this is a 40% savings for
a fully unrolled solution.

A "stage" in an FPGA is something around 5*32 4-LUTs, and SBox storage
is 2*32 4-LUTs. Propagating the round 2 SBox to round 3 would take about
2080 4-LUTs to replace 832 4-LUTs of LUT RAMs for the SBox, about 2.5
times as expensive. The numbers are worse for the round 3 to round 4 SBox
propagation, as you need about 4160 4-LUTs to regenerate the round 3 SBox
terms, and only 832 to store them.
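
As a rough check of the arithmetic above, here is a small Python sketch
that compares the quoted LUT counts. The constants are the estimates given
in this message; the names (STORE_LUTS, cost_ratio, and so on) are made up
for illustration, and the stage counts are simply the quoted totals divided
by the 5*32 per-stage figure, not something measured on real hardware.

```python
# 4-LUT estimates quoted in the message above (rough figures, not measurements).
STAGE_LUTS = 5 * 32       # logic for one key schedule "stage"
STORE_LUTS = 832          # LUT RAM needed to store the SBox terms
REGEN_R2_TO_R3 = 2080     # logic to regenerate the round 2 SBox terms
REGEN_R3_TO_R4 = 4160     # logic to regenerate the round 3 SBox terms


def cost_ratio(regen_luts, store_luts=STORE_LUTS):
    """Return regeneration cost as a multiple of the storage cost."""
    return regen_luts / store_luts


def implied_stages(regen_luts, stage_luts=STAGE_LUTS):
    """Return how many 5*32-LUT stages the quoted total corresponds to (derived)."""
    return regen_luts // stage_luts


for label, regen in (("round 2 -> 3", REGEN_R2_TO_R3),
                     ("round 3 -> 4", REGEN_R3_TO_R4)):
    print(f"{label}: {regen} LUTs to regenerate vs {STORE_LUTS} to store, "
          f"{cost_ratio(regen):.1f}x the cost, "
          f"{implied_stages(regen)} stages of {STAGE_LUTS} LUTs")
```

With these figures, regeneration comes out at 2.5 times the storage cost
for the first case and 5 times for the second.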

The "trick" works for a processor solution where storage is more expensive
than cycles, such as on a small microprocessor. It's expensive in nearly
every other case.

However, it could be cheaper for a fully looped design, just as it is for
small processors. It might also be useful when running many small looped
engines in the FPGA, rather than one large unrolled engine.

