[HARDWARE] Just Curious...
oetting at gldmutt.cr.usgs.gov
Mon Sep 20 15:43:26 EDT 1999
At 12:42 -0700 9/20/1999, stoney at sequent.com wrote:
>You will only be able to partially overlap the assignments to A and B
>in the for loops because of the data dependencies. You cannot feasibly
>unroll the decryption loop
Do you know what you are talking about? I fully unrolled the loops in the
PowerPC cores and eliminated the A and B assignments since they are
redundant when you have 32 registers to work with. You are confusing
unrolling loops with parallel execution in a superscalar architecture.
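The point about A and B being redundant can be checked with a small sketch. It assumes RC5-32/12 (32-bit words, 12 rounds, 26 subkey words), the variant attacked by RC5-64. It is written in Python for readability rather than speed, and all function names are made up for illustration. In the textbook key-mixing loop, A always equals the S word written on the previous step, and B always equals the previous L word. So a core that keeps every S and L word in its own register can drop A and B entirely. With an 8-byte key that is 26 S words plus 2 L words, 28 values, which fit in the PowerPC's 32 general-purpose registers. The second function below keeps no A or B and reads the previous words directly.

```python
MASK = 0xFFFFFFFF
P32, Q32 = 0xB7E15163, 0x9E3779B9  # RC5 magic constants for w = 32
R = 12                              # rounds, as in RC5-32/12/8
T = 2 * (R + 1)                     # 26 subkey words

def rotl(x, n):
    """32-bit left rotate; only the low 5 bits of n count."""
    n &= 31
    x &= MASK
    return ((x << n) | (x >> (32 - n))) & MASK

def key_words(key):
    """Pack key bytes into little-endian 32-bit words (RC5 convention)."""
    padded = key + bytes(-len(key) % 4)
    return [int.from_bytes(padded[k:k + 4], "little")
            for k in range(0, len(padded), 4)] or [0]

def schedule_with_ab(key):
    """Textbook key-mixing loop with the A and B accumulators."""
    L = key_words(key)
    c = len(L)
    S = [(P32 + i * Q32) & MASK for i in range(T)]
    A = B = i = j = 0
    for _ in range(3 * max(T, c)):
        A = S[i] = rotl(S[i] + A + B, 3)
        B = L[j] = rotl(L[j] + A + B, A + B)
        i, j = (i + 1) % T, (j + 1) % c
    return S

def schedule_without_ab(key):
    """Same schedule with no A or B: each step reads the words written on the
    previous step. Fully unrolled, (k - 1) % T and (k - 1) % c are constants,
    so 'A' and 'B' are just two registers that already hold S and L words."""
    L = key_words(key)
    c = len(L)
    S = [(P32 + i * Q32) & MASK for i in range(T)]
    S[0] = rotl(S[0], 3)                 # step 0: A = B = 0 here
    L[0] = rotl(L[0] + S[0], S[0])
    for k in range(1, 3 * max(T, c)):
        i, j = k % T, k % c
        prev_s = S[(k - 1) % T]          # what the textbook loop calls A
        prev_l = L[(k - 1) % c]          # what the textbook loop calls B
        S[i] = rotl(S[i] + prev_s + prev_l, 3)
        L[j] = rotl(L[j] + S[i] + prev_l, S[i] + prev_l)
    return S
```

Both functions return the same 26-word table for any key; only the bookkeeping differs.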
Although there is little overlap with 1 key, it's possible to fold the loop
and completely overlap the first and last rounds, requiring only 2
additional registers. And anybody who has worked with this code knows that
you encrypt instead of decrypt, because the encryption can be folded into
the last round of key generation and you don't need to generate the final
S inside the loop.
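The encrypt-instead-of-decrypt point can be sketched the same way. The sketch again assumes RC5-32/12 and a key of at most 26 words (an RC5-64 key is 2 words), and the function names are hypothetical. A key search only has to check whether a candidate key maps a known plaintext block to the known ciphertext block, so encryption works as a test just as well as decryption. Encryption consumes S[0], S[1], S[2], ... in the same order that the last pass of the key schedule produces them. Each cipher half-round can therefore run as soon as its subkey exists, and the final S table is never stored. Decryption starts from S[25], which does not exist until the whole schedule has finished.

```python
MASK = 0xFFFFFFFF
P32, Q32 = 0xB7E15163, 0x9E3779B9  # RC5 magic constants for w = 32
R = 12                              # rounds, as in RC5-32/12/8
T = 2 * (R + 1)                     # 26 subkey words

def rotl(x, n):
    """32-bit left rotate; only the low 5 bits of n count."""
    n &= 31
    x &= MASK
    return ((x << n) | (x >> (32 - n))) & MASK

def key_words(key):
    """Pack key bytes into little-endian 32-bit words (RC5 convention)."""
    padded = key + bytes(-len(key) % 4)
    return [int.from_bytes(padded[k:k + 4], "little")
            for k in range(0, len(padded), 4)] or [0]

def schedule(key):
    """Reference key schedule: builds and returns the full S table."""
    L = key_words(key)
    c = len(L)
    S = [(P32 + i * Q32) & MASK for i in range(T)]
    A = B = i = j = 0
    for _ in range(3 * max(T, c)):
        A = S[i] = rotl(S[i] + A + B, 3)
        B = L[j] = rotl(L[j] + A + B, A + B)
        i, j = (i + 1) % T, (j + 1) % c
    return S

def encrypt(S, a, b):
    """Reference RC5 encryption of one block (a, b) with a finished S table."""
    A, B = (a + S[0]) & MASK, (b + S[1]) & MASK
    for r in range(1, R + 1):
        A = (rotl(A ^ B, B) + S[2 * r]) & MASK
        B = (rotl(B ^ A, A) + S[2 * r + 1]) & MASK
    return A, B

def schedule_and_encrypt(key, a, b):
    """Key schedule with the encryption folded into its last pass."""
    L = key_words(key)
    c = len(L)
    assert c <= T  # then the last pass visits S[0..T-1] once, in order
    S = [(P32 + i * Q32) & MASK for i in range(T)]
    A = B = j = 0
    for k in range(2 * T):               # passes 1 and 2, unchanged
        i = k % T
        A = S[i] = rotl(S[i] + A + B, 3)
        B = L[j] = rotl(L[j] + A + B, A + B)
        j = (j + 1) % c
    x = y = 0
    for i in range(T):                   # pass 3 doubles as the cipher
        A = rotl(S[i] + A + B, 3)        # final S[i]: used at once, never stored
        if i == 0:
            x = (a + A) & MASK
        elif i == 1:
            y = (b + A) & MASK
        elif i % 2 == 0:
            x = (rotl(x ^ y, y) + A) & MASK
        else:
            y = (rotl(y ^ x, x) + A) & MASK
        if i < T - 1:                    # the very last L update is dead code
            B = L[j] = rotl(L[j] + A + B, A + B)
            j = (j + 1) % c
    return x, y
```

For any such key and block, `schedule_and_encrypt(key, a, b)` returns the same pair as `encrypt(schedule(key), a, b)`.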
Dan Oetting <dan_oetting at comug.com>
PowerPC 603/604/750 -- Still the fastest core on the net.
To unsubscribe, send 'unsubscribe hardware' to majordomo at lists.distributed.net