[HARDWARE] Just Curious......

taniwha taniwha at taniwha.com
Tue Sep 21 06:49:07 EDT 1999

On Mon, 20 Sep 1999, stoney at sequent.com wrote:
> If I unroll the loops in the FPGAs it gets ridiculous; if I unroll the
> loop w/ an embedded microprocessor in RAM it is feasible.  I looked at
> some embedded procs such as the strongARM vr4 w/ the thumb instruction
> set included.  There is a ROR that will rotate a register a single time,
> but this is not good enough.  I haven't checked the PPC401 instruction
> set.  I believe that fully unrolling the loop in RAM is OK, but it is
> definitely a waste in FPGAs.  So, I think we had a miscommunication due
> to the differences in our implementation.

Actually, I'd disagree - a lot of larger FPGAs (the ones
about the size we'd need to use anyway) have cheap embedded
SRAM - and it turns out that for the current generation
it's about the right amount to do this.

> Paul Campbell aka Taniwha mentioned that a DSP proc 'sharc' has a 32 bit
> rotate instruction.

Don't quote me too much :-) - please check - this was from memory.
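
For reference, here is a minimal sketch of what a 32-bit rotate by an
arbitrary count involves, and why a rotate-by-one instruction alone is
costly. The function names are made up for illustration, and Python is
used only as neutral notation; it says nothing about what any
particular DSP or embedded CPU actually provides.

```python
MASK32 = 0xFFFFFFFF  # keep results inside a 32-bit word


def rotl32(x, n):
    """Rotate a 32-bit value left by n positions in one step."""
    x &= MASK32
    n &= 31  # only the low 5 bits of the count matter for a 32-bit word
    return ((x << n) | (x >> (32 - n))) & MASK32


def rotl32_by_single_steps(x, n):
    """Same result using only rotate-by-one; also returns the step count."""
    x &= MASK32
    steps = n & 31
    for _ in range(steps):
        # One single-bit rotate: the top bit wraps around to bit 0.
        x = ((x << 1) | (x >> 31)) & MASK32
    return x, steps
```

With only a single-bit rotate, a rotate by n costs up to 31 dependent
steps, against one step for a true variable rotate.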

There's a class of DSP that requires very little extra support:
you can build one on a board with a crystal and download code
into it from an external CPU. I think there are sharc variants
that are like this; some of the TI DSPs are like this too. I
actually think that this is a relatively cost-sensitive application:
you want to be able to build not just 1 cracker, but an array.
I'd also considered using something like this - the custom
datapaths in these parts can make them sing at these sorts
of problems (sadly the last DSP I helped design would have
been ideal for this problem - it had all the right instructions,
lots of storage, a vectorable instruction set, a SIMD
datapath to do 2 keys in parallel, an LIW instruction set
for ILP, etc. - but it had a 72-bit [or 2x36-bit] datapath, and
a 36-bit rotate is about as useless as no rotate for this
particular problem :-)
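
To illustrate the 36-bit point: a rotate wraps bits around at the top
of the word, so a rotate over 36 bits is not a 32-bit rotate with spare
bits on top. A small sketch follows; the helper `rotl(x, n, width)` is
hypothetical and does not model any real part.

```python
def rotl(x, n, width):
    """Rotate the low `width` bits of x left by n positions."""
    mask = (1 << width) - 1
    x &= mask
    n %= width
    return ((x << n) | (x >> (width - n))) & mask


value = 0x80000000  # only bit 31 set

as32 = rotl(value, 1, 32)  # bit 31 wraps around to bit 0
as36 = rotl(value, 1, 36)  # bit 31 just moves up to bit 32, nothing wraps

low32_of_36 = as36 & 0xFFFFFFFF  # what a 32-bit consumer would see
```

A wider datapath can still synthesize the 32-bit result from two shifts
and an OR, but that costs extra instructions for every rotate.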

TI has a 4-CPU DSP that I think has a rotate - that could
be a killer (but I bet they're expensive).
In fact I'd guess that the difference between an FPGA
and a fast DSP is pretty small - the DSP will clock a
lot faster but will take more clocks to get the work done;
the FPGA can have more inherent parallelism but will be
slowed by wire delays in its interconnect.
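
That tradeoff can be put as a back-of-the-envelope formula: keys per
second is roughly clock rate times the number of parallel units,
divided by clocks per key. A sketch follows; every number in it is a
made-up placeholder chosen only to show the shape of the comparison,
not a measurement of any real DSP or FPGA.

```python
def keys_per_second(clock_hz, clocks_per_key, parallel_units=1):
    """Rough throughput estimate: faster clocks and more units help,
    more clocks per key hurt."""
    return clock_hz * parallel_units / clocks_per_key


# Hypothetical placeholder figures, for illustration only.
fast_serial = keys_per_second(clock_hz=200e6, clocks_per_key=400,
                              parallel_units=1)
slow_parallel = keys_per_second(clock_hz=50e6, clocks_per_key=400,
                                parallel_units=4)
```

With these placeholders, a part clocked four times slower breaks even
by testing four keys at once, which is how the two approaches can end
up close.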

	Paul Campbell