[Hardware] "success"

John L. Bass jbass at dmsd.com
Thu Oct 19 02:48:52 EDT 2006


	Martin,
	Have you calculated the memory requirements for your core?
	What is the expected throughput?

	Regards,
	David

Conceptually, this design strategy is pretty clean and easy to implement, but it's
something of a monster to get any reasonable performance out of.

Actually, it's such a large design that it has a high probability of sending
the tools into the ground ... either crashing outright, or failing to P&R anything
close to usable timings. My first attempts to feed it in as VHDL crashed ISE
every time.

It's nearly impossible to get an optimal design from an HDL (Verilog, FpgaC)
with this strategy and the Xilinx tools, because the layout is messy, with excessive
routing delays. In theory you should be able to get it to run at a few combinatorial
delays, but in practice the tools create both combinatorial and routing delays
that are considerably longer. The variable barrel shifter is the bottleneck,
and this design folds the arithmetic into the same pipeline, making the logic
considerably deeper than optimal for a pipeline. The nasty part is that fixing that
requires two or three times the SBox retiming storage, which is already a pretty
dominant factor in the performance and sizing, using lots of LUT-based RAMs.
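
To illustrate why a variable shifter adds logic depth, here is a rough software
model of a barrel rotate built from fixed-rotate mux stages. It is only a sketch:
the 32-bit word width and the function names are assumptions, not taken from
Martin's core. Each loop iteration stands for one mux level, so a 32-bit rotate
costs five levels before any arithmetic is added in the same stage.

```python
def barrel_stages(width: int = 32) -> int:
    """Number of mux levels needed for a variable rotate of `width` bits."""
    stages = 0
    while (1 << stages) < width:
        stages += 1
    return stages


def rotl_barrel(x: int, amount: int, width: int = 32) -> int:
    """Rotate x left by `amount`, one conditional fixed rotate per mux level.

    `width` is assumed to be a power of two (hypothetical 32-bit datapath).
    """
    mask = (1 << width) - 1
    x &= mask
    for stage in range(barrel_stages(width)):
        # Bit `stage` of the shift amount selects "rotate by 2**stage" or "pass".
        if (amount >> stage) & 1:
            s = 1 << stage
            x = ((x << s) | (x >> (width - s))) & mask
    return x


def rotl_reference(x: int, amount: int, width: int = 32) -> int:
    """Straightforward rotate, used only to check the staged model."""
    mask = (1 << width) - 1
    amount %= width
    x &= mask
    if amount == 0:
        return x
    return ((x << amount) | (x >> (width - amount))) & mask
```

In hardware every one of those levels is a row of LUTs plus the routing between
rows, which is where the extra delay comes from.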

To get good timings post place and route requires working with smaller
macros that have an optimal hand layout, then using the floor planner
and FPGA editor to optimally place the macros prior to routing so that
the constraints on the router can be met at near optimal timings.

As for size, Martin included the rough pencil and paper guess of what to expect
post synthesis ... about 40K LUTs ... with a variance of about minus 20% if the
tools do really well, to around plus 50% if they fall down badly on packing.
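
Spelled out, that guess works out to a range of roughly 32K to 60K LUTs. A
small sketch of the arithmetic (the helper name is made up for illustration):

```python
def lut_range(nominal: int, low_pct: int = -20, high_pct: int = 50) -> tuple[int, int]:
    """Return (best case, worst case) LUT counts around a nominal estimate."""
    best = nominal * (100 + low_pct) // 100
    worst = nominal * (100 + high_pct) // 100
    return best, worst


print(lut_range(40_000))  # (32000, 60000)
```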

My first attempt with FpgaC last fall was pretty bad. With some fixes early
this summer it did a lot better, and I'm still working on better technology
mapping for FpgaC in my spare time for the next release.

My 11 year old daughter drowned while swimming in a local lake at a friend's
house on the last day of school, and things here have been a bit unsettled since.
I'm not making nearly the progress on FpgaC that I would have liked, and there
hasn't been any help with the project since last winter.

An optimal design for size, power, and performance is actually a hand-packed bit-
or digit-serial design, depending on the FPGA. And interestingly enough, a much
more fun challenge :)    ... there are notes in the archives about this back
in 2004, when Dan and I discussed this (along with a few others).
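
For anyone unfamiliar with the idea, here is a minimal software model of a
digit-serial adder. It is a sketch under assumed parameters (32-bit operands,
hypothetical function name), not the layout discussed in the archives. Each loop
iteration stands for one clock: a small `digit`-bit adder plus a carry register
handles one slice of the word, and `digit=1` gives the bit-serial case.

```python
def digit_serial_add(a: int, b: int, width: int = 32, digit: int = 4) -> int:
    """Add a and b modulo 2**width, processing `digit` bits per clock."""
    digit_mask = (1 << digit) - 1
    carry = 0          # models the carry flip-flop between clocks
    result = 0
    for pos in range(0, width, digit):
        # One clock: add the current digit of each operand plus the saved carry.
        s = ((a >> pos) & digit_mask) + ((b >> pos) & digit_mask) + carry
        result |= (s & digit_mask) << pos
        carry = s >> digit
    return result & ((1 << width) - 1)
```

The trade is area for clocks: the adder shrinks to `digit` bits wide, but a
32-bit add takes `32 / digit` cycles, so many such units run in parallel.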

FPGAs are fast enough at this task that a relatively small number of the
largest devices are faster than all of DNet. But even at that, the problem
is not likely to be solved for decades. A relatively poor use of energy to
prove RSA's marketing claims.

John

