[Hardware] Notes... The case for an open client
dan_oetting at uswest.net
Sun Aug 15 03:29:28 EDT 2004
Moving right along...
As I stated earlier, the first step is to define exactly what the
logical interface to the hardware will be. I don't want to constrain
the physical interface as that may needlessly prevent some design
The inputs needed by the hardware appear to be a starting key, a block
count, the plain text and part of the cypher text. A reprogrammable
hardware device might choose not to have any inputs but be programmed
with the initial parameters when it's turned on and crank out
consecutive processed blocks until it's turned off.
For devices that cannot be reprogrammed it might be a good idea to
accept a parameter that specifies the key size so it won't be obsoleted
in only a few decades. The same hardware can also process blocks for up
to RC5-96 if it lasts that long.
I've specified that the hardware cores will look for partial match keys
where only the first 32 bits of cypher text is matched. There are
numerous advantages to looking for partial match keys including a
significant optimization of the core and providing a verifiable token
A downside is the potential for high bursts of data that have to come
out of the core. I wondered just how high of a burst could be
For an expected occurrence of 1 event in 2^32 the probability of N
occurrences in 2^32 trials (ie: the number of partial match keys in 1
0 0.36787944113 ~1/e
9 0.00000101378 about 1 in a million
A client should return the actual number of partial match keys that are
in the block even if it cannot return all the keys. For instance,
result messages may be limited to a maximum size and there may only be
room for 8 keys.
Another case where clients may fail to return all the key found is when
the core is massively parallel. Consider a core with 256 processing
units. The probability of getting a number of results from 1 key cycle
is given in the following table:
1 0.0000000596 about 1 in 16 million
2 1.7694178409e-15 little over 2 per million billion
3 3.4880524166e-23 1 in 28 thousand billion billion
We'll see lots of singles and around 65000 doubles. But there is only 1
chance in a thousand of seeing a triple in the whole RC5-72 key space.
It's perfectly valid for a core to only return 1 of the keys found per
cycle. The insignificant number of cases where more keys are reported
in the block than actually returned can be reprocessed by other means.
Clients should still receive credit for the actual keys returned and
the client that reprocesses the block can receive credit for the
It would be best if the client could return every key but sometimes
tradeoffs need to be made.
The most important piece of information coming out of the hardware
cores is the count of found partial keys. In every design I've looked
at (including all software cores), the found partial key is at some
point represented by a single bit and is not subject to any
verification for false negatives. This single bit is an ideal attack
point for Murphy and I would like to see designs address the issue.
I've got some ideas that I will discuss later or perhaps leave as an
exercise for the students. At least with partial found keys, a high
failure rate of this bit will show up as a statistical undercount of
the partial keys found.
I touched on residual checks earlier. In the document
<http://www.distributed.net/source/specs/opcodeauth.html> the idea of
a residual check is shot down as being too costly in terms of extra
overhead in the core. That cost estimate however is based on using CRC
to generate the residual. CRC is used to protect communications where a
significant error rate is expected and back to back errors could cancel
each other out in simpler check codes. In the RC5 project, if a client
is showing any errors it should be rejected or only used to recheck the
work of other clients.
The document also suggests computing the CRC on only the first byte of
each encryption result. This not only increases the cost of computing
the CRC in some cases but eliminates 2/3rds of the gates in the last
few critical operations from monitoring by the residual result.
For the RC5 project, a simple sum is every bit as good as a CRC for a
residual check, is inexpensive to calculate in either hardware or
software and can easily handle merged blocks or parallel computations.
There could even be a case for a 32 bit wide parity instead of an
additive sum since it would be fractionally easier to generate in some
More information about the Hardware