[Hardware] Notes... The case for an open client

Dan Oetting dan_oetting at uswest.net
Sun Aug 15 03:29:28 EDT 2004


Moving right along...

As I stated earlier, the first step is to define exactly what the 
logical interface to the hardware will be. I don't want to constrain 
the physical interface as that may needlessly prevent some design 
choices.

The inputs needed by the hardware appear to be a starting key, a block 
count, the plain text and part of the cypher text. A reprogrammable 
hardware device might choose not to have any inputs but be programmed 
with the initial parameters when it's turned on and crank out 
consecutive processed blocks until it's turned off.

For devices that cannot be reprogrammed it might be a good idea to 
accept a parameter that specifies the key size so it won't be obsoleted 
in only a few decades. The same hardware can also process blocks for up 
to RC5-96 if it lasts that long.


I've specified that the hardware cores will look for partial match keys 
where only the first 32 bits of cypher text is matched. There are 
numerous advantages to looking for partial match keys including a 
significant optimization of the core and providing a verifiable token 
for stats.

A downside is the potential for high bursts of data that have to come 
out of the core. I wondered just how high of a burst could be 
reasonably expected.

For an expected occurrence of 1 event in 2^32 the probability of N 
occurrences in 2^32 trials (ie: the number of partial match keys in 1 
block) is:

N		P
0		0.36787944113	~1/e
1		0.36787944121
2		0.18393972061
3		0.06131324019
4		0.01532831004
5		0.00306566201
6		0.00051094367
7		0.00007299195
8		0.00000912399
9		0.00000101378	about 1 in a million

A client should return the actual number of partial match keys that are 
in the block even if it cannot return all the keys. For instance, 
result messages may be limited to a maximum size and there may only be 
room for 8 keys.

Another case where clients may fail to return all the key found is when 
the core is massively parallel. Consider a core with 256 processing 
units. The probability of getting a number of results from 1 key cycle 
is given in the following table:

N		P
0		0.9999999404
1		0.0000000596		about 1 in 16 million
2		1.7694178409e-15	little over 2 per million billion
3		3.4880524166e-23	 1 in 28 thousand billion billion

We'll see lots of singles and around 65000 doubles. But there is only 1 
chance in a thousand of seeing a triple in the whole RC5-72 key space.

It's perfectly valid for a core to only return 1 of the keys found per 
cycle. The insignificant number of cases where more keys are reported 
in the block than actually returned can be reprocessed by other means. 
Clients should still receive credit for the actual keys returned and 
the client that reprocesses the block can receive credit for the 
remaining keys.

It would be best if the client could return every key but sometimes 
tradeoffs need to be made.


The most important piece of information coming out of the hardware 
cores is the count of found partial keys. In every design I've looked 
at (including all software cores), the found partial key is at some 
point represented by a single bit and is not subject to any 
verification for false negatives. This single bit is an ideal attack 
point for Murphy and I would like to see designs address the issue. 
I've got some ideas that I will discuss later or perhaps leave as an 
exercise for the students. At least with partial found keys, a high 
failure rate of this bit will show up as a statistical undercount of 
the partial keys found.


I touched on residual checks earlier. In the document  
<http://www.distributed.net/source/specs/opcodeauth.html>  the idea of 
a residual check is shot down as being too costly in terms of extra 
overhead in the core. That cost estimate however is based on using CRC 
to generate the residual. CRC is used to protect communications where a 
significant error rate is expected and back to back errors could cancel 
each other out in simpler check codes. In the RC5 project, if a client 
is showing any errors it should be rejected or only used to recheck the 
work of other clients.

The document also suggests computing the CRC on only the first byte of 
each encryption result. This not only increases the cost of computing 
the CRC in some cases but eliminates 2/3rds of the gates in the last 
few critical operations from monitoring by the residual result.

For the RC5 project, a simple sum is every bit as good as a CRC for a 
residual check, is inexpensive to calculate in either hardware or 
software and can easily handle merged blocks or parallel computations. 
There could even be a case for a 32 bit wide parity instead of an 
additive sum since it would be fractionally easier to generate in some 
hardware.



More information about the Hardware mailing list