[RC5] Re: Needing testers (Mac OS X/G5)
elektron_rc5 at yahoo.ca
Thu Aug 12 02:26:23 EDT 2004
> E> Very little cache would be taken by dnetc, because most things are
> E> registers anyway, and dnetc doesn't need much memory anyway (S,
> E> L, 11 in RC5_72UnitWork, and 1 for *iterations is only 41 longs, or
> E> 164 bytes, which is 0.5% of a 32K cache).
> You're right unless we consider program memory cache.
> D.net cores like loops' unrolling...
Most of the main loop (new_key_mid: to bne+ new_key_mid) is only 1173
instructions, or 4692 bytes, or just over 14% of a 32K cache. new_key_hi: to bdnz
new_key_hi is 1116 instructions, under 14%.
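
For anyone who wants to check the footprint arithmetic, here is a quick sketch. It only assumes that a long and a PowerPC instruction are both 4 bytes and that the cache is 32K; the names (cache_share, footprints) are made up for illustration and are not from dnetc.

```python
CACHE_BYTES = 32 * 1024   # 32K L1 cache, as in the post
WORD_BYTES = 4            # assumed: a 32-bit long and a PowerPC instruction are both 4 bytes

def cache_share(words, cache_bytes=CACHE_BYTES):
    """Return (bytes, percent of cache) for a count of 4-byte words."""
    nbytes = words * WORD_BYTES
    return nbytes, 100.0 * nbytes / cache_bytes

# Counts taken from the figures quoted in this thread.
footprints = {
    "data (41 longs)": cache_share(41),
    "new_key_mid loop": cache_share(1173),
    "new_key_hi loop": cache_share(1116),
}

for name, (nbytes, pct) in footprints.items():
    print(f"{name}: {nbytes} bytes, {pct:.1f}% of cache")
```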
Either way, programs which rely on a certain size instruction cache to
perform correctly have serious problems, since you can't guarantee a
certain size instruction cache (the standard PowerPC instruction cache
has been 32K for a while now though). If the loop is 512 instructions
too big (over 8192), the execution time more than doubles. Carefully
engineering an 8192 instruction loop is silly anyhow.
On the other hand, reloading the cache from memory (by flushing the
instruction and data caches) takes about as long as looping through the
instruction cache 19 times. Flushing only the instruction cache takes
far less time (about 30% of the time it takes to loop through the
instruction cache), since the L2 is reasonably fast. If we loop any
reasonable number of times (e.g. 64), this is negligible.
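
To put rough numbers on the amortization, here is a small sketch. It measures everything in units of "one pass through the instruction cache", using the 19-pass and 30% figures above; the function name and the choice of 64 passes are just for illustration.

```python
def flush_overhead(flush_cost_in_passes, passes):
    """Extra time from one flush, as a fraction of the time spent looping."""
    return flush_cost_in_passes / passes

PASSES = 64  # the "reasonable number of times" used as an example above

# Flushing both instruction and data caches costs about 19 passes.
full_flush = flush_overhead(19.0, PASSES)
# Flushing only the instruction cache costs about 0.30 of a pass.
icache_only = flush_overhead(0.30, PASSES)

print(f"full flush:  {100 * full_flush:.1f}% overhead over {PASSES} passes")
print(f"icache only: {100 * icache_only:.2f}% overhead over {PASSES} passes")
```

With these inputs the instruction-cache-only flush comes out under half a percent, while a full flush would still cost roughly 30% at 64 passes and needs many more passes to fade out.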
Even theoretically, executing the instructions takes 2731 clocks, and
loading 8192 instructions from the L2 takes 4096 clocks (assuming 2
instructions/clock, which should be possible if it's 64 bits wide).
Since it can execute instructions that have been just loaded, you only
take 1365 extra cycles, which isn't very much (about 16 keys on a
Of course, OSX's timeslicing isn't done very well, but it still should
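
The clock estimate earlier in this message (2731 to execute, 4096 to load, 1365 extra) can be reproduced with the sketch below. The 2 instructions/clock load rate is stated above; the 3 instructions/clock execution rate is an assumption inferred from 8192 / 2731, not something stated outright.

```python
import math

LOOP_INSNS = 8192        # a loop that exactly fills a 32K instruction cache
EXEC_PER_CLOCK = 3       # assumed issue rate, inferred from the 2731 figure
LOAD_PER_CLOCK = 2       # L2 -> L1 rate assumed in the post (64 bits wide)

exec_clocks = math.ceil(LOOP_INSNS / EXEC_PER_CLOCK)   # time to run the loop once
load_clocks = LOOP_INSNS // LOAD_PER_CLOCK             # time to refill the cache from L2

# Execution overlaps with loading, so only the difference is extra time.
extra_clocks = load_clocks - exec_clocks

print(exec_clocks, load_clocks, extra_clocks)
```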