[RC5] Re: Needing testers (Mac OS X/G5)

Elektron elektron_rc5 at yahoo.ca
Thu Aug 12 02:26:23 EDT 2004


>  E> Very little cache would be taken by dnetc, because most things are in
>  E> registers anyway, and dnetc doesn't need much memory anyway (S[26],
>  E> L[3], 11 in RC5_72UnitWork, and 1 for *iterations is only 41 longs, or
>  E> 164 bytes, which is 0.5% of a 32K cache).
>
> You're right unless we consider program memory cache.
>
> D.net cores like loop unrolling...

Most of the main loop (new_key_mid: to bne+ new_key_mid) is only 1173
instructions, or 4692 bytes, about 14% of a 32K cache. The new_key_hi: to
bdnz new_key_hi loop is 1116 instructions, under 14%.

Either way, programs that rely on a particular instruction cache size to
perform well have serious problems, since you can't guarantee the size of
the instruction cache (although the standard PowerPC instruction cache
has been 32K for a while now). At 4 bytes per instruction, 32K holds 8192
instructions; if the loop is 512 instructions too big (8704 instead of
8192), the execution time more than doubles. Carefully engineering an
8192-instruction loop is silly anyhow.

On the other hand, reloading the cache from memory (after flushing both
the instruction and data caches) takes about as long as 19 passes through
the instruction cache. Flushing only the instruction cache costs far less
(about 30% of one pass through the instruction cache), since the L2 is
reasonably fast. If we loop any reasonable number of times between
flushes (e.g. 64), this is negligible.

In theory, executing 8192 instructions takes 2731 clocks (at 3
instructions/clock), and loading 8192 instructions from the L2 takes
4096 clocks (assuming 2 instructions/clock, which should be possible if
the path is 64 bits wide). Since the processor can execute instructions
as soon as they have been loaded, you only pay 4096 - 2731 = 1365 extra
cycles, which isn't very much (about 16 keys on a KKS7450).

Of course, OS X's timeslicing isn't done very well, but the cache-reload
overhead should still be negligible.

- Purr


