From what I've heard, the Athlon FPU allows more parallel instruction decode
and execution than the G4's.  Although the FPU pipes are shared to handle
MMX/3DNow!, you have more FP engines at work.

The G4's AltiVec (AV) unit has 2(?) engines, plus separate FPU engines.

I wonder if it's possible to abuse the G4 fp engines for a little
integer work?  32 integer registers + 32 fp registers (integer work)
+ 32 AV banks.   I wonder...

Zypher wrote:

> I can give a little "proof" that RC5 can somehow be made faster on K7s...
> First off, I regularly run either RC5 or Gamma Flux on my Athlon system...
> I know my usual rates when nothing else is running...they're pretty static,
> I leave it on overnight.
> THEN, launch both programs...let them go through a few blocks.
> Compare the rates...to put it simply, they add up to more than 100%. (GF
> takes more, but RC5 does not slow down as much as GF 'takes')
> They come to about 120%...and both are 'near 100% cpu idle usage' type
> programs. Following my very limited knowledge of cpus, I'd have to say each
> is utilizing different parts of the core, and I know the K7 core was made
> to do as many instructions per cycle as was feasible...
> So if each is getting work out of 'another part' of the CPU, then they can
> both be made to use that other part to some extent. I'll leave GF to
> dcypher, but RC5 is ontopic here ;)
> I think the real task is getting a decent amount of work done, writing the
> right code. Certainly beyond me, though I'm glad to see there are others on it.
> I live for speed in all its forms.
> > >Has anyone looked further into the AMD core optimisation as discussed in:
> > >http://lists.distributed.net/hypermail/rc5.Oct1999/0000.html
> > >It looked very promising to me, but I'm afraid I have neither the
> > >programming skill nor the time to look into it. If anyone should, however,
> > >know of some good literature on this subject, I would be glad to hear
> > >about it.
> >
> > I have started some work on theoretical code.  The main change from the
> > original idea is that I have decoupled the round 3 key expansion and
> > encryption phases and will attempt to do most of the encryption using MMX
> > instructions while the integer code does the key expansion for the next two
> > keys.  Originally I had MMX instructions doing part of round 1 of the key
> > expansion of the next two keys while the integer code finished it off.
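
For anyone trying to picture what "decoupling" means here, below is a rough
sketch in plain C.  It is not the actual client core, just an illustration
under some assumptions of mine: the key is taken as two 32-bit words
(RC5-32/12 with an 8-byte key), "round 3" is taken to mean the third pass of
the key mixing over the S table, and all the names (KeyState, mix_step,
expand_key, encrypt_block, encrypt_interleaved) are made up for the example.
The interleaved routine uses each S word as soon as the third pass produces
it; the decoupled pair finishes the whole table first and encrypts
afterwards, which is what would let a separate unit (MMX, say) take over the
encryption.

```c
#include <stdint.h>

#define ROUNDS 12                  /* RC5-32/12, 8-byte key (assumed) */
#define T (2 * (ROUNDS + 1))       /* 26 subkey words */
#define P32 0xB7E15163u            /* RC5 magic constants for w = 32 */
#define Q32 0x9E3779B9u

/* Hypothetical container for the key-expansion state. */
typedef struct {
    uint32_t S[T];                 /* subkey table */
    uint32_t L[2];                 /* the key as two 32-bit words */
    uint32_t A, B;                 /* running mix values */
} KeyState;

static uint32_t rotl32(uint32_t x, uint32_t n) {
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}

static void key_init(KeyState *k, const uint32_t key[2]) {
    k->L[0] = key[0];
    k->L[1] = key[1];
    k->A = k->B = 0;
    k->S[0] = P32;
    for (int i = 1; i < T; i++)
        k->S[i] = k->S[i - 1] + Q32;
}

/* One mixing step on S[i].  The key word index is i mod 2, which is valid
   because T is even, so every pass over S starts again at L[0]. */
static void mix_step(KeyState *k, int i) {
    k->A = k->S[i] = rotl32(k->S[i] + k->A + k->B, 3);
    k->B = k->L[i & 1] = rotl32(k->L[i & 1] + k->A + k->B, k->A + k->B);
}

/* Decoupled, part 1: run all three passes and hand back the finished table. */
void expand_key(const uint32_t key[2], uint32_t S[T]) {
    KeyState k;
    key_init(&k, key);
    for (int i = 0; i < 3 * T; i++)
        mix_step(&k, i % T);
    for (int i = 0; i < T; i++)
        S[i] = k.S[i];
}

/* Decoupled, part 2: encryption that only reads the finished table. */
void encrypt_block(const uint32_t S[T], const uint32_t pt[2], uint32_t ct[2]) {
    uint32_t A = pt[0] + S[0];
    uint32_t B = pt[1] + S[1];
    for (int r = 1; r <= ROUNDS; r++) {
        A = rotl32(A ^ B, B) + S[2 * r];
        B = rotl32(B ^ A, A) + S[2 * r + 1];
    }
    ct[0] = A;
    ct[1] = B;
}

/* Interleaved: passes 1 and 2 as usual, then each third-pass step is
   followed immediately by the encryption step that consumes its S word. */
void encrypt_interleaved(const uint32_t key[2], const uint32_t pt[2],
                         uint32_t ct[2]) {
    KeyState k;
    key_init(&k, key);
    for (int i = 0; i < 2 * T; i++)
        mix_step(&k, i % T);
    mix_step(&k, 0);
    mix_step(&k, 1);
    uint32_t A = pt[0] + k.S[0];
    uint32_t B = pt[1] + k.S[1];
    for (int r = 1; r <= ROUNDS; r++) {
        mix_step(&k, 2 * r);
        A = rotl32(A ^ B, B) + k.S[2 * r];
        mix_step(&k, 2 * r + 1);
        B = rotl32(B ^ A, A) + k.S[2 * r + 1];
    }
    ct[0] = A;
    ct[1] = B;
}
```

Both paths give the same ciphertext for the same key and plaintext, since
the third pass touches the S words in the same order either way.  The sketch
only shows the split itself; overlapping encrypt_block for one pair of keys
with expand_key for the next pair is where the real scheduling work would be.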
