[rc5] Identical results, two CPUs

root marcus at dfwmm.net
Sat Jul 26 16:10:34 EDT 1997

Tom Wheeler wrote:
> On Sat, 26 Jul 1997 11:15:34 -0500, Mike Silbersack wrote:
> >> Among other machines, I'm running the client on a P200/MMX and a
> >> regular P200.  A coworker says there's "no way" these two machines can
> >> be running at virtually identical speed, but they are - about 240Kk/s.
> >> Would anyone care to take a crack (ha ha) at explaining why they are
> >> the same so my coworker and I can end this silliness?
> >>
> >> Tom Wheeler
> >
> >Well, the Pentium MMX has an improved branch prediction unit and a larger
> >internal cache, so it *should* be faster.

Except that the ratio of ordinary ops to branch ops is _very_ large in
the rc5 algorithm, so branch prediction will have little effect. Second,
the BPU is optimized for some vague "normal" code mix, and the rc5
client is nowhere near a "normal" everyday app as far as the
instruction mix goes.
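
For anyone who wants to see the instruction mix for themselves, here is
a rough Python sketch of RC5 with 32-bit words and 12 rounds, written
from Rivest's published description and not taken from the client
source. The parameter choices and the function names (expand_key,
encrypt, decrypt) are my own, for illustration only. Notice that both
loops are nothing but adds, xors and rotates; the only branches are the
loop back-edges, and an unrolled implementation would not even have
those.

```python
MASK = 0xFFFFFFFF
P32, Q32 = 0xB7E15163, 0x9E3779B9  # RC5 constants for 32-bit words


def rotl(x, n):
    """Rotate a 32-bit word left by n (only the low 5 bits of n count)."""
    n &= 31
    return ((x << n) | (x >> (32 - n))) & MASK


def rotr(x, n):
    """Rotate a 32-bit word right by n (only the low 5 bits of n count)."""
    n &= 31
    return ((x >> n) | (x << (32 - n))) & MASK


def expand_key(key, rounds=12):
    """Key schedule: this is the part a key search repeats for every key."""
    c = max(1, (len(key) + 3) // 4)
    L = [0] * c
    for i in range(len(key) - 1, -1, -1):      # key bytes, little-endian
        L[i // 4] = ((L[i // 4] << 8) + key[i]) & MASK
    t = 2 * (rounds + 1)
    S = [(P32 + i * Q32) & MASK for i in range(t)]
    A = B = i = j = 0
    for _ in range(3 * max(t, c)):             # add, rotate, add, rotate ...
        A = S[i] = rotl((S[i] + A + B) & MASK, 3)
        B = L[j] = rotl((L[j] + A + B) & MASK, A + B)
        i = (i + 1) % t
        j = (j + 1) % c
    return S


def encrypt(S, a, b, rounds=12):
    """Encrypt one block (two 32-bit words): xor, rotate, add per half-round."""
    a = (a + S[0]) & MASK
    b = (b + S[1]) & MASK
    for r in range(1, rounds + 1):
        a = (rotl(a ^ b, b) + S[2 * r]) & MASK
        b = (rotl(b ^ a, a) + S[2 * r + 1]) & MASK
    return a, b


def decrypt(S, a, b, rounds=12):
    """Inverse of encrypt; included only so the round trip can be checked."""
    for r in range(rounds, 0, -1):
        b = rotr((b - S[2 * r + 1]) & MASK, a) ^ a
        a = rotr((a - S[2 * r]) & MASK, b) ^ b
    return (a - S[0]) & MASK, (b - S[1]) & MASK
```

With a 7-byte key, expand_key returns 26 words, and decrypting the
output of encrypt with the same table gives back the original block.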

>  However, the possibility also
> >exists that the motherboards/bios settings differ on these systems to a
> >great enough extent that the MMX is being slowed down to the speed of the
> >Pentium Classic.  Maybe if you check out the BIOS settings on the two
> >computers you will find that the Pentium Classic is using more aggressive
> >timings, has a larger external cache, or something else.
> I can see the better branch prediction making a small difference.  I
> don't think the larger L1 cache is relevant - I would hazard to guess
> that the client is small enough to fit in the P200's cache (or mostly,
> anyway).

Ah, think again: "mostly" means there is little benefit at all. The rc5
key cruncher is almost linear code that runs straight through from start
to end. The network and user interface code, while making up the bulk of
the client, uses only a trivial amount of the cpu, so we can ignore it.
The key crunching component will benefit little or not at all if it is
just a little too large to fit in the cache, since the fetches for the
later portions of the code will overwrite the earlier portions, and when
the code returns to execute the earlier portion, it must fetch from the
external cache once again. The only gain would be in the burst cache
line fill. Notice how well the Cyrix, AMD, and PPro CPUs, with large
on-chip, full cpu caches, execute this code. The on-chip caches on these
chips _do_ hold the entire key cruncher and there is still some room to
spare. If you have seen the stats, check a Cyrix PR166+, which runs at a
lowly 133MHz and produces 250+ Kkeys/sec. A Pentium at 133MHz does a
mere 150+; at 166 it will produce 200+. A PPro at 200 will do 490+.
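
The "just a little too large" effect is easy to reproduce with a toy
model. The sketch below assumes a fully associative cache with strict
LRU replacement, which is simpler than what any of these chips really
do, and the sizes are picked purely for illustration (256 lines of 32
bytes would be 8 KB). The function name hit_rate is mine. It runs a
straight-line loop of code over and over and reports how often a fetch
is served from the cache.

```python
from collections import OrderedDict


def hit_rate(cache_lines, loop_lines, passes=100):
    """Fraction of fetches that hit when a linear loop of `loop_lines`
    cache lines runs `passes` times through an LRU cache that holds
    `cache_lines` lines."""
    cache = OrderedDict()          # keys ordered oldest -> most recent
    hits = accesses = 0
    for _ in range(passes):
        for line in range(loop_lines):
            accesses += 1
            if line in cache:
                hits += 1
                cache.move_to_end(line)        # mark as most recently used
            else:
                if len(cache) >= cache_lines:
                    cache.popitem(last=False)  # evict least recently used
                cache[line] = True
    return hits / accesses


print(hit_rate(256, 256))   # loop fits exactly: 0.99
print(hit_rate(256, 257))   # one line too big:  0.0
```

In this model, a loop that is one line bigger than the cache gets no
hits at all, because each fetch evicts the line that will be needed
soonest. Real set-associative caches will not fall off the cliff quite
so cleanly, but the direction is the same.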

> As for BIOS settings, the machines unfortunately have the AMI "Let's
> give the user exactly no control over his machine" BIOS, so there's not
> much I can even look at there, much less change.  Nevertheless, the
> client is massively compute-bound anyway, so I can't think of any
> changes to the BIOS that would make a difference.

You mean things like the level 2 cache wait states? Like a write-back
vs. write-through cache-memory coherence policy? Like DRAM wait states?
DMA device drivers? If you really want to get serious, we can talk about
bus clocking, PCI bursting/buffering/pipelining, and ... suffice it to
say there are a lot of things that can affect the "compute bound" code's
speed of execution. Add 3% here, 2% there, 5% for this change, and 15%
over there, and soon you have 25%. Noticeable? Yes, tuning makes a
difference. While some of these changes might not have a great effect on
the rc5 client directly, if they make your _normal_ apps/os/io or
whatever more efficient, then there are more cpu cycles available for
the client. So the improvement in overall efficiency shows right up in
the rc5 performance. Notice the thread on crappy cpu-hogging USR modem
software; yes, tuning makes a difference.
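
The 3%, 2%, 5% and 15% figures are only round numbers for the sake of
argument, but for the curious, here is the arithmetic as a quick Python
sketch. It assumes the gains are independent, so they multiply instead
of simply adding up.

```python
gains = [0.03, 0.02, 0.05, 0.15]   # the illustrative tuning gains

additive = sum(gains)              # simply adding the percentages

compounded = 1.0
for g in gains:
    compounded *= 1.0 + g          # each gain applies on top of the last
compounded -= 1.0

print(f"additive: {additive:.1%}, compounded: {compounded:.1%}")
# prints: additive: 25.0%, compounded: 26.9%
```

So if the gains really do stack, 25% is slightly on the low side.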

To succinctly answer Tom's(?) question about the MMX, let's put it this
way: MMX = More Money for X(intel). Not much else, really. Is a 5-15%
increase in performance for _some_ benchmarks really worth a 50% price
hike? Face it, MMX is valuable to intel, intel's stockholders, the
advertising agencies, and a few users, that's about all. I own intel
stock because it has been and will likely continue to be a good
investment. When it comes to spending money on cpus, there are others
that produce more bang for my bucks. Meanwhile, you guys just go on
buying intel's new procs. I'll be happy. Your coworker needs to listen
to the ads less and read more.
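
To put the bang-for-the-buck point in numbers, here is a small Python
sketch using only the rough figures above (5-15% more performance for a
50% higher price). The helper name perf_per_dollar is mine, and the
result is relative to the non-MMX part at 1.0.

```python
def perf_per_dollar(speedup, price_increase):
    """Performance per dollar relative to the baseline part (baseline = 1.0)."""
    return (1.0 + speedup) / (1.0 + price_increase)


for speedup in (0.05, 0.15):
    ratio = perf_per_dollar(speedup, 0.50)
    print(f"{speedup:.0%} faster at 50% more money: {ratio:.2f}x per dollar")
```

Even at the optimistic end, that works out to well under 1.0, i.e. less
performance per dollar than the plain chip.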
