[rc5] Re: rc5-digest V1 #91

Ralf Helbing helbing at isgnw.cs.Uni-Magdeburg.DE
Thu Aug 21 23:09:23 EDT 1997


Hi,

> This is the "picky, picky, picky" quasi-academic response.

Oh, yeah, you're from the real world.  I wasn't going to take part in
this one, but here goes:

> You'll *never* get 2X due to overhead (architectural as well as OS). 

Right.  So you lose a few percent to that overhead.  OTOH, you win a
few percent by not having as much context-switch overhead.  Which one
outweighs the other, we don't really know; it depends on how bad the
architectural overhead really is and how many context switches you
have.  In any case, none of that has anything to do with the
particular application yet.

> Depending on how clean the hardware and OS is, the closer you'll get
> to 2X.  I'm pretty impressed with Linux. It is probably 1.95 or
> better. (Just guessing from experience.) NT is worse, but probably
> 1.9 -- not 1.5!

I think he was talking about multi-threaded apps that try to take
advantage of multiple processors.  Only that would explain how he got
those 1.5 figures.  In order to cooperate, these threads must
communicate, and the communication overhead tends to eat the possible
performance gain.  Communication sometimes requires synchronization
(to prevent race conditions), which means the threads spend time
waiting for each other.  That effectively increases the amount of
serial code per thread and kills parallelism.
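
Here is a minimal sketch of that effect in C with POSIX threads
(purely illustrative -- NWORKERS, BLOCKS and the shared progress
counter are made up, not taken from any real client):

    #include <pthread.h>
    #include <stdio.h>

    #define NWORKERS 2
    #define BLOCKS   100000        /* per-thread work units, made up */

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static long total_done = 0;    /* shared progress counter */

    static void *worker(void *arg)
    {
        long i;
        (void)arg;
        for (i = 0; i < BLOCKS; i++) {
            /* ... crunch one block here ... */

            /* Every block, the thread synchronizes on the shared
               counter.  While one thread holds the lock, the other
               one waits; that waiting is serial time that eats into
               the 2X gain. */
            pthread_mutex_lock(&lock);
            total_done++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NWORKERS];
        int i;

        for (i = 0; i < NWORKERS; i++)
            pthread_create(&t[i], NULL, worker, NULL);
        for (i = 0; i < NWORKERS; i++)
            pthread_join(t[i], NULL);

        printf("done: %ld blocks\n", total_done);
        return 0;
    }

A real client would of course touch shared state far less often per
block; the sketch just exaggerates where the waiting comes from.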

Suppose you have a renderer/raytracer (a common example for parallel
code, since there are many independent tasks -> pixels).  The serial
code, for instance, is in the parser that reads the 3D model and
builds the internal data structures.  That part is rarely
parallelized.  Once the data structures are there, the rendering can
be efficiently distributed among the processors.  There is a rule,
called Amdahl's Law, that says that only the time spent in parallel
code can be reduced; the serial time remains the same no matter how
many procs there are.  So the percentage of serial runtime sets the
limit on what we gain from parallelizing.
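
In formula form, with s the serial fraction of the runtime and N the
number of processors:

    speedup(N) = 1 / (s + (1 - s) / N)

With s = 0.05, for example, two procs give 1 / (0.05 + 0.95/2) =
1.90X, and even infinitely many procs never get past 1/s = 20X.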

However, none of this has anything to do with two entirely separate
and independent processes.  Those would do exactly the same work
(startup, buffer management, network communication, crunching) twice.
So it only depends on which effect (better scheduling or worse
architectural overhead) has the greater impact on performance.  Now,
the situation would be a little different if only one process did the
communication/buffering stuff while the other one(s) did nothing but
crack.  In that case, however, there would be more communication,
since more work is getting done, so the overhead on the master thread
would be N times as much as in the single-threaded case.
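
To make that master/worker split concrete, a minimal sketch with
POSIX threads (again purely illustrative; the queue, the block counts
and the fetch/crack placeholders are all made up):

    #include <pthread.h>

    #define NWORKERS 2
    #define NBLOCKS  16        /* total work units, made up */

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  have_work = PTHREAD_COND_INITIALIZER;
    static int queued  = 0;    /* blocks waiting to be cracked */
    static int fetched = 0;    /* blocks fetched so far by the master */

    /* The master is the only thread that talks to the keyserver; to
       keep NWORKERS crunchers fed it does N times the communication
       and buffering of a single-threaded client. */
    static void *master(void *arg)
    {
        int i;
        (void)arg;
        for (i = 0; i < NBLOCKS; i++) {
            /* ... fetch a block from the keyserver here ... */
            pthread_mutex_lock(&lock);
            queued++;
            fetched++;
            pthread_cond_signal(&have_work);
            pthread_mutex_unlock(&lock);
        }
        /* wake any workers still waiting so they can quit */
        pthread_mutex_lock(&lock);
        pthread_cond_broadcast(&have_work);
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    static void *worker(void *arg)
    {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&lock);
            while (queued == 0 && fetched < NBLOCKS)
                pthread_cond_wait(&have_work, &lock);
            if (queued == 0) {         /* nothing left -- done */
                pthread_mutex_unlock(&lock);
                return NULL;
            }
            queued--;
            pthread_mutex_unlock(&lock);
            /* ... crack the block here ... */
        }
    }

    int main(void)
    {
        pthread_t m, w[NWORKERS];
        int i;

        pthread_create(&m, NULL, master, NULL);
        for (i = 0; i < NWORKERS; i++)
            pthread_create(&w[i], NULL, worker, NULL);

        pthread_join(m, NULL);
        for (i = 0; i < NWORKERS; i++)
            pthread_join(w[i], NULL);
        return 0;
    }

Note that every block passes through the one lock the master uses to
hand out work, which is exactly where the N-fold overhead on the
master shows up.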

It all depends.

Cheers,
  Ralf
-- 
"He's shooting at a different movie!"

Ralf Helbing,    University of Magdeburg,     Department of Computer Science
39106 Magdeburg, UniPlatz 2                         Phone: +49 0391 67-12189


