[RC5] client versions - AIX

Bruce Wilson bwilson at distributed.net
Wed Oct 2 11:23:52 EDT 2002


| > As was noted earlier, it is typical for OGR workunits to take much
| > longer than RC5-64 workunits.  If you are using the most current
| > clients, there is no reason why you should not be able to process
| > OGR workunits until RC5-72 becomes available.  You are of course
| > entitled to ignore the OGR project if it isn't appealing to you.
| 
| First off, a week is too long for any work unit.  It was also the
| most current version available.  OGR has no appeal to me - for this
| very reason.

You may be disappointed with RC5-72.  Based on the success of the longer
workunits in the OGR projects, we are scaling the RC5-72 workunits to a
larger size.  It is very possible that you may get RC5-72 workunits that
take much longer than they did with RC5-64.

Changing the minimum from 2^28 to 2^32 means that the smallest workunit
under RC5-72 will take 16 times longer to process than the smallest
workunit for RC5-64.  A slow machine that used to do a 2^28 in two hours
will now take 32 hours to finish a workunit for RC5-72.  This workunit
would also earn 16 times as much stats credit, just as if it had done 16
workunits of 2^28.
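
The arithmetic above can be checked with a few lines of Python.  This is
only an illustrative sketch: the function names are made up for this
example, and it assumes a machine tests keys at a constant rate, so run
time scales linearly with the number of keys in the workunit.

```python
BASELINE_EXP = 28  # smallest RC5-64 workunit: 2^28 keys

def scale_factor(exp, baseline_exp=BASELINE_EXP):
    """Number of baseline-sized workunits contained in one 2^exp workunit."""
    return 2 ** (exp - baseline_exp)

def hours_needed(exp, baseline_hours):
    """Run time for a 2^exp workunit on a machine that needs
    baseline_hours per 2^28 workunit (assumes a constant key rate)."""
    return baseline_hours * scale_factor(exp)

# The slow machine from the example: two hours per 2^28 workunit.
print(f"2^32 is {scale_factor(32)}x the work and credit of 2^28, "
      f"about {hours_needed(32, 2)} hours")
```

The same factor applies to both run time and stats credit, which is why
a single 2^32 and sixteen 2^28's come out even.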

A workunit of 2^36 would take 256 times longer, and 2^40 (if we offer
them) would take 4096 times longer.  With proper checkpointing in place,
this is no problem.  I plan on using the largest workunit available as
soon as we get started, no matter how long it takes.  As someone else
mentioned, this amount of work per workunit is not unusual in
distributed projects.
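
To show what checkpointing buys you, here is a minimal sketch in Python.
It is not the actual client code: the file format, the function names,
and the test_key callback are all assumptions made up for this example.
The idea is only that progress is written to disk every so often, so an
interrupted run resumes from the last checkpoint rather than from the
start of the workunit.

```python
import json
import os

def load_checkpoint(path, start):
    """Return the next key to test, or `start` if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)["next_key"]
    return start

def save_checkpoint(path, next_key):
    """Write progress to a temp file, then rename it into place, so a
    crash during the write cannot leave a half-written checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_key": next_key}, f)
    os.replace(tmp, path)

def process_workunit(start, count, test_key, path, interval=1 << 20):
    """Test keys start .. start+count-1, checkpointing every `interval`
    keys.  Returns the matching key, or None if no key matched."""
    key = load_checkpoint(path, start)  # resume if a checkpoint exists
    end = start + count
    while key < end:
        if test_key(key):
            return key
        key += 1
        if (key - start) % interval == 0:
            save_checkpoint(path, key)
    return None
```

With this shape, the most work a crash or reboot can cost is one
checkpoint interval, no matter how large the workunit is.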

| As I have mentioned before...  Give out bigger work units to personal
| proxies.  Let them hand out smaller usable ones (2^28 if required,
| maybe for memory-only clients or older machines) to the local
| machines.  Then the personal proxies can re-gather the information
| and hand back the big work unit to the server in the sky.  If a part
| is missing, it can locally re-hand it out to get it fixed and the big
| block returned.  This way, for some of the large farms out there, you
| could hand out 2^48 or larger and let them do the sub-work.  This way
| your servers are offloaded.

The fullproxies do receive larger "superblocks".  I'm not sure of the
size, but they are significantly larger.  Perproxies don't receive
superblocks because (a) superblocks are only available directly from the
keymaster, and (b) there is too much potential for abuse.  If someone
has enough clients to justify downloading this much work at once, and a
computer large and well-connected enough to support such a workload, we
would encourage them to contact us so we can talk about setting them up
as a new fullproxy, serving all our participants.

The problem is, the more finely we allow a workunit to be subdivided, the
more details we need to track at the keymaster.  It's not a safe
assumption that work handed out by a perproxy will be returned to the
same proxy to be recombined.  Even if they do come back to the same
place, there could be enough of a delay to increase the storage
requirement of the proxy, and to delay credit to those who turned it in
first.  If I do half of a workunit, and the other half gets deleted from
someone's buff-in, I might never get credit at all!

It's impractical to use smaller units at the clients than we track at
the keymaster.  If a perproxy splits a 2^32 16 ways to 16 different
clients, then who gets stats credit for the finished 2^32?
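
The bookkeeping problem can be sketched in Python.  This is a toy model
under stated assumptions, not how the proxies or the keymaster actually
store anything: the function names and the (start, size) tuples are
invented for this example.  It splits one 2^32 block into 2^28 pieces
and shows that the parent block cannot be reported until every piece is
back.

```python
def split_block(start, exp, piece_exp):
    """Split a block of 2^exp keys into (start, size) pieces of
    2^piece_exp keys each."""
    piece = 1 << piece_exp
    return [(start + i * piece, piece)
            for i in range(1 << (exp - piece_exp))]

def block_complete(pieces, returned):
    """The parent block only counts once every piece has come back."""
    return all(p in returned for p in pieces)

pieces = split_block(0, 32, 28)

# Fifteen different clients return their piece; the last one never does.
returned = {p: f"client{i}" for i, p in enumerate(pieces[:-1])}

print(len(pieces), "pieces, complete:", block_complete(pieces, returned))
print(len(set(returned.values())), "clients are waiting for credit")
```

The keymaster only knows about the single 2^32, so with one piece
missing there is nothing it can credit, and even with all 16 back it has
one completed block and 16 claimants.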

In the final calculation, it takes just as long to do a single 2^32 as
it does to do 16 2^28's, and the stats credits are the same, so why make
a big deal of it?  Most distributed projects give you no control
whatsoever over the size of the workunits you receive.  We've considered
removing the choice too, specifically because it would make discussions
like this moot.

| Also, if the personal proxy runs out of work units...  Allow it to
| pick the random block to use.  This way it will "create" a block and
| hand it out to the local crunchers.  In the end the personal proxy
| starts to appear to be the center of an AMP machine, with one DNET
| client on each processor... no matter the type.

The clients already take responsibility for picking their own randoms.
It doesn't make sense to shift this to the perproxies, because the
client still needs to generate randoms if it can't reach the perproxy.

| At the same time, let the personal proxies hand back pre-processed
| stats.  So checking in a completed block from a personal proxy also
| gives back tallies for the stats.

Each workunit returned must be ticked off as "completed" in the
keymaster.  It doesn't make sense to have the proxies send back a
summary when we still need all the detail.  We have also implemented
mechanisms in the past where work from certain client versions is
discarded at the master.  If a proxy combines work from 10 computers
using the same ID but different versions and platforms, we lose the
ability to filter out the noise.  This also makes the perproxies much
more complicated, which makes the code much harder to modify and
maintain.
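
A small Python sketch of why the detail matters.  Everything here is
hypothetical: the record layout, the version labels, and the function
names are invented for illustration and say nothing about the real
keymaster schema.  With per-workunit records, results from a bad client
version can be dropped; once a proxy collapses them into a tally, that
information is gone.

```python
from collections import Counter, namedtuple

# Hypothetical record layout, made up for this example.
Result = namedtuple("Result", "block_id participant client_version")

def filter_results(results, bad_versions):
    """Keep only results whose client version is not on the discard list."""
    return [r for r in results if r.client_version not in bad_versions]

def summarize(results):
    """What a summarizing proxy might send instead: a tally per participant."""
    return Counter(r.participant for r in results)

results = [
    Result(1, "someone@example.com", "good-build"),
    Result(2, "someone@example.com", "bad-build"),
    Result(3, "someone@example.com", "good-build"),
]

kept = filter_results(results, bad_versions={"bad-build"})
print(len(kept), "results kept out of", len(results))
print(summarize(results))  # the tally no longer says which build did what
```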

We like the proxies as waypoints, merely passing work from here to there
without knowing much about what is inside each workunit.  This makes it
easier for our network to support new projects without a lot of changes.

__
Bruce Wilson <bwilson at distributed.net>
PGP KeyID: 5430B995, http://www.toomuchblue.com/ 

Build a man a fire and he'll be warm for a day.
Set a man on fire, he'll be warm for the rest of his life.
