[RC5] Tree search algorithms and the next d.net

Adam Zilinskas AZilinskas at SolutionsIQ.com
Mon Apr 19 14:10:34 EDT 1999


From: Greg Hewgill 
> ...Crypto based projects deal with a linear list of keys 
>to test - there is a precise number of keys to test, and
> each one must be tested. The OGR algorithm implements 
> a search tree - there is no fixed list of tree branches
> to try...

Hmmm this makes me think of a different kind of D.net.
If I understand d.net as it stands now, there is a 
 master hub that controls all the key blocks. Clients
 (and proxies that act like super clients), check out 
  a number of blocks. They process them and check the
  results in.
 This works for Crypto as an exhaustive search is required
  so you try key 00001, then key 00002, 00003 ... until 
  you find the correct one or exhaust this well understood
  and fixed list.

Now a tree search. There are depth first or breadth first 
 recursive searches depending upon if you can do some processing
 of the intermediate node without having the complete subtree 
 solved. 

I should really think about this more, but some tree
 searches are inheritly depth first. A fibonacci number
 is formed from the sum of the two previous numbers in the 
 series. So for the node F(5) you need the values of nodes 
 F(4) and F(3)  [ side note, yes you can calculate fibonacci
 numbers in an easy flat loop, this is just an example ]
 So you can't do any computing of F(5) until you have tree 
 walked F(4) and F(3).

Now there is an interesting side issue with Fibonacci
 in that the branches of the tree are shared (the 
  F(4) subtree contains F(3) subtree in it). Lets not
  worry about that right now.

Now for a single computer to do a depth first tree walk, 
 it needs a large stack memory to know where it is. Also 
 the general algorithm is a single thread so that walking 
 a Sagan number of node (billions and billions) would take
 a long time for a single processor.

OK, now we have thousands of reasonable power processors
 with the capabilities of many thousands of threads 
 happening at once. They are very loosely coupled and 
 called D.net.

One scheme would be to elaborate the entire tree and 
 store them in a central repository. A client would check 
 out a branch of the tree where all leaf nodes are done, 
 process it and check the node back in. That branch now becomes
 a leaf node (no longer needs more work). The problem is that the 
 elaborated tree may be extreemely large. The bottleneck will
 be that poor central hub system holding the repository having
 to check out and in all the branches. There will also be 
 a tremendous data traffic flow to this hub machine as the 
 thousands of clients are all queueing up to check out then
 back in their branches.

Now if all the processors that was part of the system were 
 structured into some sort of hierarchy, the tasks would be 
 divided down recursively just like the tree walking algorithm.
 The master client is given a problem. It starts elaborating the
  tree to some reasonable depth. Now there will be many 
  subbranches that will need to be solved before it can work 
  on the part of the tree it has. Now it will pass these 
  subbranches off to this client's children as a sub-task.
If there was a near infinite number of clients, eventually
 the subtasks become small enough for a client to do completely
 on its own. The subbranch is solves and passed back to 
 the parent client.
The problem here is that the top level clients do some initial
 work elborating a piece of a tree, then they wait for the 
 children to finish. Lots of waiting processors. If the 
 whole network of clients were drawn like a pool, the 
 initial effort would make a ripple in the center that fans
 out until the problem is easy enough and the answers (and
 work still left to do) ripples back to the center.

Well there is not enough processors to make a pure hierarchy
 like that. Also the flakeyness has to be accounted for,
 clients crashing, being taken offline for other tasks, etc.

Now think of a new d.net topology where there are many hubs
  (an even better one is a client may dynamically become a hub
  if they prove its trust-worthiness). OK the problem is issued
  at a hub and a client checks it out. As it hits the elaboration
  limit of this part of the tree, it posts the subbranches that 
  still need to be worked upon back to the hub as a new problem.
  Ideally, the sub-tasks are somehow distributed to the other hubs
  to reduce the traffic flow to this hub. Now a client, not having
  anything to do, will find a hub and check out a problem, do what 
  it can do then check it back in (either solving the problem
  or creating new sub-problems on the hub with links that say this
  piece can be checked out and solved after these sub-problems 
  are done).

Hmm restating that. A client will check out a tree search problem 
  and do either of these:
 * solve the tree if it can be done now.
 * elaborate the tree to n levels, check it back in with the 
   newly spawned sub-problems.

(a hub will not let anyone check out a branch problem that
  was already elaborated but waiting for other branches).

OK, now lets bring the system up to another level. We have 
 bored clients finding problems on a hub, doing what they can then
 checking the results back in. Hubs themselves do some sort of
 load balancing between themselves to keep the traffic distributed
 on the net. Now what keeps anyone from submitting their problem
 to a hub? As long as the problem was deemed "nice and proper" 
 for the system to perform, nothing should stop it. I think then
 we call can benefit from each other's spare CPU cycles. 

I am thinking of place and route algorithms for circuits and streets
 etc. Problems like simulated thermal annealing and finite element
 work where parts of the problem can be dumped on the collective
 to work upon as cycles free up. A client would not be just cracking
 a code, but might be doing pieces of 20 or 30 tasks at a time.

Of course all "nice and proper" tasks are required to not crash
 (or crash softly) and not interfere with any normal tasks
 on the host. (the same requirements of the present d.net system)

Can anyone tell that I was utterly bored with the tasks I have to
 do at work?

                  Adam Zilinskas
                  Solutions IQ
                  azilinskas at solutionsiq.com

--
To unsubscribe, send 'unsubscribe rc5' to majordomo at lists.distributed.net
rc5-digest subscribers replace rc5 with rc5-digest



More information about the rc5 mailing list