[RC5] Tree search algorithms and the next d.net

Mon Apr 19 14:10:34 EDT 1999

From: Greg Hewgill
> ...Crypto based projects deal with a linear list of keys
> to test - there is a precise number of keys to test, and
> each one must be tested. The OGR algorithm implements
> a search tree - there is no fixed list of tree branches
> to try...

Hmmm, this makes me think of a different kind of d.net.
If I understand d.net as it stands now, there is a
master hub that controls all the key blocks. Clients
(and proxies that act like super-clients) check out
a number of blocks, process them, and check the
results back in.
This works for crypto because an exhaustive search is
required: you try key 00001, then key 00002, 00003 ...
until you find the correct one or exhaust this
well-understood, fixed list.
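The linear keyspace case can be sketched in a few lines. This is not real RC5 code; `check_key` is a hypothetical stand-in for the actual cipher test, with a placeholder "correct" key:

```python
# Sketch of an exhaustive search over a fixed, linear keyspace.
# check_key() is a hypothetical stand-in for the real cipher test.

def check_key(key: int) -> bool:
    """Returns True if this key decrypts the ciphertext (placeholder)."""
    return key == 0x2A  # pretend 0x2A is the correct key

def search_block(start: int, end: int):
    """Test every key in [start, end) -- the work unit a client checks out."""
    for key in range(start, end):
        if check_key(key):
            return key
    return None  # block exhausted; report back to the hub

print(search_block(0, 256))   # the keyspace is just a union of such blocks
```

The whole keyspace partitions cleanly into independent blocks, which is exactly why the single-hub model works so well for crypto.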

Now consider a tree search. A recursive search can be
depth first or breadth first, depending on whether you can
do some processing of an intermediate node without having
the complete subtree solved.

Take Fibonacci as an example; such searches are inherently
depth first. A Fibonacci number is formed from the sum of
the two previous numbers in the series, so for the node F(5)
you need the values of nodes F(4) and F(3). [Side note:
yes, you can calculate Fibonacci numbers in an easy flat
loop; this is just an example.] So you can't do any
computing of F(5) until you have tree-walked F(4) and F(3).
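That dependency shows up directly in the naive recursive form. A minimal sketch (again, the flat loop is far cheaper; this is just the tree-walk illustration):

```python
# Naive recursive Fibonacci as a depth-first tree walk: the value at a
# node cannot be computed until both child subtrees have been walked.

def fib(n: int) -> int:
    if n < 2:
        return n            # leaf nodes: F(0) = 0, F(1) = 1
    left = fib(n - 1)       # walk the F(n-1) subtree first...
    right = fib(n - 2)      # ...then the F(n-2) subtree...
    return left + right     # ...only then can this node be computed

print(fib(5))  # F(5) = 5, available only after F(4) and F(3)
```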

Now there is an interesting side issue with Fibonacci
in that the branches of the tree are shared (the
F(4) subtree contains the F(3) subtree in it). Let's not
go down that path here.

Now, for a single computer to do a depth-first tree walk,
it needs a large stack in memory to know where it is. Also,
the general algorithm is single threaded, so walking
a Sagan number of nodes (billions and billions) would take
a long time on a single processor.
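The "where am I" memory is easiest to see with an explicit stack. A sketch, with a hypothetical `children` generator standing in for lazily elaborated branches (as with OGR, there is no fixed branch list up front):

```python
# Single-threaded depth-first walk with an explicit stack. The stack is
# the walker's entire memory of where it is, and it grows with tree depth.

def children(node: tuple, max_depth: int = 3):
    """Hypothetical branch generator: each node spawns two children."""
    if len(node) < max_depth:
        yield node + (0,)
        yield node + (1,)

def depth_first_walk(root=()):
    stack = [root]            # all the walker knows about its position
    visited = []
    while stack:
        node = stack.pop()
        visited.append(node)
        stack.extend(children(node))  # children expand before siblings
    return visited

print(len(depth_first_walk()))  # 1 + 2 + 4 + 8 = 15 nodes at depth 3
```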

OK, now we have thousands of reasonably powerful processors
with the capability of many thousands of threads
happening at once. They are very loosely coupled, and
we call them d.net.

One scheme would be to elaborate the entire tree and
store it in a central repository. A client would check
out a branch of the tree where all leaf nodes are done,
process it, and check the node back in. That branch then
becomes a leaf node (it no longer needs more work). The
problem is that the elaborated tree may be extremely large.
The bottleneck will be that poor central hub system holding
the repository, having to check out and in all the branches.
There will also be a tremendous data traffic flow to this
hub machine as the thousands of clients all queue up to
check out and then check back in their branches.

Now, if all the processors that are part of the system were
structured into some sort of hierarchy, the tasks would be
divided down recursively, just like the tree-walking
algorithm. The master client is given a problem. It starts
elaborating the tree to some reasonable depth. Now there
will be many subbranches that will need to be solved before
it can work on the part of the tree it holds, so it passes
these subbranches off to its children as sub-tasks.
If there were a near-infinite number of clients, eventually
the sub-tasks become small enough for a client to do
completely on its own. The subbranch is solved and passed
back to the parent client.
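A minimal sketch of that divide-down-the-hierarchy idea, using Fibonacci as the stand-in problem. The child clients are modeled as plain recursive calls, and the size threshold is invented for the demo; a real d.net protocol would involve actual remote machines:

```python
# Hierarchical scheme: a client elaborates one level of the tree, hands
# the unsolved subbranches to "child clients" (modeled as recursion),
# and combines the answers. SMALL_ENOUGH is an invented threshold.

SMALL_ENOUGH = 10  # below this, a client just solves the subtree itself

def solve_locally(n: int) -> int:
    """Flat-loop Fibonacci: the 'easy enough to do on my own' case."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def client(n: int) -> int:
    if n < SMALL_ENOUGH:
        return solve_locally(n)
    # Elaborate one level, then wait for two child clients to finish.
    # This waiting is exactly the idle-parent problem described below.
    return client(n - 1) + client(n - 2)

print(client(20))  # F(20) = 6765
```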
The problem here is that the top-level clients do some
initial work elaborating a piece of the tree, then wait for
their children to finish. Lots of waiting processors. If
the whole network of clients were drawn like a pool, the
initial effort would make a ripple in the center that fans
out until the problem is easy enough, and the answers (and
the work still left to do) ripple back to the center.

Well, there are not enough processors to make a pure
hierarchy like that. Also, the flakiness has to be
accounted for: clients crashing, being taken offline for
other tasks, etc.

Now think of a new d.net topology where there are many hubs
(even better, a client may dynamically become a hub if it
proves its trustworthiness). OK, the problem is issued at a
hub and a client checks it out. As it hits the elaboration
limit of its part of the tree, it posts the subbranches that
still need to be worked on back to the hub as new problems.
Ideally, the sub-tasks are somehow distributed to the other
hubs to reduce the traffic flow to this hub. Now a client,
not having anything to do, will find a hub and check out a
problem, do what it can, then check it back in (either
solving the problem or creating new sub-problems on the hub,
with links that say this piece can be checked out and solved
after those sub-problems are done).

Hmm, restating that. A client will check out a tree-search
problem and do either of these:
* solve the tree, if it can be done now;
* elaborate the tree to n levels and check it back in along
with the newly spawned sub-problems.

(A hub will not let anyone check out a branch problem that
was already elaborated but is waiting on other branches.)
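The hub bookkeeping this implies could look roughly like the toy sketch below. The names (`Hub`, the open/waiting pools) are invented for illustration; a real protocol would need trust, timeouts, and redundancy on top:

```python
# Toy sketch of hub bookkeeping: an "open" pool of checkable problems
# and a "waiting" pool of elaborated branches parked on their children.

class Hub:
    def __init__(self):
        self.open = {}      # id -> problem ready to be checked out
        self.waiting = {}   # id -> (problem, set of unfinished child ids)
        self.next_id = 0

    def post(self, problem):
        pid = self.next_id
        self.next_id += 1
        self.open[pid] = problem
        return pid

    def check_out(self):
        # only open problems are handed out; an elaborated branch
        # waiting on its children is never given to a client
        pid, problem = self.open.popitem()
        return pid, problem

    def check_in_elaborated(self, pid, problem, child_problems):
        # client hit its elaboration limit: post the subbranches and
        # park the parent until all of them are solved
        kids = {self.post(p) for p in child_problems}
        self.waiting[pid] = (problem, kids)

    def check_in_solved(self, pid):
        # any parent that was waiting only on pid becomes open again
        for parent, (problem, kids) in list(self.waiting.items()):
            kids.discard(pid)
            if not kids:
                del self.waiting[parent]
                self.open[parent] = problem

hub = Hub()
root = hub.post("walk the whole tree")
pid, prob = hub.check_out()
hub.check_in_elaborated(pid, prob, ["left subtree", "right subtree"])
# the parent is now parked; its two subbranches are open for others
```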

OK, now let's bring the system up to another level. We have
bored clients finding problems on a hub, doing what they
can, then checking the results back in. Hubs themselves do
some sort of load balancing between themselves to keep the
traffic distributed on the net. Now, what keeps anyone from
submitting their own problem to a hub? As long as the
problem is deemed "nice and proper" for the system to
perform, nothing should stop it. I think then we all can
benefit from each other's spare CPU cycles.

I am thinking of place-and-route algorithms for circuits,
streets, etc. Problems like simulated annealing and
finite-element work, where parts of the problem can be
dumped on the collective to work on as cycles free up.
A client would not be just cracking a code, but might be
doing pieces of 20 or 30 tasks at a time.

Of course, all "nice and proper" tasks are required not to
crash (or to crash softly) and not to interfere with any
normal tasks on the host (the same requirements as the
present d.net system).

Can anyone tell that I was utterly bored with the tasks I have to
do at work?