[RC5] Re: d.net project: indexing the web

Richard Chiswell rc-distributed at beebware.demon.co.uk
Tue Jul 13 23:02:57 EDT 1999


On Tue 13 Jul, "Matt.Wilkie" <Matt.Wilkie at gov.yk.ca> wrote:
> In a nutshell, current search engines only index about 16% of the total
> web. There's simply too much data to sift through from a centralized
> computer. Harnessing individual computers to index as they browse
> could potentially greatly improve search engines' indices.
> 
> I think d.net could help craft a solution to this problem because: 
> 
> 1) (to the best of my knowledge) most well developed distributed
> computing network in existence today
> 
> 2) d.net developers and participants are well aware of privacy 
> issues, and anything they build in this line will be privacy 'compliant'
> 
> 3) a proprietary version of this is doomed to failure, nobody will
> trust it and it wouldn't get broadly enough used.
> 
> What does everybody else think?

For the record, there is the Open Directory Project, which lives at
http://dmoz.org. Its roughly 13,000 volunteer editors have, to date,
manually edited over 730,000 sites (more than 1,000 sites are being
reviewed per day).

The data produced by Dmoz.Org (or ODP) is 'free source', which means
it can be downloaded and used by anybody. In fact, the data is
currently being used by Netscape, HotBot, DogPile and a whole host of
other search engines.

Dmoz.Org has only just passed its first birthday, and is still growing.
All editors (except for around a dozen staff members) are unpaid
volunteers (as we are on distributed.net). They receive details of
sites that users have submitted (or they can go out and find sites
themselves) and review them before accepting them into, or rejecting
them from, the directory.

In case you are wondering what happens when a link goes down (i.e. if
a webmaster takes down a listed page), we have a 'bot called
'Robozilla' which regularly runs through the directory and checks the
links for us. It then 'flags' any that appear dead for humans to check
(we don't want any automated systems messing up our carefully created
directory).
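
The check-and-flag behaviour described above could be sketched roughly
as follows. This is only an illustration, not Robozilla's actual code:
the function names, the (category, URL) listing format, and the rule
that an unreachable URL or any status of 400 and above gets flagged are
all assumptions made for the example.

```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if it cannot be reached."""
    # HEAD avoids downloading the page body; a server that rejects HEAD
    # would simply end up flagged and a human would decide what to do.
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, or malformed URL.
        return None


def flag_for_review(listings, fetch_status=http_status):
    """Return the (category, url, status) entries a human should look at.

    listings is an iterable of (category, url) pairs (a hypothetical
    format). Nothing is removed or modified; the checker only reports.
    """
    flagged = []
    for category, url in listings:
        status = fetch_status(url)
        if status is None or status >= 400:
            flagged.append((category, url, status))
    return flagged
```

The checker only builds a list of suspects and never touches the
directory itself, which matches the point that the final decision stays
with a human editor. Passing the status function in as a parameter also
lets the loop be tried out with a stub instead of real network requests.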

If you are interested in becoming an editor, or just interested in the
project in general, please feel free to go to our main page at
http://dmoz.org and follow the links (or use the search engine). Once
you've found a category you like (ideally your locality or a language
you speak, just to help you 'find your feet'), apply to 'become an
editor'.

Any other questions, please feel free to contact me,

Richy C.
Dmoz.Org editor: beebware . editall+catmv (for the meaning of these two
                            you will have to become an editor)
"Humans do it better" - http://dmoz.org - [over 735,000 sites listed] 
-- 
Beebware International - http://www.beebware.com - info at beebware.com
           Forest Road, Huncote, Leicester, LE9 3BH, UK
Telephone: UK: 0870 740 13 73                Fax: UK: 0870 740 45 90
Telephone: US: 011448707401373 (int'l rates) Fax: US: (603) 719-2140


--
To unsubscribe, send 'unsubscribe rc5' to majordomo at lists.distributed.net
rc5-digest subscribers replace rc5 with rc5-digest


