[stats-dev] log loader

Chris Hodson nerf at slacker.com
Thu Apr 21 20:13:02 EDT 2005


Recent activity has spurred me to think about this project again.  For
those of you who just joined us (or have short memories), the idea is to
have a database that mimics the raw logs that the master has.  There are
many reasons this would be useful.

I'm going to recap some of the issues that have been decided and some
that (in my mind) haven't.  Feel free to jump in with opinions about
anything at any time.

Decided:
	* Single table for all entries
	* Email and client version will be normalized
	* All records (even hackers, worms, etc.) will be included
	* Program to do the loading will be written in perl
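
To make the first two decided points concrete, here is a rough sketch of
one table for all entries with email and client version pulled out into
lookup tables.  Everything in it is an assumption for illustration: the
table and column names are made up, the real log fields are not listed,
and it uses Python's sqlite3 rather than the perl loader agreed on above.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE email (
    id      INTEGER PRIMARY KEY,
    address TEXT UNIQUE NOT NULL
);
CREATE TABLE client_version (
    id      INTEGER PRIMARY KEY,
    version TEXT UNIQUE NOT NULL
);
CREATE TABLE log_entry (
    id                INTEGER PRIMARY KEY,
    email_id          INTEGER NOT NULL REFERENCES email(id),
    client_version_id INTEGER NOT NULL REFERENCES client_version(id),
    raw_line          TEXT NOT NULL
);
"""


def make_db():
    """Create an in-memory database holding the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def lookup_id(conn, table, column, value):
    """Return the id for value in a lookup table, inserting it if new."""
    # table and column are fixed strings in this sketch, never log data.
    conn.execute(
        f"INSERT OR IGNORE INTO {table} ({column}) VALUES (?)", (value,)
    )
    row = conn.execute(
        f"SELECT id FROM {table} WHERE {column} = ?", (value,)
    ).fetchone()
    return row[0]


def add_entry(conn, email, version, raw_line):
    """Store one log record, normalizing email and client version to ids."""
    conn.execute(
        "INSERT INTO log_entry (email_id, client_version_id, raw_line) "
        "VALUES (?, ?, ?)",
        (
            lookup_id(conn, "email", "address", email),
            lookup_id(conn, "client_version", "version", version),
            raw_line,
        ),
    )
```

Two records from the same email address end up sharing one row in the
email table, which is the point of normalizing.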

Open issues:
	* How much pre-processing?  Sanity only?
	* Should the pre-processor be written in C or part of the perl program?
	* Is there any daily processing to be done?
	* Where will the lookup info be stored?  E.g., will it use the same email -> id lookup table that the rest of stats uses?
	* Will this be independent of any other stats work or will it fit in?

A few words about the pre-processor: if we use a C pre-processor, it's
obviously faster, but at a cost in portability.  That loss shows up both
in having to compile the C program (not a huge deal) and in the database
loading.  If we go with the perl solution, it would make sense to use
DBI and load the data while it's already split and in memory.  The
advantage is that there are DBD modules that can talk to many RDBMSs
(including ODBC), which would ease adoption by anyone outside of dnet
who wanted to start doing their own logging on a perproxy.  Just my
$.02.
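
For what "load the data while it's already split and in memory" might
look like, here is a minimal sketch.  It is only an assumption-laden
illustration: the comma-separated layout and the four field names are
hypothetical (the real master log format isn't described here), the
sanity check is just a field count, and Python's sqlite3 stands in for
perl's DBI.

```python
import sqlite3

# Hypothetical field layout; the real log format is not specified here.
FIELDS = ("logged_at", "ip", "email", "client_version")


def split_line(line):
    """Sanity check only: return the fields, or None if the count is wrong."""
    parts = line.rstrip("\n").split(",")
    return tuple(parts) if len(parts) == len(FIELDS) else None


def load(conn, lines):
    """Insert every well-formed line; return (loaded, rejected) counts."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_log "
        "(logged_at TEXT, ip TEXT, email TEXT, client_version TEXT)"
    )
    rows, rejected = [], 0
    for line in lines:
        fields = split_line(line)
        if fields is None:
            rejected += 1        # structurally broken line, not loaded
        else:
            rows.append(fields)  # already split, so insert it directly
    conn.executemany("INSERT INTO raw_log VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows), rejected
```

The split fields go straight into the insert, with no intermediate file
written for a database-specific bulk loader, which is where the
portability would come from.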

I welcome any comments or questions.

-Nerf

