[stats-dev] GUIDs and machine level stats tracking

Ben Gavin thejet at gmail.com
Thu Jul 29 17:47:33 EDT 2004

  Decibel asked that I put together this summary of my thoughts on the
email->GUID transition and specifically on my ideas for machine-level
statistics tracking.  The scenario I am proposing looks like this:

Participant Info <--- Participant GUIDs

The participant GUID table contains the GUID and an optional "friendly
identifier" which would be displayed by the stats system, in place of
the GUID.  There may also be a field which indicates the user's
"primary GUID" which could be used for "phase 1" bits to allow mapping
of the stats email to a specific participant GUID.  I don't believe
that participants should be able to enter GUIDs into the system, nor
edit existing GUID entries assigned to the participant.  It may be
useful for them to be able to "inactivate" a GUID to indicate that
work will no longer be arriving from that GUID and it should no longer
be displayed in the stats system [or maybe displayed in a different

I envision that the user would be able to sign up through the new
user-auth system, and request that they be assigned an additional GUID
[they would be assigned one at first when they sign up, and this would
be marked internally as their primary GUID].  They could then use that
GUID on a machine or group of machines to track stats for that
particular machine/group.
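
As a rough illustration of the GUID table and the sign-up flow described
above, here is a minimal in-memory sketch.  Everything named in it
(GuidEntry, GuidRegistry, sign_up, request_additional, inactivate,
display_name) is hypothetical and only meant to show the rules: GUIDs
are issued by the system rather than entered by the participant, the
first one is marked primary, and a GUID can be inactivated but not
edited.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class GuidEntry:
    guid: str
    participant_id: int
    friendly_name: Optional[str] = None  # shown by stats in place of the GUID
    is_primary: bool = False             # the GUID issued at sign-up
    active: bool = True                  # False once the participant retires it


class GuidRegistry:
    """Hypothetical stand-in for the participant GUID table."""

    def __init__(self) -> None:
        self._by_guid: Dict[str, GuidEntry] = {}

    def _issue(self, participant_id, friendly_name, is_primary) -> GuidEntry:
        # GUIDs are always generated here, never supplied by the participant.
        entry = GuidEntry(str(uuid.uuid4()), participant_id,
                          friendly_name, is_primary)
        self._by_guid[entry.guid] = entry
        return entry

    def _has_primary(self, participant_id) -> bool:
        return any(e.participant_id == participant_id and e.is_primary
                   for e in self._by_guid.values())

    def sign_up(self, participant_id) -> GuidEntry:
        if self._has_primary(participant_id):
            raise ValueError("participant already has a primary GUID")
        return self._issue(participant_id, None, True)

    def request_additional(self, participant_id,
                           friendly_name=None) -> GuidEntry:
        if not self._has_primary(participant_id):
            raise ValueError("participant must sign up first")
        return self._issue(participant_id, friendly_name, False)

    def inactivate(self, guid) -> None:
        # Work is no longer expected from this GUID.
        self._by_guid[guid].active = False

    def display_name(self, guid) -> str:
        entry = self._by_guid[guid]
        return entry.friendly_name or entry.guid
```

A participant would call sign_up once, then request_additional for each
machine or group of machines they want to track separately.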

Then as part of the statsproc process, the current GUID->participant
table would be retrieved and used to aggregate the statistics for the
participant at a team level [if they belong to a team].  The only
screen that would display the individual, machine-level GUID stats
would be the participant summary screen [although maybe we could
provide a "tree view" type listing for the team?].  This would be part
of "my.distributed.net" as well as the individual team participant
pages.

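The roll-up from GUID to participant to team could look something like
the sketch below.  This is not the actual statsproc code; the function
name roll_up and the three dictionaries are assumptions made purely for
illustration, and skipping unknown GUIDs is one possible choice that the
proposal does not specify.

```python
from collections import defaultdict


def roll_up(guid_work, guid_to_participant, participant_to_team):
    """Sum per-GUID work into per-participant and per-team totals.

    guid_work:            {guid: work units reported under that GUID}
    guid_to_participant:  {guid: participant id}
    participant_to_team:  {participant id: team id}, teamless ones absent
    """
    per_participant = defaultdict(int)
    per_team = defaultdict(int)
    for guid, units in guid_work.items():
        participant = guid_to_participant.get(guid)
        if participant is None:
            continue  # unknown GUID: skipped here (an assumption)
        per_participant[participant] += units
        team = participant_to_team.get(participant)
        if team is not None:
            per_team[team] += units
    return dict(per_participant), dict(per_team)
```

The machine-level numbers stay available in guid_work for the
participant summary screen, while teams only ever see the summed totals.
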
From a performance standpoint, we could choose to either
batch-aggregate the stats for a particular participant during
statsproc [basically storing aggregated entries for both the main
participant and machine-level], or choose to just do the aggregation
on the fly.  I would prefer that we set up proper indexing on the
email_contrib table and just do the aggregation on the fly until such
point as performance necessitates doing it as part of statsproc.
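
To make the "index and aggregate on the fly" option concrete, here is a
small sketch using Python's sqlite3 module, chosen only so that it runs
anywhere.  The column layout of email_contrib is not given in this
message, so the guid and work_units columns, the participant_guid table
and the index names are all assumptions for the sake of the example.

```python
import sqlite3


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Hypothetical schema; the real tables may look different.
        CREATE TABLE participant_guid (
            guid           TEXT PRIMARY KEY,
            participant_id INTEGER NOT NULL
        );
        CREATE TABLE email_contrib (
            guid       TEXT NOT NULL,
            work_units INTEGER NOT NULL
        );
        -- Indexes that let the per-participant lookup avoid full scans.
        CREATE INDEX email_contrib_guid_idx ON email_contrib (guid);
        CREATE INDEX participant_guid_pid_idx
            ON participant_guid (participant_id);
    """)
    return conn


def machine_totals(conn, participant_id):
    """Per-GUID totals for one participant, computed at query time."""
    return conn.execute(
        """
        SELECT pg.guid, COALESCE(SUM(ec.work_units), 0)
        FROM participant_guid AS pg
        LEFT JOIN email_contrib AS ec ON ec.guid = pg.guid
        WHERE pg.participant_id = ?
        GROUP BY pg.guid
        ORDER BY pg.guid
        """,
        (participant_id,),
    ).fetchall()


def participant_total(conn, participant_id):
    """Overall total for the participant, summed from the machine rows."""
    return sum(units for _, units in machine_totals(conn, participant_id))
```

If this ever became too slow, the same two queries could be run once
during statsproc and their results stored, which is the batch option
described above.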

I believe this would be a very useful addition to the new stats
system.  This approach is already heavily used on the GIMPS project,
where it seems to work very well.  It allows people to track whether a
given machine or group of machines is underperforming, or to better
troubleshoot why their output isn't what it should be, especially in
cases where installing/maintaining a personal proxy is not
desired/possible.  Utilizing this approach also has the benefit of not
requiring client changes: you simply plug in the GUID you've been
assigned [or another that you've requested] and you're off.

Ben [TheJet]

Benjamin Gavin
virtual.olympus software
ben at virtual-olympus.com
