[plans] distributed.net .plan update

plans at nodezero.distributed.net
Thu Dec 30 19:00:02 EST 1999


.plan updates in the last 24 hours: 
---
nugget :: 30-Dec-1999 04:04 (Thursday) ::

blerg.  Some days I wish I dug ditches for a living.  :)

Still banging on stats, looking promising, but progressing slowly.

More info as the situation evolves.

Moo ]:8)



nugget :: 30-Dec-1999 09:09 (Thursday) ::

It would appear that csc stats are up-to-date, although I admit I haven't 
thoroughly audited the numbers.  We had to do quite a bit of "by hand"
data massaging during the recovery process, so I will take some time 
tomorrow to sanity check the numbers.  Stats had to be re-run from 27-Dec
to current.

rc5-64 stats are running now, and barring any problems will be on auto-pilot
for the next several hours.  I'm going to take the opportunity to get
a bit of sleep and rest my eyes.

If there are any problems, they will likely be related to team joining or team
changes during the previous three days.  If you joined a team or changed
teams recently, you might see some differences.  (If you changed teams
on 28-Dec, the freshly-run stats will actually look like you changed teams
on 27-Dec.)  We should be able to correct these discrepancies tomorrow.

More info as we wrap up the repair.  Thanks for your patience.


decibel :: 30-Dec-1999 21:14 (Thursday) ::

For all of you who want to know the gory details of what happened to stats
last night, here they are.

About a week ago, Sybase skipped a bunch of identities in STATS_Participant,
the main participant info table. This tends to happen when the server is
shut down abnormally, and has happened several times in the past. Unfortunately,
we didn't catch it when it happened this time, so there were several days of
data with bad participant IDs. This meant re-writing the script we use to
correct this problem.
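
For the curious, skipped identities like these are easy to spot once you go
looking. Here is a minimal sketch in Python, assuming the ID values have already
been pulled out of the table into a plain list. The name find_identity_gaps is
made up for illustration; this is not the script mentioned above.

```python
def find_identity_gaps(ids):
    """Return (start, end) ranges of identity values missing from ids.

    Hypothetical helper: `ids` stands in for the identity column of a
    table such as STATS_Participant, fetched by some other means.
    """
    gaps = []
    ordered = sorted(set(ids))
    # Compare each ID with its successor; any jump larger than 1 is a gap.
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > 1:
            gaps.append((prev + 1, cur - 1))
    return gaps


if __name__ == "__main__":
    # Made-up sample data with two skipped ranges.
    print(find_identity_gaps([1, 2, 3, 6, 7, 100]))
```

Run regularly, a check along these lines would flag a jump on the day it
happens rather than several days later.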

We re-wrote the script a few days ago, but hadn't run it yet. Yesterday, Bruce
came up with a change for the statsrun code that would allow us to eliminate
the identity field from STATS_Participant, thereby getting rid of this issue.
Since this would also require rebuilding STATS_Participant, it seemed a perfect
time to fix the identity problem.

With the new statsrun code in place and looking good, I was ready to run the
re-identity script. I almost made a backup copy of STATS_Participant, but
remembered that the re-identity script made a backup copy on its own. Of course,
this breaks an axiom of computers that goes something like 'Too many backups is
almost enough.'

The script had a minor syntax error, and in the process of trying to debug it, I
ran the script several times. This had the unfortunate side-effect of sending all
traces of STATS_Participant to the great bit-bucket in the sky.
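
One way to guard against this kind of accident is a backup step that refuses to
reuse a backup name, so running a script twice cannot clobber the first copy.
The sketch below is only an illustration: it assumes the danger is a rerun
overwriting an earlier backup, it uses Python's sqlite3 module as a stand-in for
Sybase, and backup_table and the _bak_ naming are invented for the example.

```python
import sqlite3
import time


def backup_table(conn, table, suffix=None):
    """Copy `table` into a new, uniquely named backup table.

    Returns the backup table's name. Raises if that name is already
    taken, so a rerun can never overwrite an earlier backup.
    """
    suffix = suffix or time.strftime("%Y%m%d_%H%M%S")
    backup = f"{table}_bak_{suffix}"
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (backup,),
    ).fetchone()
    if exists:
        raise RuntimeError(f"backup {backup} already exists; refusing to overwrite")
    # Identifiers cannot be bound as parameters; names here are trusted.
    conn.execute(f'CREATE TABLE "{backup}" AS SELECT * FROM "{table}"')
    conn.commit()
    return backup


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE STATS_Participant (id INTEGER, email TEXT)")
    conn.execute("INSERT INTO STATS_Participant VALUES (1, 'a@example.com')")
    print(backup_table(conn, "STATS_Participant", suffix="run1"))
```

Calling backup_table a second time with the same suffix raises an error instead
of quietly replacing the first copy.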

Not to worry, we make weekly backups of the database for this very reason. With a 
boatload of help from Nugget, we got a copy of STATS_Participant from Dec 27 back
into the database. It took us three tries, though, to figure out how to get it back
in without having Sybase redo all the identities.

Once that was in, we fixed the identity problem (making loads of backups along the
way this time) and pondered how to recover the participant data and team info that
had been changed/added to STATS_Participant in the past few days. We decided that
re-running those days from logs would be the easiest.

We extracted the days in question from the master tables and used that info to reset
everyone's team affiliation to what it was for the 12/28 statsrun (a process that
ended up taking a few hours). We fired off the statsrun and waited. We had to fine-
tune Bruce's code a bit, but things seem to be going well now and the 12/29 RC5 data
is just about done.

The only thing left to do is re-assign blocks for the past few days to the correct
teams for the handful of people who changed their team membership during those days.
Because that hasn't been done yet, some of the team stats might look a bit off.
After tonight's statsrun, everything should be back to normal.
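
To give a rough idea of what re-assigning blocks involves: each day's work has
to be credited to whichever team the participant belonged to on that day. The
Python sketch below shows the idea under simplified assumptions. The tuple
layouts, the team_totals name and the sample IDs are all invented for the
example and do not reflect the real stats tables or statsrun code.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date


def team_totals(work, changes):
    """Sum blocks per (day, team) using the team each participant was on that day.

    work:    iterable of (day, participant_id, blocks)
    changes: iterable of (effective_day, participant_id, team_id)
    Work done before a participant's first recorded team is skipped.
    """
    history = defaultdict(list)
    for day, pid, team in sorted(changes):
        history[pid].append((day, team))

    totals = defaultdict(int)
    for day, pid, blocks in work:
        entries = history.get(pid, [])
        days = [d for d, _ in entries]
        # Index of the last change effective on or before `day`.
        i = bisect_right(days, day) - 1
        if i >= 0:
            totals[(day, entries[i][1])] += blocks
    return dict(totals)


if __name__ == "__main__":
    # Made-up data: participant 7 moves from team 10 to team 20 on 28-Dec.
    changes = [(date(1999, 12, 1), 7, 10), (date(1999, 12, 28), 7, 20)]
    work = [(date(1999, 12, 27), 7, 5), (date(1999, 12, 28), 7, 3)]
    print(team_totals(work, changes))
```

In the sample data, the 27-Dec blocks stay with the old team and the 28-Dec
blocks go to the new one.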

I'm glad we won't have this problem with STATS_Participant again! }:8)

Thanks as always for your patience and CPU time.


--
To unsubscribe, send 'unsubscribe plans' to majordomo at lists.distributed.net


