[stats-dev] Incremental ogr-verify

stats-dev at lists.distributed.net stats-dev at lists.distributed.net
Fri Jun 27 22:10:10 EDT 2003


On Fri, Jun 27, 2003 at 02:55:05PM -0400, stats-dev at lists.distributed.net wrote:
> Here was my original idea.  I know there are some problems, which I will
> address at the end.
> 
> Load the newly retired ids and new ids into ogr_idlookup and into a separate
> table

See my original email.

> Load the data from logdata into day_results

I'm pretty sure we don't want to do this. First, it means extra data to
slog through, for no good reason. Second, I think if we pull today's
data in before processing the new retires, there's a much better chance
of screwing things up. There's no reason to touch today's data before
processing retires; the existing code already handles that case
perfectly fine.

> From the day_results and the newly retired list generate a list of affected
> stub_id,nodecounts affected (call this table affected_stubs)
>
> Delete these stubs from ogr_summary
> 	-These should kick off triggers to update the counts in ogr_complete
> 
> Run the summary for all affected_stubs and insert these records into
> ogr_summary
> 	-These should kick off triggers to update the counts in ogr_complete
> 
> Problems:
> While this is simpler from a human perspective, it is probably more work on
> a daily basis
> It's enbtirely possible that there couls be a stub that has been completed
> with two different stubcounts.  While this isn't a problem in most cases,
> using triggers could  make the counts be off from what they should be once
> in a while unless we're very careful about checking for the stub_id in
> ogr_summary.
 
IIRC, building ogr_complete is a relatively fast process. Because of
this, we shouldn't screw with it for now. Even if we decided to do this
incrementally, I think the updates would probably suck big time as
triggers.

I guess I managed not to update OGR_summary to account for old retires.
Doh! There's no reason to delete though; it would be better to calculate
the new information, then do an update on OGR_summary.

SELECT stub_id, nodecount, count(DISTINCT I.stats_id) AS participants
        , max(P.version) AS max_client
    INTO TEMP Tnew_summary
    FROM retired_stub_id rsi, OGR_results r, OGR_idlookup i, platform p
    WHERE r.stub_id = rsi.stub_id
        AND r.nodecount = rsi.nodecount
        AND i.id = r.id
        AND p.platform_id = r.platform_id
    GROUP BY stub_id, nodecount
;

UPDAT>E OGR_summary
    SET participants = ns.participants
            , max_client = ns.max_client
    FROM Tnew_summary ns
    WHERE ogr_summary.stub_id = ns.stub_id
        AND ogr_summary.nodecount = ns.nodecount
;

Note that for the update to be effective, you'll need to add an index on
stub_id, nodecount to ogr_summary, otherwise it's going to hash-join.
-- 
Jim C. Nasby (aka Decibel!)                    jim at nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


More information about the stats-dev mailing list