[stats-dev] Attempting to speed movedata.sql for ogr-verification

Chris Hodson nerf at slacker.com
Sun May 4 22:28:53 EDT 2003


How about doing this processing during the hourly runs?  It would make the
hourly runs a bit longer, but would speed up the daily run.  We could also
do the normalization during this step.  Comments?

-----Original Message-----
From: owner-stats-dev at lists.distributed.net
[mailto:owner-stats-dev at lists.distributed.net] On Behalf Of Benjamin Gavin
Sent: Friday, May 02, 2003 10:38 AM
To: stats-dev at distributed.net
Subject: [stats-dev] Attempting to speed movedata.sql for ogr-verification


OK,
  So I was looking through the queries that are part of the ogr-verification
process and trying to determine if we could speed any of them up a bit.  As
an example, I took the query which attempts to insert non-matching stubs
into the primary ogr_stubs table (i.e. those which came in as returns, but
don't currently exist in the ogr_stubs table).  I don't have any really good
performance numbers comparing the two, but initial tests show that the
new query is slightly faster than the old one:

Old Query:
SELECT DISTINCT L.stub_marks, 0
FROM logdata L
WHERE NOT EXISTS (SELECT * FROM ogr_stubs WHERE
			L.stub_marks = ogr_stubs.stub_marks);

Explain Output:
 Unique  (cost=3055.91..3059.66 rows=50 width=26)
   ->Sort  (cost=3055.91..3057.16 rows=500 width=26)
       Sort Key: stub_marks, 0
         ->Seq Scan on logdata l  (cost=0.00..3033.50 rows=500 width=26)
            Filter: (NOT (subplan))
            SubPlan
              ->Index Scan using stubs_marks on ogr_stubs
                (cost=0.00..3.01 rows=1 width=26)
                  Index Cond: ($0 = stub_marks)

New Query:
SELECT DISTINCT L.stub_marks, 0
FROM logdata L LEFT OUTER JOIN ogr_stubs O ON L.stub_marks = O.stub_marks
WHERE O.stub_marks IS NULL;

Explain Output:
 Unique  (cost=3095.83..3103.33 rows=100 width=46)
   ->Sort  (cost=3095.83..3098.33 rows=1000 width=46)
      Sort Key: l.stub_marks, 0
        ->Nested Loop  (cost=0.00..3046.00 rows=1000 width=46)
           Filter: ("inner".stub_marks IS NULL)
             ->Seq Scan on logdata l  (cost=0.00..20.00 rows=1000 width=26)
             ->Index Scan using stubs_marks on ogr_stubs o
               (cost=0.00..3.01 rows=1 width=20)
                  Index Cond: ("outer".stub_marks = o.stub_marks)


You'll notice that the cost estimate for the second query is higher, but in
my initial (admittedly limited) testing, the second query tended to return
results as much as 20% faster.  I'm fairly certain that the database load was
rather high at the time, and the numbers varied by as much as 100% between
runs, so we'll need to do a bit more testing to see if we can eke out a bit
more performance from these.
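For what it's worth, the two forms can be checked for equivalence on a toy
dataset.  Here's a quick sketch using SQLite from Python -- the schema is
pared down to just the stub_marks columns the queries touch, and the sample
stub values are made up:

```python
import sqlite3

# Hypothetical miniature schema: only the stub_marks columns the two
# queries actually touch, with made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE logdata (stub_marks TEXT);
    CREATE TABLE ogr_stubs (stub_marks TEXT);
    INSERT INTO logdata VALUES
        ('24/1-2-3'), ('24/4-5-6'), ('24/4-5-6'), ('25/7-8-9');
    INSERT INTO ogr_stubs VALUES ('24/1-2-3');
""")

# Old form: correlated NOT EXISTS subquery.
old = conn.execute("""
    SELECT DISTINCT L.stub_marks, 0
    FROM logdata L
    WHERE NOT EXISTS (SELECT * FROM ogr_stubs
                      WHERE L.stub_marks = ogr_stubs.stub_marks)
""").fetchall()

# New form: LEFT OUTER JOIN with an IS NULL filter (an "anti-join").
new = conn.execute("""
    SELECT DISTINCT L.stub_marks, 0
    FROM logdata L LEFT OUTER JOIN ogr_stubs O
         ON L.stub_marks = O.stub_marks
    WHERE O.stub_marks IS NULL
""").fetchall()

# Both should report exactly the stubs missing from ogr_stubs.
assert sorted(old) == sorted(new)
print(sorted(old))  # the non-matching stubs, each paired with 0
```

This doesn't say anything about relative speed, of course -- it just confirms
the rewrite returns the same rows, so any timing difference we measure is an
apples-to-apples comparison.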

Ben Gavin
thejet at distributed.net


=====
/******************************/
Benjamin Gavin
Virtual.Olympus Software
Virtual_Olympus at yahoo.com
http://www.virtual-olympus.com/
/******************************/



--
To unsubscribe, send 'unsubscribe stats-dev' to
majordomo at lists.distributed.net





