[stats-dev] Attempting to speed movedata.sql for ogr-verification

Bruce Wilson bwilson at getright.org
Tue May 6 13:57:34 EDT 2003


I don't know how well it translates to pgsql, but the first query
would be faster on sybase/mssql if it was written as "in" query rather
than an "exists" query.

Ex:

BW Query:
SELECT DISTINCT L.stub_marks, 0
FROM logdata L
WHERE L.stub_marks not in (SELECT stub_marks FROM ogr_stubs);


| -----Original Message-----
| From: owner-stats-dev at lists.distributed.net 
| [mailto:owner-stats-dev at lists.distributed.net] On Behalf Of 
| Chris Hodson
| Sent: Sunday, May 04, 2003 20:29
| To: stats-dev at lists.distributed.net
| Subject: RE: [stats-dev] Attempting to speed movedata.sql for 
| ogr-verification
| 
| 
| How about doing this processing during the hourly runs?  It 
| would make the
| hourly runs a bit longer, but would speed up the daily run.  
| We could also
| do the normalization during this step.  Comments?
| 
| -----Original Message-----
| From: owner-stats-dev at lists.distributed.net
| [mailto:owner-stats-dev at lists.distributed.net] On Behalf Of 
| Benjamin Gavin
| Sent: Friday, May 02, 2003 10:38 AM
| To: stats-dev at distributed.net
| Subject: [stats-dev] Attempting to speed movedata.sql for 
| ogr-verification
| 
| 
| OK,
|   So I was looking through the queries that are part of the 
| ogr-verification
| process and trying to determine if we could speed any of them 
| up a bit.  As
| an example, I took the query which attempts to insert 
| non-matching stubs
| into the primary ogr_stubs table (i.e. those which came in as 
| returns, but
| don't currently exist in the ogr_stubs table).  I don't have 
| any real good
| performance numbers between the two, but initial tests are 
| showing that the
| new query is slightly faster than the old one:
| 
| Old Query:
| SELECT DISTINCT L.stub_marks, 0
| FROM logdata L
| WHERE NOT EXISTS (SELECT * FROM ogr_stubs WHERE
| 			L.stub_marks = ogr_stubs.stub_marks);
| 
| Explain Output:
|  Unique  (cost=3055.91..3059.66 rows=50 width=26)
|    ->Sort  (cost=3055.91..3057.16 rows=500 width=26)
|        Sort Key: stub_marks, 0
|          ->Seq Scan on logdata l  (cost=0.00..3033.50 
| rows=500 width=26)
|             Filter: (NOT (subplan))
|             SubPlan
|               ->Index Scan using stubs_marks on ogr_stubs
|                 (cost=0.00..3.01 rows=1 width=26)
|                   Index Cond: ($0 = stub_marks)
| 
| New Query:
| SELECT DISTINCT L.stub_marks, 0
| FROM logdata L LEFT OUTER JOIN ogr_stubs O ON L.stub_marks = 
| O.stub_marks
| WHERE O.stub_marks IS NULL;
| 
| Explain Output:
|  Unique  (cost=3095.83..3103.33 rows=100 width=46)
|    ->Sort  (cost=3095.83..3098.33 rows=1000 width=46)
|       Sort Key: l.stub_marks, 0
|         ->Nested Loop  (cost=0.00..3046.00 rows=1000 width=46)
|            Filter: ("inner".stub_marks IS NULL)
|              ->Seq Scan on logdata l  (cost=0.00..20.00 rows=1000
| width=26)
|              ->Index Scan using stubs_marks on ogr_stubs o
|                (cost=0.00..3.01 rows=1 width=20)
|                   Index Cond: ("outer".stub_marks = o.stub_marks)
| 
| 
| You'll notice that the cost estimate for the second query is 
| higher, but in
| my initial (admittedly minor) testing, the second tended to 
| return as much
| as 20% faster.  I'm fairly certain that the database load was 
| rather high at
| the time, and the numbers varied as much as 100% between 
| runs, so we'll need
| to do a bit more testing to see if we can eek out a bit more 
| performance
| from these.
| 
| Ben Gavin
| thejet at distributed.net
| 
| 
| =====
| /******************************/
| Benjamin Gavin
| Virtual.Olympus Software
| Virtual_Olympus at yahoo.com
| http://www.virtual-olympus.com/ /******************************/
| 
| 
| __________________________________
| Do you Yahoo!?
| The New Yahoo! Search - Faster. Easier. Bingo.
http://search.yahoo.com
| 
| --
| To unsubscribe, send 'unsubscribe stats-dev' to
| majordomo at lists.distributed.net
| 
| 
| 
| --
| To unsubscribe, send 'unsubscribe stats-dev' to 
| majordomo at lists.distributed.net
| 
| 


--
To unsubscribe, send 'unsubscribe stats-dev' to majordomo at lists.distributed.net



More information about the stats-dev mailing list