[stats-dev] [stats-dev-owner@lists.distributed.net: stats-dev post from paul@quakenet.org requires approval]

Jim C. Nasby decibel at distributed.net
Sun Nov 23 18:22:54 EST 2003

For some reason this didn't seem to go out, so here it is.
----- Forwarded message from stats-dev-owner at lists.distributed.net -----
From: "Paul Richards" <paul at quakenet.org>
To: "'stats.distributed.net Development'" <stats-dev at lists.distributed.net>
Subject: RE: [stats-dev] PHP output caching and "dead" projects
Date: Sat, 22 Nov 2003 00:05:33 -0000
In-Reply-To: <20031121174443.34949.qmail at web9601.mail.yahoo.com>

Do we actually have an issue here?

Firstly, we don't (yet) run phpa/Turck MMCache or whatever it is on blower.

Secondly, at the moment a page request involves loading the db class and the
participant class, then header.inc and footer.inc, and finally including that
page's own code.

The cached version of the site would involve doing:

include 'dbpgsql.inc';
include 'header.inc';

if (file_exists($cache_file)) {
    include $cache_file;
} else {
    // build the page as we do now, then write it to $cache_file
}
include 'footer.inc';

Anything less than that will cause us trouble in the future, imo.

I'd personally like to evaluate the idea of having blower run pgsql, and a
separate box sit in front of it running php. But, at the end of the day,
from what I've seen, blower seems to cope.

Caching some pages might amount to caching just one db query, so this comes
down to whether it's quicker to read a cached response from the file
system or to run a db query.
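The file-read versus database-query question can be measured directly. What follows is a hypothetical Python timing harness, not part of the stats code. It averages many reads of a cached HTML file against many runs of one query. An in-memory SQLite table stands in for the PostgreSQL stats database, and the table name, row count and query are made-up placeholders. The numbers only mean something when the same comparison is run on blower against pgsql and a real cached page.

```python
import os
import sqlite3
import tempfile
import timeit


def time_cached_read(path: str, runs: int = 1000) -> float:
    """Average seconds to read a cached page from disk."""
    def read() -> None:
        with open(path, "rb") as f:
            f.read()
    return timeit.timeit(read, number=runs) / runs


def time_db_query(conn: sqlite3.Connection, sql: str, runs: int = 1000) -> float:
    """Average seconds to run one query and fetch all of its rows."""
    return timeit.timeit(lambda: conn.execute(sql).fetchall(), number=runs) / runs


if __name__ == "__main__":
    # Placeholder data: a small HTML page and a 10,000-row table.
    with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False) as f:
        f.write("<html>" + "<tr><td>row</td></tr>" * 100 + "</html>")
        cache_path = f.name
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stats (id INTEGER PRIMARY KEY, blocks INTEGER)")
    conn.executemany("INSERT INTO stats (blocks) VALUES (?)",
                     [(i,) for i in range(10000)])
    query = "SELECT id, blocks FROM stats ORDER BY blocks DESC LIMIT 100"
    print(f"file read: {time_cached_read(cache_path):.6f} s")
    print(f"db query:  {time_db_query(conn, query):.6f} s")
    os.remove(cache_path)
```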


> -----Original Message-----
> From: stats-dev-bounces at lists.distributed.net 
> [mailto:stats-dev-bounces at lists.distributed.net] On Behalf Of 
> Benjamin Gavin
> Sent: 21 November 2003 17:45
> To: stats-dev at lists.distributed.net
> Subject: [stats-dev] PHP output caching and "dead" projects
> OK,
>   I've been looking at ways to improve the performance of the 
> stats system.  I've identified a couple problem areas which 
> are currently affecting the speed of the stats system:
> 1.  Page content is generated dynamically, at every request:  
> This is perhaps the largest problem.  Our stats database 
> updates only once every night, and performing potentially 
> intensive database queries to service request after request 
> is not helping either the database server or the users.
> 2.  "Dead" projects continue to consume as many resources as
> currently active projects (or more):  From the database 
> side of the fence, inactive projects cost us a large chunk, 
> especially if those older projects stay "popular" for a long 
> period of time.  This is largely due to database cache 
> thrashing and the like, but also due to the sheer volume of 
> data that needs to be kept around long term.  If we can't get 
> rid of the data completely, we can certainly try to limit the 
> number of times it is queried.
> So, in the interest of finding a solution to this, and since 
> nobody seems to agree with me that just eliminating the data 
> from the database completely is a valid option...  I have 
> arrived at a scheme for caching the output of the various 
> system pages which could be utilized firstly for the "dead" 
> projects (to alleviate #2), and potentially for the "live"
> projects as well (to alleviate #1).
> The scheme that I have arrived at is as follows:
> 1.  Maintain a list of "dead" projects, or include a field in 
> the database for "closed" status. (this may exist already, 
> but the documentation on the DB schema is sparse)
> 2.  If the requested project is in the list of "dead" 
> projects (or for all projects long term), then check to see 
> if the cache file already exists for the current page 
> request.  If so, serve it up from the cache, otherwise 
> regenerate the page and place the result in the cache.
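The two steps above describe a read-through cache keyed on the request. Here is a minimal Python sketch of that control flow. The site itself is PHP, so treat this as a language-neutral illustration. DEAD_PROJECTS, serve_page, render and cache_all are hypothetical names, and the dead-project ids are placeholders.

```python
import os
import tempfile
from typing import Callable, Set

# Hypothetical ids of closed projects; in practice these would come from a
# hand-kept list or from a "closed" status column in the stats database.
DEAD_PROJECTS: Set[int] = {3, 5}


def serve_page(project_id: int, cache_file: str,
               render: Callable[[], str], cache_all: bool = False) -> str:
    """Return the page HTML, serving dead projects from the on-disk cache.

    render is a stand-in for the existing page code: it runs the
    database queries and returns the finished HTML.
    """
    if not (cache_all or project_id in DEAD_PROJECTS):
        return render()                      # live project: generate as today
    if os.path.exists(cache_file):
        with open(cache_file, encoding="utf-8") as f:
            return f.read()                  # cache hit: no database work
    html = render()                          # cache miss: regenerate the page
    directory = os.path.dirname(cache_file)
    if directory:
        os.makedirs(directory, exist_ok=True)
    # Write to a temporary file and rename it into place, so a concurrent
    # request never reads a half-written cache file.
    fd, tmp = tempfile.mkstemp(dir=directory or ".", suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(html)
    os.replace(tmp, cache_file)
    return html
```

Setting cache_all=True corresponds to the longer-term option of caching every project. Writing through a temporary file plus os.replace is one way to avoid serving partially written pages when two requests miss at the same time. The proposal does not specify this detail.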
> In my preliminary testing (on my local box), which is 
> certainly less beefy than blower, qualitative response times 
> seem to have improved by 40-50% (sometimes 100-200% for 
> team/participant list pages).
> The caching structure is as follows:
> Directory: /cache/[project_id]/[page name]
> File: SHA1(normalized query string).html
> The nice thing about using the normalized query string is 
> that it automatically handles things like password protected 
> team member pages. 
> Unless the person knows the correct team password, they will 
> not be able to retrieve the cached page with the team member 
> information.  The cache directories could be placed in a 
> location which is not accessible through the web root as well 
> to avoid people "lucky guessing" the filenames.
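Below is a sketch of how the cache file name could be derived. It assumes that "normalized" means parsing the query string and sorting its key/value pairs, so that parameter order does not matter. The proposal does not say exactly how normalization works. CACHE_ROOT and the page name tmember are hypothetical, and CACHE_ROOT is placed outside the web root as suggested above.

```python
import hashlib
import os
from urllib.parse import parse_qsl, urlencode

# Hypothetical cache location, deliberately outside the web root.
CACHE_ROOT = "/var/cache/stats"


def normalize_query(query: str) -> str:
    """Sort parameters so ?a=1&b=2 and ?b=2&a=1 map to the same cache file."""
    pairs = parse_qsl(query, keep_blank_values=True)
    return urlencode(sorted(pairs))


def cache_path(project_id: int, page: str, query: str) -> str:
    """Build <root>/<project_id>/<page>/<SHA1(normalized query)>.html."""
    digest = hashlib.sha1(normalize_query(query).encode("utf-8")).hexdigest()
    return os.path.join(CACHE_ROOT, str(project_id), page, digest + ".html")
```

The team password is part of the hashed string, so a request with the wrong password maps to a different file and goes through the normal, authenticating code path. That only holds if the "wrong password" page itself is never written to the cache.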
> That leaves two remaining pieces:
> 1.  An Exception List: Those pages which should never be cached, e.g.
> participant editing, team joining, etc
> 2.  Stats Proc Changes: If we implement caching for all 
> projects, then we would need to add a final step to the stats 
> proc routines which clears out the web caches when the stats 
> run is complete.
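The two remaining pieces could look roughly like this Python sketch. NEVER_CACHE and its page names (pedit, tjoin) are hypothetical placeholders for the participant-editing and team-joining pages. clear_cache is meant to run as the last step of the nightly stats run. Its optional project_ids argument is an assumption beyond the proposal: it would let the run clear only live projects, whose data changes, and leave dead projects' pages in place.

```python
import os
import shutil
from typing import Iterable, Optional

# Hypothetical names of pages that must never be cached because they change
# state or depend on who is making the request.
NEVER_CACHE = frozenset({"pedit", "tjoin"})


def is_cacheable(page: str) -> bool:
    """True unless the page is on the exception list."""
    return page not in NEVER_CACHE


def clear_cache(cache_root: str,
                project_ids: Optional[Iterable[int]] = None) -> None:
    """Final stats-run step: remove cached pages so fresh data gets served.

    With project_ids=None every entry under cache_root is removed;
    otherwise only the listed projects' directories are.
    """
    if not os.path.isdir(cache_root):
        return
    if project_ids is None:
        names = os.listdir(cache_root)
    else:
        names = [str(p) for p in project_ids]
    for name in names:
        path = os.path.join(cache_root, name)
        if os.path.isdir(path):
            shutil.rmtree(path)
        elif os.path.isfile(path):
            os.remove(path)
```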
> Just FYI, adding caching was about 20 lines of code in 
> project.inc and 10 in footer.inc.  A better implementation 
> would be to split the caching logic into its own include and 
> link it to the files which could reasonably be cached.  It 
> would also be good to include the notion of a "page error" 
> which would cause the page not to be cached due to a database 
> error, improper authentication, etc.
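One way to express the "page error" idea is to buffer the page output, let any code path flag an error, and have the footer write to the cache only when nothing was flagged. PageOutput, fail and finish_page are hypothetical names, and store stands in for whatever actually writes the cache file. In PHP the buffering would more likely use output control (ob_start and related functions).

```python
import io
from typing import Callable, Optional


class PageOutput:
    """Collects a page's output and remembers whether rendering went wrong."""

    def __init__(self) -> None:
        self._buf = io.StringIO()
        self.error: Optional[str] = None

    def write(self, text: str) -> None:
        self._buf.write(text)

    def fail(self, reason: str) -> None:
        # e.g. a database error or a bad team password
        self.error = reason

    def html(self) -> str:
        return self._buf.getvalue()

    def cacheable(self) -> bool:
        return self.error is None


def finish_page(out: PageOutput, store: Callable[[str], None]) -> str:
    """Footer step: cache the page only if no error was flagged."""
    if out.cacheable():
        store(out.html())
    return out.html()
```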
> So... thoughts, comments, etc?
> Ben [TheJet]


----- End forwarded message -----

Jim C. Nasby, Database Consultant                  jim at nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
