[stats-dev] PHP output caching and "dead" projects
Chris Jones
fiddles at distributed.net
Fri Nov 21 18:11:29 EST 2003
On 21-Nov-2003, Benjamin Gavin wrote:
> OK,
> I've been looking at ways to improve the performance of the stats
> system. I've identified a couple of problem areas which are currently
> affecting the speed of the stats system:
>
> 1. Page content is generated dynamically, at every request: This is
> perhaps the largest problem. Our stats database updates only once every
> night, and performing potentially intensive database queries to service
> request after request is not helping either the database server or the
> users.
>
> 2. "Dead" projects continue to take as many (or more) resources than
> currently active projects: From the database side of the fence, inactive
> projects cost us a large chunk, especially if those older projects stay
> "popular" for a long period of time. This is largely due to database
> cache thrashing and the like, but also due to the sheer volume of data
> that needs to be kept around long term. If we can't get rid of the data
> completely, we can certainly try to limit the number of times it is
> queried.
Simply keep the "dead" projects purely in cached form. I can't see that data
being used much anymore, especially CSC.
> So, in the interest of finding a solution to this, and since nobody seems
> to agree with me that just eliminating the data from the database
> completely is a valid option... I have arrived at a scheme for caching
> the output of the various system pages which could be utilized firstly for
> the "dead" projects (to alleviate #2), and potentially for the "live"
> projects as well (to alleviate #1).
>
> The scheme that I have arrived at looks as follows:
>
> 1. Maintain a list of "dead" projects, or include a field in the database
> for "closed" status. (this may exist already, but the documentation on the
> DB schema is sparse)
>
> 2. If the requested project is in the list of "dead" projects (or for all
> projects long term), then check to see if the cache file already exists
> for the current page request. If so, serve it up from the cache,
> otherwise regenerate the page and place the result in the cache.
>
> In my preliminary testing (on my local box), which is certainly less beefy
> than blower, qualitative response times seem to have improved by 40-50%
> (sometimes 100-200% for team/participant list pages).
>
> The caching structure is as follows:
>
> Directory: /cache/[project_id]/[page name]
>
> File: SHA1(normalized query string).html
>
> The nice thing about using the normalized query string is that it
> automatically handles things like password-protected team member pages.
> Unless the person knows the correct team password, they will not be able
> to retrieve the cached page with the team member information. The cache
> directories could be placed in a location which is not accessible through
> the web root as well to avoid people "lucky guessing" the filenames.
Agreed.
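
For what it's worth, here is a rough sketch of the lookup described above,
written in Python rather than the site's PHP just to keep it short. It is
not the code from project.inc. The normalization rule (sort the parameters),
the helper names, and the render callback are all assumptions of mine. Only
the /cache/[project_id]/[page name]/SHA1(...).html layout comes from the
proposal.

```python
import hashlib
import os
import tempfile
from urllib.parse import parse_qsl, urlencode


def normalize_query(query_string):
    """Sort parameters so equivalent requests share one key (assumed rule)."""
    pairs = parse_qsl(query_string, keep_blank_values=True)
    return urlencode(sorted(pairs))


def cache_path(cache_root, project_id, page_name, query_string):
    """Build [cache_root]/[project_id]/[page name]/SHA1(normalized query).html."""
    normalized = normalize_query(query_string)
    digest = hashlib.sha1(normalized.encode("utf-8")).hexdigest()
    return os.path.join(cache_root, str(project_id), page_name, digest + ".html")


def serve(cache_root, project_id, page_name, query_string, render):
    """Return the cached page if it exists; otherwise render, store, return."""
    path = cache_path(cache_root, project_id, page_name, query_string)
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            return fh.read()
    html = render()  # hypothetical callback that runs the database queries
    os.makedirs(os.path.dirname(path), exist_ok=True)
    # Write to a temp file and rename, so no reader ever sees a partial page.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(html)
    os.replace(tmp, path)
    return html
```

Because the team password is part of the query string, a request with a
different password hashes to a different file name, which is the property
mentioned above. Keeping cache_root outside the web root covers the
"lucky guess" case.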
> That leaves two remaining pieces:
>
> 1. An Exception List: Those pages which should never be cached, e.g.
> participant editing, team joining, etc.
>
> 2. Stats Proc Changes: If we implement caching for all projects, then we
> would need to add a final step to the stats proc routines which clears out
> the web caches when the stats run is complete.
>
> Just FYI, adding caching was about 20 lines of code in project.inc and 10
> in footer.inc. A better implementation would be to split the caching
> logic into its own include and link it to the files which could
> reasonably be cached. It would also be good to include the notion of a
> "page error" which would cause the page not to be cached due to a database
> error, improper authentication, etc.
>
> So... thoughts, comments, etc?
>
> Ben [TheJet]
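
The two remaining pieces and the "page error" idea could look roughly like
this, again as a Python sketch rather than PHP. The page names in the
exception list are made up for illustration, and the function names are
mine, not anything that exists in the stats code.

```python
import os
import shutil

# Hypothetical page names; the real exception list would come from the site.
NEVER_CACHE = {"participant_edit", "team_join"}


def should_cache(page_name, page_error):
    """Skip caching for excepted pages and for any page that hit an error."""
    return page_name not in NEVER_CACHE and not page_error


def clear_project_cache(cache_root, project_id):
    """Remove every cached page for one project, e.g. after a stats run.

    Returns True if a cache directory was removed, False if none existed.
    """
    project_dir = os.path.join(cache_root, str(project_id))
    if not os.path.isdir(project_dir):
        return False
    shutil.rmtree(project_dir)
    return True
```

The stats proc would call something like clear_project_cache once per live
project as its final step. "Dead" projects would never need clearing, since
their data no longer changes.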
--
Chris Jones
fiddles at distributed.net