Many ideas have been proposed for a simple cache layer that sits in front of a web site and passively caches web pages as flat files. I have analyzed the methods I have seen people use, noted what I like and dislike about each, and then added what I feel such a caching system should be capable of in the real-world scenarios I have encountered while working in the business. With the resulting wish list in hand, I set out to construct a caching system that would address all of these points in as little code as possible.
1) Cache according to an interval. One sets a single "refresh rate" that governs when all cached files are refreshed.
2) The interval at which cached files expire should be synchronized, not relative to the time each cached file was created. This minimizes the chance that two files drawing on the same dynamic data will show a discrepancy. Because files with shared data all expire at the same moment, they have the opportunity to be re-cached with matching content.
3) The system should not rely on a cron script or any other mechanism to explicitly push cached content, nor should it need to know where, or whether, the cacheable scripts exist.
4) Cached files should be stored in a centralized location, not relative to the script whose output was cached, so that the entire cache can be flushed easily.
5) A single script should be cached separately for each distinct request to it (e.g. foo.php?id=1 and foo.php?id=2 create two different cached files), but the caching system should place a cap on how many requests to a single script can be cached before all cached requests for that script are flushed. Otherwise, an exploit could be crafted that increments a GET variable and fills the cache.
6) Once a day, the entire cache should be flushed so that any cached files that no longer have a parent script are cleaned up. This should not have to rely on any external process (e.g. cron).
7) Only GET requests should be cached, not POST. POST actions will differ from user to user and therefore should be excluded from the cache.
8) The caching system should not only serve a flat-file cache of a script's output, but should also send an Expires header so that a browser will not repeatedly contact the server for the same cached file.
9) Since the expiration interval is absolute and not relative to the file's cache time, the Expires header that is sent must consistently use the expiration interval and must not be based on the cache file's modification time.
10) The Expires header should always be sent, regardless of whether cached content or freshly generated content is served.
12) The cache system should perform as few operations as possible in order to serve a cached file.
13) The cache system should add little overhead when it has to cache a script.
14) Object-oriented programming should not be used to implement the caching mechanism, as this would add unnecessary overhead.
15) The file containing the caching code should contain only the caching code. Non-cache-related code should be included in the script only if its output needs to be regenerated. This means a web application should never be pulled in through include or require unless the cache layer has first determined that the script's output must be regenerated. Therefore, the caching mechanism should never be part of the web application itself. (As a side note, this method is proposed without any reliance on the PEAR distribution.)
16) The cache system must be FAST.
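To make points 2, 9 and 10 concrete, here is a minimal sketch of what "synchronized" expiry can look like. It assumes expiry boundaries fall on multiples of the interval counted from the Unix epoch, so every file shares the same expiry instant. The function names and the five-minute interval are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helpers; names and the interval value are illustrative assumptions.

// Return the timestamp of the next shared expiry boundary.
// Boundaries are multiples of $interval seconds since the Unix epoch,
// so they do not depend on when any particular file was cached.
function cache_next_boundary($interval, $now)
{
    return $now - ($now % $interval) + $interval;
}

// A cached file is stale if it was written before the current window began.
function cache_is_stale($mtime, $interval, $now)
{
    $window_start = $now - ($now % $interval);
    return $mtime < $window_start;
}

// Build the Expires header from the shared boundary, never from the file's mtime.
function cache_expires_header($interval, $now)
{
    $boundary = cache_next_boundary($interval, $now);
    return 'Expires: ' . gmdate('D, d M Y H:i:s', $boundary) . ' GMT';
}

$interval = 300; // assumed refresh rate: five minutes

// Sent on every request, whether the body comes from cache or is regenerated.
if (!headers_sent()) {
    header(cache_expires_header($interval, time()));
}
```

With a 300-second interval, a file cached at second 899 and another cached at second 650 both become stale at second 900, which is what gives related pages the chance to be re-cached with matching content.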
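Points 4, 5 and 7 can be sketched in the same spirit. The version below assumes one flat cache directory, file names built from md5 hashes of the script path and the query string, and a caller-supplied cap. All of the names here are hypothetical and may differ from what a finished implementation would use.

```php
<?php
// Hypothetical helpers; the naming scheme and the cap are illustrative assumptions.

// Only GET requests are eligible for caching.
function cache_is_cacheable($method)
{
    return $method === 'GET';
}

// Map a script plus its query string to a file in one central directory.
// Files for the same script share a prefix, so they can be counted
// and flushed together.
function cache_file_path($cache_dir, $script, $query)
{
    return $cache_dir . '/' . md5($script) . '-' . md5($query) . '.cache';
}

// If a script already has $max_variants cached requests, delete all of them.
// Returns the number of files removed.
function cache_enforce_cap($cache_dir, $script, $max_variants)
{
    $files = glob($cache_dir . '/' . md5($script) . '-*.cache');
    if ($files === false || count($files) < $max_variants) {
        return 0;
    }
    foreach ($files as $file) {
        unlink($file);
    }
    return count($files);
}
```

Because every file lives in one directory, flushing the whole cache amounts to deleting that directory's contents, and a flood of foo.php?id=N requests can never hold more than the cap's worth of files for foo.php.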
Continue to the rest of the article for the resulting code, fully documented and ready to try out yourself.