Making PHP Applications Cache-Friendly
I'm running a public web-based forum that is read frequently
(about 10,000 hits per day), but gets relatively few postings (in the
range of 20 to 60 per day). The board software is
Phorum, a nice open-source
PHP application. Although the site has
other popular pages to offer, Phorum's
read.php file is the
clear number one in my Apache hit and download volume statistics.
Most of these hits are apparently caused by users pressing "reload"
to check for updates every now and then.
With the rather small number of new postings, it is obvious that Phorum and its
underlying DBMS (in our case, PostgreSQL) have to repeatedly
generate lots of identical responses to identical queries. This is
a waste of bandwidth and server load and makes browsing the forum
appear slower than necessary, particularly for users with slow
connections. It also renders the caching efforts of proxy servers
virtually useless.
Making PHP Applications Cache-Friendly
One approach to minimising redundant transmission of data is the
use of Last-Modified and If-Modified-Since headers as defined in
HTTP/1.1.
In this scheme, each object returned by the webserver carries a date
of last modification (a.k.a. "validator"). A user agent or proxy
cache can store this value and, upon the next reload of the same
object, issue a conditional GET request with the If-Modified-Since
header set. The webserver will then use this header to decide
whether the client's copy of the object is still "fresh" (as recent
as the data on the server) or "stale" (older than the data on the
server). If it is fresh, there is no need to send the object again,
so the server responds with a brief "304 Not Modified" message
instead.
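The client half of this exchange can be modelled in a few lines. This is only an illustration under stated assumptions: `ValidatingCache` is a hypothetical in-memory cache, and `$origin` is a hypothetical callable standing in for the webserver, so no real HTTP is involved. The cache stores each object's Last-Modified value, sends it back as If-Modified-Since on the next reload, and reuses its stored copy when the answer is 304.

```php
<?php
// Minimal model of a user agent or proxy cache that revalidates with
// conditional GETs. $origin is a hypothetical stand-in for the webserver:
// it receives the If-Modified-Since value (or null) and returns
// [status, Last-Modified value or null, body].
final class ValidatingCache
{
    /** @var array<string, array{lastModified: string, body: string}> */
    private array $entries = [];

    public function fetch(string $url, callable $origin): string
    {
        $entry = $this->entries[$url] ?? null;

        // Send a validator only if we already hold a copy of this object.
        [$status, $lastModified, $body] = $origin($entry['lastModified'] ?? null);

        if ($status === 304 && $entry !== null) {
            // "304 Not Modified": the stored copy is still fresh, reuse it.
            return $entry['body'];
        }

        if ($lastModified !== null) {
            // Full response: store the new copy together with its validator.
            $this->entries[$url] = ['lastModified' => $lastModified, 'body' => $body];
        }
        return $body;
    }
}

// Example: a toy origin whose data never changes; it counts full responses.
$fullResponses = 0;
$origin = function (?string $ims) use (&$fullResponses): array {
    $lastModified = 'Sat, 01 Jan 2000 00:00:00 GMT';
    if ($ims === $lastModified) {
        return [304, null, ''];
    }
    $fullResponses++;
    return [200, $lastModified, '<html>...</html>'];
};

$cache = new ValidatingCache();
$first = $cache->fetch('/read.php', $origin);   // full 200 response
$second = $cache->fetch('/read.php', $origin);  // answered with 304
```

With a real server, the second request costs only the headers of a 304 reply instead of the full page.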
Modern webservers and user agents (e.g., Apache/1.3,
Netscape Navigator 4.x and above, Internet Explorer) fully support
this technique. Apache automatically handles If-Modified-Since
requests for all static objects by default.
In the case of dynamic content as generated by PHP, we have to
take care of these things manually. We need to return a meaningful
Last-Modified header and handle If-Modified-Since requests so that
the user agent gets fresh data if and only if necessary. For
Phorum, this means that we have to keep track of database updates.
If the database has not changed since the client's last request, we
can simply return 304 without bothering the DBMS at all.
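A minimal sketch of that server-side logic, assuming the modification time of the data is already known as a Unix timestamp. `serve_conditionally()` and its `[status, headers, body]` return value are hypothetical; `$render` stands for whatever code queries the DBMS and builds the page, and it is only called when the client's copy is stale. Unlike the exact string comparison in the Phorum patch, this version parses If-Modified-Since with `strtotime()` and treats the copy as fresh when the data is not newer than it.

```php
<?php
// Hypothetical helper: decide the response before any database work.
// $mtime           - last time the data changed (Unix timestamp)
// $ifModifiedSince - raw If-Modified-Since header value, or null
// $render          - callable that queries the DBMS and returns the page
function serve_conditionally(int $mtime, ?string $ifModifiedSince, callable $render): array
{
    // HTTP dates are always expressed in GMT (RFC 1123 format).
    $lastModified = gmdate('D, d M Y H:i:s', $mtime) . ' GMT';

    if ($ifModifiedSince !== null) {
        $since = strtotime($ifModifiedSince);
        if ($since !== false && $mtime <= $since) {
            // Data unchanged since the client's copy: answer 304 and
            // never call $render, so the DBMS is not bothered at all.
            return [304, [], ''];
        }
    }

    // No validator, or the data is newer: generate the page in full.
    return [200, ['Last-Modified: ' . $lastModified], $render()];
}
```

In a script, the returned status would go to `http_response_code()`, each header line to `header()`, and the body to `echo`.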
Making PHP Applications Cache-Friendly
The basic approach I am using here is to touch a zero-length file
whenever the database is updated. The file's modification time will
then serve as the Last-Modified date. This is very simple to
implement, but rather primitive as it does not differentiate
between forums: if something is posted in forum X, all
forums in the same database are considered "updated". That's
clearly not very effective when you have lots of posting activity
in more than one or two forums.
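The marker-file idea boils down to two small helpers. The names `mark_updated()` and `last_update()` are hypothetical (the patched Phorum code calls `touch()` and `filemtime()` directly). The `clearstatcache()` call is there because PHP caches the results of `filemtime()` and related functions, which could otherwise hide an update made by another process.

```php
<?php
// Call after every statement that changes the database. touch() creates
// the zero-length file if it is missing and otherwise bumps its mtime.
function mark_updated(string $markerFile): void
{
    touch($markerFile);
}

// The marker's mtime, used as the Last-Modified date for every page,
// or null if no update has been recorded yet.
function last_update(string $markerFile): ?int
{
    // PHP caches stat() results; clear them for this file so a fresh
    // modification time is read from disk.
    clearstatcache(true, $markerFile);
    if (!file_exists($markerFile)) {
        return null;
    }
    $mtime = filemtime($markerFile);
    return $mtime === false ? null : $mtime;
}
```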
A finer-grained scheme -- perhaps
down to tracking individual threads -- would be more appropriate in
that case. Additionally, there are some issues with Phorum's use
of cookies to flag unread messages; the workaround used here is
to force a full reload at least once an hour.
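One possible finer-grained variant, sketched under assumptions that are not part of Phorum: one marker file per forum, named by a hypothetical `forum_marker()` scheme, so a posting in forum X no longer makes forum Y look updated. Instead of touching the file when it gets too old, the hypothetical `forum_last_modified()` never reports a date older than the start of the current `$period`-second window (by default the current hour), which is a stateless way to force the periodic full reload.

```php
<?php
// Hypothetical per-forum markers: one zero-length file per forum.
// The directory and the ".phorum-update-<id>" naming are assumptions.
function forum_marker(string $dir, int $forumId): string
{
    return rtrim($dir, '/') . '/.phorum-update-' . $forumId;
}

// Call after a posting in $forumId; other forums keep their dates.
function mark_forum_updated(string $dir, int $forumId): void
{
    touch(forum_marker($dir, $forumId));
}

// Last-Modified timestamp for one forum's pages. The result is never older
// than the start of the current $period-second window, so every cached
// copy goes stale at least once per window.
function forum_last_modified(string $dir, int $forumId, int $now, int $period = 3600): int
{
    $file = forum_marker($dir, $forumId);
    clearstatcache(true, $file);
    $mtime = file_exists($file) ? (int) filemtime($file) : 0;
    $windowStart = $now - ($now % $period);
    return max($mtime, $windowStart);
}
```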
Please note that the following code is not release quality.
It works, but it's rather simplistic and should be considered a
"proof of concept". I simply patched the Phorum 3.2.11 code wherever
it seemed necessary to get quick results. My changes are the code
that deals with $modification_file.
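common.php:
<?php
if ( !defined( "_COMMON_PHP" ) ){
define("_COMMON_PHP", 1 );
// These variables may be altered as needed:
// zero-length marker file; its mtime serves as the Last-Modified date
$modification_file = '/var/tmp/.phorum-update';
// table name that Phorum uses to access meta-information on forums.
// ... (remaining Phorum settings unchanged) ...
}
?>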
include/header.php:
<?php
global $modification_file;
// the marker file may not exist yet (no database update so far);
// filemtime() would emit a warning in that case
if (isset($modification_file) && file_exists($modification_file)
    && ($mtime = filemtime($modification_file))) {
    // to be on the safe side, refresh last modification time
    // if it's "too old", forcing a reload from time to time.
    // this is probably not necessary for most applications
    if ((time() - $mtime) > 3600) {
        touch($modification_file);
        $mtime = time();
    }
    // HTTP dates are always in GMT (RFC 1123 format)
    $gmt_mtime = gmdate('D, d M Y H:i:s', $mtime) . ' GMT';
    // we assume that the client generates proper RFC 822/1123 dates
    // and sends our Last-Modified value back verbatim
    // (should work for all modern browsers and proxy caches)
    $if_modified_since = isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])
        ? $_SERVER['HTTP_IF_MODIFIED_SINCE'] : '';
    if ($if_modified_since == $gmt_mtime) {
        header("HTTP/1.1 304 Not Modified");
        exit;
    }
    header("Last-Modified: " . $gmt_mtime);
}
// your header modifications go here...
?>
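The header.php code relies on the client sending back exactly the Last-Modified string it received. A more tolerant comparison, sketched below with hypothetical helper names, parses the header into a timestamp and drops any `; length=...` parameter, which some older browsers appended to If-Modified-Since. It then treats the client's copy as fresh whenever the data is not newer than the client's date.

```php
<?php
// Hypothetical helper: If-Modified-Since value -> Unix timestamp, or null.
// Anything after the first ';' (e.g. "; length=1234") is discarded before
// strtotime() parses the date itself.
function parse_if_modified_since(?string $value): ?int
{
    if ($value === null || trim($value) === '') {
        return null;
    }
    $date = trim(explode(';', $value, 2)[0]);
    $ts = strtotime($date);
    return $ts === false ? null : $ts;
}

// The client's copy is fresh if our data is not newer than its date.
function client_copy_is_fresh(int $mtime, ?string $ifModifiedSince): bool
{
    $since = parse_if_modified_since($ifModifiedSince);
    return $since !== null && $mtime <= $since;
}
```

In header.php, `client_copy_is_fresh($mtime, $_SERVER['HTTP_IF_MODIFIED_SINCE'] ?? null)` could then stand in for the string equality test.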
Making PHP Applications Cache-Friendly
db/postgresql.php: (virtually identical for other databases)
<?php
class query {
    var $result;
    var $row;
    var $curr_row;

    function query(&$db, $query="") {
        // Constructor of the query object.
        // executes the query, notifies the db object of the query result to clean
        // up later
        global $modification_file;
        if($query!=""){
            if (!empty($this->result)) {
                $this->free(); // query not called as constructor therefore there may
                               // be something to clean up.
            }
            $this->result=@pg_Exec($db->connect_id, $query);
            // catch database-altering SQL statements (case-insensitive match on
            // the first keyword, leading whitespace allowed) and record the update
            if(isset($modification_file) && preg_match('/^\s*(insert|update|alter|delete)/i', $query))
                touch($modification_file);
            $db->addquery($this->result);
            $this->curr_row=0;
        }
    }

    // ... (remaining methods of the query class unchanged) ...
}
?>
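The statement check can also be pulled out into a standalone helper. This is a sketch with hypothetical names: `is_data_modifying()`, `run_tracked()`, and the `$exec` callable, which stands in for whatever function actually sends the query to the database. It uses `preg_match()` with a case-insensitive pattern, tolerates leading whitespace, and, as a design choice that differs from the patch above, only touches the marker when the query did not return `false`. Like the original keyword list, it does not cover statements such as DROP or TRUNCATE.

```php
<?php
// True if $sql starts (after optional whitespace) with one of the keywords
// treated as data-altering. Case-insensitive.
function is_data_modifying(string $sql): bool
{
    return preg_match('/^\s*(insert|update|alter|delete)\b/i', $sql) === 1;
}

// Hypothetical wrapper: $exec runs the query and returns false on failure.
// The marker file is touched only for successful data-altering statements.
function run_tracked(callable $exec, string $sql, string $markerFile)
{
    $result = $exec($sql);
    if ($result !== false && is_data_modifying($sql)) {
        touch($markerFile);
    }
    return $result;
}
```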
Bottom Line
If you have an efficient way of tracking the freshness of your
data, implementing proper Last-Modified/If-Modified-Since behaviour
for PHP applications is very simple. It has clear benefits for both
server and client side, and usually no drawbacks. Considering that, it is
surprising how rarely it seems to be used by PHP authors.
Of course, this isn't all there is to creating cache-friendly PHP scripts. For further information, check Mark Nottingham's
Caching Tutorial for Web Authors and Webmasters.
--Klaus A. Brunner