Making PHP Applications Cache-Friendly

I'm running a public web-based forum that is read frequently (about 10,000 hits per day), but gets relatively few postings (in the range of 20 to 60 per day). The board software is Phorum, a nice open-source PHP application. Although the site has other popular pages to offer, Phorum's read.php file is the clear number one in my Apache hit and download volume statistics. Most of these hits are apparently caused by users pressing "reload" to check for updates every now and then.
With so few new postings, Phorum and its underlying DBMS (in our case, PostgreSQL) obviously have to generate the same responses to the same queries over and over. This wastes bandwidth and server resources and makes browsing the forum feel slower than necessary, particularly for users with slow connections. It also renders the caching efforts of proxy servers virtually useless.

One approach to minimising redundant transmission of data is the use of Last-Modified and If-Modified-Since headers as defined in HTTP/1.1.
In this scheme, each object returned by the webserver carries a date of last modification (a.k.a. "validator"). A user agent or proxy cache can store this value and, upon the next reload of the same object, issue a conditional GET request with the If-Modified-Since header set to that date. The webserver will then use this header to decide whether the client's copy of the object is still "fresh" (as recent as the data on the server) or "stale" (older than the data on the server). If it is fresh, there is no need to send the object again, so the server responds with a brief "304 Not Modified" message instead.
Modern webservers and user agents (e.g., Apache/1.3, Netscape Navigator 4.x and above, Internet Explorer) fully support this technique. Apache automatically handles If-Modified-Since requests for all static objects by default.
In the case of dynamic content as generated by PHP, we have to take care of these things manually. We need to return a meaningful Last-Modified header and handle If-Modified-Since requests so that the user agent gets fresh data if and only if necessary. For Phorum, this means that we have to keep track of database updates. If the database has not changed since the client's last request, we can simply return 304 without bothering the DBMS at all.
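The decision described here can be sketched as a small pure function. This is only an illustration, not Phorum code: the name conditional_status and its parameters are hypothetical, and it assumes the client's If-Modified-Since value has already been converted to a Unix timestamp (or is null if the header was absent).

```php
<?php
// Hypothetical helper (not part of Phorum): choose between a full
// response and "304 Not Modified".
//   $clientTime  - timestamp taken from If-Modified-Since, or null if absent
//   $serverMtime - timestamp of the last change to the underlying data
function conditional_status($clientTime, $serverMtime)
{
    if ($clientTime !== null && $clientTime >= $serverMtime) {
        return 304; // client's copy is fresh: send the status line only
    }
    return 200;     // no copy, or a stale copy: send the full page
}

$lastChange = gmmktime(12, 0, 0, 1, 15, 2001);  // example: data changed at noon

echo conditional_status(null, $lastChange), "\n";               // first visit: 200
echo conditional_status($lastChange, $lastChange), "\n";        // nothing new: 304
echo conditional_status($lastChange - 3600, $lastChange), "\n"; // copy an hour old: 200
```

Only the 200 branch needs to touch the database; the 304 branch can return before any query is issued.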

The basic approach I am using here is to touch a zero-length file whenever the database is updated. The file's modification time will then serve as the Last-Modified date. This is very simple to implement, but rather primitive as it does not differentiate between forums: if something is posted in forum X, all forums in the same database are considered "updated". That's clearly not very effective when you have lots of posting activity in more than one or two forums.
A finer-grained scheme -- perhaps down to tracking individual threads -- would be more appropriate in that case. Additionally, there are some issues with Phorum's use of cookies to flag unread messages; the workaround used here is to allow a full reload periodically.
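A finer-grained variant might keep one marker file per forum instead of a single global one. The following is a rough sketch only: the helper names, the file naming scheme, and the assumption that each forum has a numeric ID are all mine and are not part of the Phorum patch.

```php
<?php
// Hypothetical helpers: one zero-length marker file per forum.

// Build the marker path for a forum. The integer cast keeps the ID from
// smuggling path components into the file name.
function forum_marker_path($dir, $forumId)
{
    return rtrim($dir, '/') . '/.phorum-update-' . (int) $forumId;
}

// Record that a forum changed (to be called after a successful write).
function mark_forum_updated($dir, $forumId)
{
    return touch(forum_marker_path($dir, $forumId));
}

// Last modification time of a forum, or false if it was never marked.
function forum_last_modified($dir, $forumId)
{
    $path = forum_marker_path($dir, $forumId);
    clearstatcache();  // file status results are cached within a request
    return file_exists($path) ? filemtime($path) : false;
}

// Usage: a posting in forum 3 updates only that forum's marker.
$dir = sys_get_temp_dir();
mark_forum_updated($dir, 3);
var_dump(forum_last_modified($dir, 3) !== false);
```

With this layout, a page showing forum X would consult only forum X's marker, so postings elsewhere no longer invalidate it.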
Please note that the following code is not release quality. It works, but it's rather simplistic and should be considered a "proof of concept". I simply patched Phorum 3.2.11 code where it seemed necessary to get quick results. My changes are in bold print.

common.php:


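<?php

if ( !defined("_COMMON_PHP") ){
    define("_COMMON_PHP", 1);

    // These variables may be altered as needed:

    // zero-length marker file; its modification time serves as Last-Modified
    $modification_file = '/var/tmp/.phorum-update';

    // table name that Phorum uses to access meta-information on forums.
    // ... (remainder of common.php omitted) ...
}
?>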

include/header.php:


<?php

   global $modification_file;
   if(isset($modification_file) && ($mtime = filemtime($modification_file))) {
      // to be on the safe side, refresh last modification time
      //  if it's "too old", forcing a reload from time to time.
      //   this is probably not necessary for most applications
      if((mktime() - $mtime) > 3600) {
         touch($modification_file);
         $mtime = mktime();
      }

      $gmt_mtime = gmdate('D, d M Y H:i:s', $mtime) . ' GMT';

      // we assume that the client generates proper RFC 822/1123 dates
      //   (should work for all modern browsers and proxy caches)
      if ($HTTP_IF_MODIFIED_SINCE == $gmt_mtime) {
            header("HTTP/1.1 304 Not Modified");
            exit;
      }

      $lastmod_header = "Last-Modified: " . $gmt_mtime;
      Header($lastmod_header);
   }

   // your header modifications go here...

?>
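
The exact string comparison in header.php only matches when the client echoes the date back byte for byte. A more tolerant check, sketched here under some assumptions, parses the header into a timestamp first. The function name is hypothetical, the handling of a "; length=..." suffix reflects what some older browsers appended to this header, and on current PHP versions the value is read from $_SERVER['HTTP_IF_MODIFIED_SINCE'] rather than from the global $HTTP_IF_MODIFIED_SINCE.

```php
<?php
// Hypothetical helper: turn a raw If-Modified-Since value into a Unix
// timestamp, or null if the header is missing or cannot be parsed.
function parse_if_modified_since($raw)
{
    if ($raw === null || $raw === '') {
        return null;
    }
    // Drop any parameters after the date, e.g. "; length=1234".
    $date = trim(preg_replace('/;.*$/', '', $raw));
    $time = strtotime($date);
    return ($time === false) ? null : $time;
}

// Stand-in for filemtime($modification_file) in this sketch.
$mtime = gmmktime(8, 49, 37, 11, 6, 1994);

// Sample header value so the sketch also runs from the command line.
$header = isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])
        ? $_SERVER['HTTP_IF_MODIFIED_SINCE']
        : 'Sun, 06 Nov 1994 08:49:37 GMT; length=4711';

$since = parse_if_modified_since($header);
if ($since !== null && $since >= $mtime) {
    // In a web request, this is where the 304 would be sent:
    // header('HTTP/1.1 304 Not Modified'); exit;
    echo "fresh\n";
} else {
    echo "stale\n";
}
```

Comparing timestamps with >= also treats a client copy that is newer than the marker file as fresh, which the string comparison would not.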

db/postgresql.php: (virtually identical for other databases)


<?php

class query {

  var $result;
  var $row;
  var $curr_row;

  function query(&$db, $query="") {
  // Constructor of the query object.
  // executes the query, notifies the db object of the query result to clean
  // up later

    GLOBAL $modification_file;

    if($query!=""){
      if (!empty($this->result)) {
        $this->free(); // query not called as constructor therefore there may
                       // be something to clean up.
      }
      $this->result=@pg_Exec($db->connect_id, $query);

      // catch database-altering SQL statements
      if(isset($modification_file) && eregi("^(insert|update|alter|delete)", $query))
         touch($modification_file);

      $db->addquery($this->result);
      $this->curr_row=0;
    }
  }

  // ... (remaining methods of the query class omitted) ...
}
?>
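
The eregi() function was removed in PHP 7, so on current PHP versions the same check would need preg_match(). A minimal sketch follows. The function name is hypothetical, and tolerating leading whitespace and requiring a word boundary after the keyword are my additions that the original pattern does not make.

```php
<?php
// Hypothetical helper: does this SQL statement start with a keyword that
// alters the database? Case-insensitive; leading whitespace is allowed.
function is_modifying_query($sql)
{
    return preg_match('/^\s*(insert|update|alter|delete)\b/i', $sql) === 1;
}

var_dump(is_modifying_query("INSERT INTO forum (subject) VALUES ('hi')")); // bool(true)
var_dump(is_modifying_query("  update forum SET subject = 'x'"));          // bool(true)
var_dump(is_modifying_query("SELECT * FROM forum"));                       // bool(false)
```

Like the original pattern, this is a heuristic keyed on the first keyword only, so statements such as DROP or CREATE would go unnoticed unless they are added to the list.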

Bottom Line

If you have an efficient way of tracking the freshness of your data, implementing proper Last-Modified/If-Modified-Since behaviour for PHP applications is very simple. It has clear benefits for both server and client side, and usually no drawbacks. Considering that, it is surprising how rarely it seems to be used by PHP authors.
Of course, this isn't all there is to creating cache-friendly PHP scripts. For further information, check Mark Nottingham's Caching Tutorial for Web Authors and Webmasters.
--Klaus A. Brunner