picture of Tim Perdue
Figuring out where your hits are coming from and which pages are being viewed is not too difficult if you use a good log analyzer like Analog or something similar.
But if you want to pull up reports on the fly at any time, you need to take a different route.
For about six months, I was importing my Apache logs into an SQL database and then running ad-hoc queries against that.
Unfortunately, that requires me to go out and manually grab those log files, tweak the format a bit, then import. I wanted something better - something that was continually updated, gave me information on what content was being viewed, browser and platforms stats, and trends over time.
The solution I came up with in my original version of this article has served me well for a couple years now. I enhanced it somewhat when we built SourceForge and all logging and tracking for the entire SourceForge.net web site pipes through this system, including page views on the main site and page views for each of the 12,000 projects that use the site.
Basically, what I do is add a row to a table in a database for each action I want to log (page views in this example, but you can log banner add views, clicks, click-ins from other sites, etc).
Here is the table structure I've been using on SourceForge:
create table activity_log (
  day integer DEFAULT '0' NOT NULL,
  hour integer DEFAULT '0' NOT NULL,
  group_id integer DEFAULT '0' NOT NULL,
  browser varchar(8) DEFAULT 'OTHER' NOT NULL,
  ver float(10) DEFAULT '0.00' NOT NULL,
  platform varchar(8) DEFAULT 'OTHER' NOT NULL,
  time integer DEFAULT '0' NOT NULL,
  page text,
  type integer DEFAULT '0' NOT NULL
);
The group_id column can be used if you have multiple web sites you are trying to track. Just assign a different group_id to each site and pass it in the URL (shown below).
type is unused in these examples, but you can use it to track different types of actions, like ad clicks, clickins from other sites, etc.
browser, ver, platform are all obtained from the $HTTP_USER_AGENT variable which is passed by your browser to the web server. I have a browser detection library which we call to fill in this info. The code for that is available elsewhere in this article.
page is any arbitrary information that you want to log - I generally log $PHP_SELF.
The next step was getting the info into the tables. It would be a perfect world if all pages were served through PHP and all of your various web sites existed on one box. Unfortunately, I have an array of servers scattered all over the country, and I want to collect this info from every server for every page view 24 hours a day.
So that required me to use the 1x1 pixel GIF trick. I have a GIF on every page on every server that looks like this:
<IMG SRC="http://www.yourserver.com/util/gif11.php3?c=4&s=phpbuildercom&b=77" 
height=1 width=1>
gif11.php3 is a simple script that resides on my central server. I have included the source on the next page. Since the gif is on each page, and is forced to load due to the random number at the end (the b=xxxx), a request is sent back to the central server for each and every page view.
If all of your pages and the database reside on one server, you don't need to use the GIF trick - you can insert the logging code into the header of your page.
Here's a drawing demonstrating what's going on here:
diagram showing flow of Database from Web Server to Web Browser 
to Central Server to Database
There are 4 components in that drawing:
So the user visits the web server, gets a page get served up, and the browser sees the 1x1 gif and requests it from the central server. The central server then logs that information to make reporting easier.
REPORTING
Reporting all this information is easy with PHP - just write up some SQL, execute it and use the ShowResults() function (included below) to display it. You can also graph it using the Graphing Lib that I mentioned in a prior article.
So now the system is in place and I can run a report at any time and get statistics up to the minute for any given site.

Sample SQL to run reports

Page Views By Day
SELECT day, count(*) 
FROM activity_log 
WHERE type=0 GROUP BY day
Page Views By Browser
SELECT browser, count(*) 
FROM activity_log  
WHERE type=0 GROUP BY browser
Page Views By Browser Version
SELECT browser, ver, count(*)
FROM activity_log
WHERE type=0 GROUP BY browser, ver
Page Views By Platform
SELECT platform, count(*)
FROM activity_log
WHERE type=0 GROUP BY platform
Overall, I'm pretty impressed with this system. There's really no limit to its flexibility.
Please use the sample code on the next page and tell me how you used it, or if you added anything interesting to it.

gif11.php3


<!--

Warning - remove everything outside of the <?php tags or 
your image will break

    
//gif11.php3 accepts these parameters:
$c;    //correlates to group_id in the database -
    //unique for each site
$s;    //correlates to page in the database
$b;    //random number - forces the gif to load, even if cached
    //generated by PHP and discarded
-->

<?
php

//database connection code
//and abstraction layer
include('database.php');

//browser detection library
include('browser.php');

$type=0;    //impression
$page=$s;    //move the $s param for the logger
$group_id=$c;    //ditto for the $c param

header("Expires: Mon, 26 Jul 1997 05:00:00 GMT");
header("Last-Modified: " gmdate("D, d M Y H:i:s") . " GMT");
header("Cache-Control: no-cache");
header("Cache-Control: post-check=0,pre-check=0");
header("Cache-Control: max-age=0");
header("Pragma: no-cache");

Header"Content-type:  image/gif");
echo 
readfile ($DOCUMENT_ROOT."/util/clear.gif" );

//
//    If eveything resides on the same server, just include
//    the following code on every page instead of doing the 1x1 gif trick
//
$sql =  "INSERT INTO activity_log "
    
"(day,hour,group_id,browser,ver,platform,time,page,type) "
    
"VALUES (" date('Ymd'mktime()) . ",'" date('H'mktime())
    . 
"','$group_id','" browser_get_agent() . "','" browser_get_version()
    . 
"','" browser_get_platform() . "','" time() . "','$PHP_SELF','$type');";

$res_logger db_query $sql );

if (!
$res_logger) {
    echo 
"An error occured in the logger.\n";
    echo 
db_error();
    exit;
}

?>

PHP Code to generate the GIF URL:


<?php

srand
((double)microtime()*1000000);
$random_num=rand(5,95);
?>
<p>
<img src="http://www.yourserver.com/util/gif11.php3?c=4&s=phpbuildercom&b=<?
echo $random_num?>" height=1 width=1>

ShowResults.php:


<?php

function html_get_alt_row_color ($i) {
    if (
$i == 0) {
        return 
'#FFFFFF';
    } else {
        return 
'#EEEEEE';
    }       
}       

Function  
ShowResults($result,$title="Untitled")  {
    
/*
        Very simple, plain way to show a generic result set
        Accepts a result set and title
        Makes certain items into HTML links
    */

    
if  ($result)  {
        
$rows  =  db_numrows($result);
        
$cols  =  db_numfields($result);

        echo 
'
            <TABLE BORDER="0" WIDTH="100%">'
;

        echo 
'
        <TR BGCOLOR="WHITE">
        <TD COLSPAN="'
.$cols.'"><B><FONT COLOR="BLACK">'.$title.'</B></TD></TR>';

        echo 
'
            <tr>'
;
        for (
$i=0$i $cols$i++) {
            echo 
'<td><B>'.db_fieldname($result,  $i).'</B></TD>';
        }
        echo 
'</tr>';

        
/*  Create the rows  */
        
for ($j 0$j $rows$j++) {
            echo 
'<TR BGCOLOR="'html_get_alt_row_color($j) .'">';
            for (
$i 0$i $cols$i++) {
                echo 
'<td>'.db_result($result,  $j,  $i) . '</td>';
            }       
            echo 
'</tr>';
        }       
        echo 
'</table>';
    } else {
        echo 
db_error();
    }       
}       

?>

browser.php


<?php
//
// SourceForge: Breaking Down the Barriers to Open Source Development
// Copyright 1999-2000 (c) The SourceForge Crew
// http://sourceforge.net
//
// $Id: tim20001211.php3,v 1.3 2001/05/22 19:22:47 tim Exp $


unset ($BROWSER_AGENT);
unset (
$BROWSER_VER);
unset (
$BROWSER_PLATFORM);

function 
browser_get_agent () {
    global 
$BROWSER_AGENT;
    return 
$BROWSER_AGENT;
}

function 
browser_get_version() {
    global 
$BROWSER_VER;
    return 
$BROWSER_VER;
}

function 
browser_get_platform() {
    global 
$BROWSER_PLATFORM;
    return 
$BROWSER_PLATFORM;
}

function 
browser_is_mac() {
    if (
browser_get_platform()=='Mac') {
        return 
true;
    } else {
        return 
false;
    }
}

function 
browser_is_windows() {
    if (
browser_get_platform()=='Win') {
        return 
true;
    } else {
        return 
false;
    }
}

function 
browser_is_ie() {
    if (
browser_get_agent()=='IE') {
        return 
true;
    } else {
        return 
false;
    }       
}       

function 
browser_is_netscape() {
    if (
browser_get_agent()=='MOZILLA') {
        return 
true;
    } else {
        return 
false;
    }       
}       


/*
    Determine browser and version
*/


if (ereg'MSIE ([0-9].[0-9]{1,2})',$HTTP_USER_AGENT,$log_version)) {
    
$BROWSER_VER=$log_version[1];
    
$BROWSER_AGENT='IE';
} elseif (
ereg'Opera ([0-9].[0-9]{1,2})',$HTTP_USER_AGENT,$log_version)) {
    
$BROWSER_VER=$log_version[1];
    
$BROWSER_AGENT='OPERA';
} elseif (
ereg'Mozilla/([0-9].[0-9]{1,2})',$HTTP_USER_AGENT,$log_version)) {
    
$BROWSER_VER=$log_version[1];
    
$BROWSER_AGENT='MOZILLA';
} else {
    
$BROWSER_VER=0;
    
$BROWSER_AGENT='OTHER';
}

/*
    Determine platform
*/

if (strstr($HTTP_USER_AGENT,'Win')) {
    
$BROWSER_PLATFORM='Win';
} else if (
strstr($HTTP_USER_AGENT,'Mac')) {
    
$BROWSER_PLATFORM='Mac';
} else if (
strstr($HTTP_USER_AGENT,'Linux')) {
    
$BROWSER_PLATFORM='Linux';
} else if (
strstr($HTTP_USER_AGENT,'Unix')) {
    
$BROWSER_PLATFORM='Unix';
} else {
    
$BROWSER_PLATFORM='Other';
}       

/*
//debug code
echo "\n\nAgent: $HTTP_USER_AGENT";
echo "\nIE: ".browser_is_ie();
echo "\nMac: ".browser_is_mac();
echo "\nWindows: ".browser_is_windows();
echo "\nPlatform: ".browser_get_platform();
echo "\nVersion: ".browser_get_version();
echo "\nAgent: ".browser_get_agent();
*/

?>
Now you should have most of the code you need to get started. There's a lot more you can do with this system, especially if you use it to track ad banners, clickins and clickouts, and multiple web sites. As your site scales, you may want to crunch down the activity_log table once a day and store the results in a separate table.
Here is an attachment file containing the browser detection and database abstraction libraries. Happy logging. Be sure to post your enhancements to the comment forms below.
--Tim