Figuring out where your hits are coming from and which
pages are being viewed is not too difficult if you use a
good log analyzer like Analog or something similar.
But if you want to pull up reports on the fly at any time,
you need to take a different route.
For about six months, I was importing my Apache logs into
an SQL database and then running ad-hoc queries against that.
Unfortunately, that requires me to go out and manually grab
those log files, tweak the format a bit, then import. I wanted
something better - something that was continually updated,
gave me information on what content was being viewed,
browser and platforms stats, and trends over time.
The solution I came up with in my
original version of this
article has served me well for a couple years now. I enhanced it
somewhat when we built SourceForge and all logging and tracking
for the entire SourceForge.net web site pipes through this
system, including page views on the main site and page views
for each of the 12,000 projects that use the site.
Basically, what I do is add a row to a table in a database for
each action I want to log (page views in this example, but you can log
banner add views, clicks, click-ins from other sites, etc).
Here is the table structure I've been using on SourceForge:
create table activity_log (
day integer DEFAULT '0' NOT NULL,
hour integer DEFAULT '0' NOT NULL,
group_id integer DEFAULT '0' NOT NULL,
browser varchar(8) DEFAULT 'OTHER' NOT NULL,
ver float(10) DEFAULT '0.00' NOT NULL,
platform varchar(8) DEFAULT 'OTHER' NOT NULL,
time integer DEFAULT '0' NOT NULL,
page text,
type integer DEFAULT '0' NOT NULL
);
The group_id column can be used if you have multiple web sites
you are trying to track. Just assign a different group_id to each site
and pass it in the URL (shown below).
type is unused in these
examples, but you can use it to track different types of actions, like
ad clicks, clickins from other sites, etc.
browser, ver, platform are all obtained from the $HTTP_USER_AGENT
variable which is passed by your browser to the web server. I have a
browser detection library which we call to fill in this info. The code for
that is available elsewhere in this article.
page is any arbitrary information that you want to log - I
generally log $PHP_SELF.
The next step was getting the info into the tables. It would be
a perfect world if all pages were served through PHP and all of your
various web sites existed on one box. Unfortunately, I have an array
of servers scattered all over the country, and I want to collect this
info from every server for every page view 24 hours a day.
So that required me to use the 1x1 pixel GIF trick. I have a GIF
on every page on every server that looks like this:
<IMG SRC="http://www.yourserver.com/util/gif11.php3?c=4&s=phpbuildercom&b=77"
height=1 width=1>
gif11.php3 is a simple script that resides on my central server.
I have included the source on the next page.
Since the gif is on each page, and is forced to load due to the random
number at the end (the b=xxxx), a request is sent back to the central
server for each and every page view.
If all of your pages and the database reside on one server, you don't
need to use the GIF trick - you can insert the logging code into the
header of your page.
Here's a drawing demonstrating what's going on here:
There are 4 components in that drawing:
- Remote web server
- Web browser (user visiting that site)
- Central logging server
- Central database server
So the user visits the web server, gets a page get served up, and the browser
sees the 1x1 gif and requests it from the central server. The central
server then logs that information to make reporting easier.
REPORTING
Reporting all this information is easy with PHP - just write up some SQL,
execute it and use the
ShowResults() function (included below) to display it.
You can also graph it using the Graphing Lib that I mentioned in
a
prior article.
So now the system is in place and I can run a report at any time
and get statistics up to the minute for any given site.
Sample SQL to run reports
Page Views By Day
SELECT day, count(*)
FROM activity_log
WHERE type=0 GROUP BY day
Page Views By Browser
SELECT browser, count(*)
FROM activity_log
WHERE type=0 GROUP BY browser
Page Views By Browser Version
SELECT browser, ver, count(*)
FROM activity_log
WHERE type=0 GROUP BY browser, ver
Page Views By Platform
SELECT platform, count(*)
FROM activity_log
WHERE type=0 GROUP BY platform
Overall, I'm pretty impressed with this system. There's really no limit to its
flexibility.
Please use the sample code on the next page and tell me how you
used it, or if you added anything interesting to it.
gif11.php3
<!--
Warning - remove everything outside of the <?php tags or
your image will break
//gif11.php3 accepts these parameters:
$c; //correlates to group_id in the database -
//unique for each site
$s; //correlates to page in the database
$b; //random number - forces the gif to load, even if cached
//generated by PHP and discarded
-->
<?php
//database connection code
//and abstraction layer
include('database.php');
//browser detection library
include('browser.php');
$type=0; //impression
$page=$s; //move the $s param for the logger
$group_id=$c; //ditto for the $c param
header("Expires: Mon, 26 Jul 1997 05:00:00 GMT");
header("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");
header("Cache-Control: no-cache");
header("Cache-Control: post-check=0,pre-check=0");
header("Cache-Control: max-age=0");
header("Pragma: no-cache");
Header( "Content-type: image/gif");
echo readfile ($DOCUMENT_ROOT."/util/clear.gif" );
//
// If eveything resides on the same server, just include
// the following code on every page instead of doing the 1x1 gif trick
//
$sql = "INSERT INTO activity_log "
. "(day,hour,group_id,browser,ver,platform,time,page,type) "
. "VALUES (" . date('Ymd', mktime()) . ",'" . date('H', mktime())
. "','$group_id','" . browser_get_agent() . "','" . browser_get_version()
. "','" . browser_get_platform() . "','" . time() . "','$PHP_SELF','$type');";
$res_logger = db_query ( $sql );
if (!$res_logger) {
echo "An error occured in the logger.\n";
echo db_error();
exit;
}
?>
PHP Code to generate the GIF URL:
<?php
srand((double)microtime()*1000000);
$random_num=rand(5,95);
?>
<p>
<img src="http://www.yourserver.com/util/gif11.php3?c=4&s=phpbuildercom&b=<?
echo $random_num; ?>" height=1 width=1>
ShowResults.php:
<?php
function html_get_alt_row_color ($i) {
if ($i % 2 == 0) {
return '#FFFFFF';
} else {
return '#EEEEEE';
}
}
Function ShowResults($result,$title="Untitled") {
/*
Very simple, plain way to show a generic result set
Accepts a result set and title
Makes certain items into HTML links
*/
if ($result) {
$rows = db_numrows($result);
$cols = db_numfields($result);
echo '
<TABLE BORDER="0" WIDTH="100%">';
echo '
<TR BGCOLOR="WHITE">
<TD COLSPAN="'.$cols.'"><B><FONT COLOR="BLACK">'.$title.'</B></TD></TR>';
echo '
<tr>';
for ($i=0; $i < $cols; $i++) {
echo '<td><B>'.db_fieldname($result, $i).'</B></TD>';
}
echo '</tr>';
/* Create the rows */
for ($j = 0; $j < $rows; $j++) {
echo '<TR BGCOLOR="'. html_get_alt_row_color($j) .'">';
for ($i = 0; $i < $cols; $i++) {
echo '<td>'.db_result($result, $j, $i) . '</td>';
}
echo '</tr>';
}
echo '</table>';
} else {
echo db_error();
}
}
?>
browser.php
<?php
//
// SourceForge: Breaking Down the Barriers to Open Source Development
// Copyright 1999-2000 (c) The SourceForge Crew
// http://sourceforge.net
//
// $Id: tim20001211.php3,v 1.3 2001/05/22 19:22:47 tim Exp $
unset ($BROWSER_AGENT);
unset ($BROWSER_VER);
unset ($BROWSER_PLATFORM);
function browser_get_agent () {
global $BROWSER_AGENT;
return $BROWSER_AGENT;
}
function browser_get_version() {
global $BROWSER_VER;
return $BROWSER_VER;
}
function browser_get_platform() {
global $BROWSER_PLATFORM;
return $BROWSER_PLATFORM;
}
function browser_is_mac() {
if (browser_get_platform()=='Mac') {
return true;
} else {
return false;
}
}
function browser_is_windows() {
if (browser_get_platform()=='Win') {
return true;
} else {
return false;
}
}
function browser_is_ie() {
if (browser_get_agent()=='IE') {
return true;
} else {
return false;
}
}
function browser_is_netscape() {
if (browser_get_agent()=='MOZILLA') {
return true;
} else {
return false;
}
}
/*
Determine browser and version
*/
if (ereg( 'MSIE ([0-9].[0-9]{1,2})',$HTTP_USER_AGENT,$log_version)) {
$BROWSER_VER=$log_version[1];
$BROWSER_AGENT='IE';
} elseif (ereg( 'Opera ([0-9].[0-9]{1,2})',$HTTP_USER_AGENT,$log_version)) {
$BROWSER_VER=$log_version[1];
$BROWSER_AGENT='OPERA';
} elseif (ereg( 'Mozilla/([0-9].[0-9]{1,2})',$HTTP_USER_AGENT,$log_version)) {
$BROWSER_VER=$log_version[1];
$BROWSER_AGENT='MOZILLA';
} else {
$BROWSER_VER=0;
$BROWSER_AGENT='OTHER';
}
/*
Determine platform
*/
if (strstr($HTTP_USER_AGENT,'Win')) {
$BROWSER_PLATFORM='Win';
} else if (strstr($HTTP_USER_AGENT,'Mac')) {
$BROWSER_PLATFORM='Mac';
} else if (strstr($HTTP_USER_AGENT,'Linux')) {
$BROWSER_PLATFORM='Linux';
} else if (strstr($HTTP_USER_AGENT,'Unix')) {
$BROWSER_PLATFORM='Unix';
} else {
$BROWSER_PLATFORM='Other';
}
/*
//debug code
echo "\n\nAgent: $HTTP_USER_AGENT";
echo "\nIE: ".browser_is_ie();
echo "\nMac: ".browser_is_mac();
echo "\nWindows: ".browser_is_windows();
echo "\nPlatform: ".browser_get_platform();
echo "\nVersion: ".browser_get_version();
echo "\nAgent: ".browser_get_agent();
*/
?>
Now you should have most of the code you need to get started. There's a lot
more you can do with this system, especially if you use it to track ad banners,
clickins and clickouts, and multiple web sites. As your site scales, you may want
to crunch down the activity_log table once a day and store the results in a separate
table.
Here is an attachment file containing the browser
detection and database abstraction libraries.
Happy logging. Be sure to post your enhancements to the comment forms below.
--Tim