Logging is an old subject. It's simple: just record the page requested, the
time, the IP address and maybe the referring URL. Instantly you can draw up
graphs of the most popular pages, the best referrers etc. This in its own
right is very useful. However, when I was looking at the logs on one of
my larger sites, I thought: I know that page "x" gets so many hits a
day, but the users could have come to that page by a variety of different
routes. How do users get to page "x"? On some sites there are several
different ways to reach the same page. On PHPbuilder, for example, most
people probably see the list of new articles and choose one of those. How
many people go to the columns page and look at all the articles? This isn't
the best example, but I think you get the idea. If you know this information,
you can see how effective your navigation system really is.
This idea can be implemented in a very simple way to begin with. By using
the referring URL you can get some insight into how users got to a page. I keep
my own logs as described in this
article. I don't filter out the query string because I want to know exactly
where users have looked. This means that if someone came from a search engine
I can see the keywords they entered to find my site, and if they enter a page
that is only available through a query string I can see that too. I use a
MySQL table like this to keep all my logging information in:
Then I can use a query like this for some basic route tracing:
SELECT COUNT(refering_page) AS hits, refering_page FROM logging WHERE page = 'pagex' GROUP BY refering_page ORDER BY hits DESC
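To try the referrer query without a MySQL server, here is a minimal sketch in Python using an in-memory SQLite database as a stand-in. The column names timestamp, remote_ip, page and refering_page follow the ones used in this article, but the column types, the sample rows and the helper name referrers_for are assumptions made purely for illustration.

```python
import sqlite3

# Hypothetical stand-in for the MySQL logging table; the column types are guesses.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE logging (
           timestamp     INTEGER,  -- Unix time of the request
           remote_ip     TEXT,
           page          TEXT,     -- page requested, query string included
           refering_page TEXT      -- referring URL, query string included
       )"""
)

# Made-up sample hits, purely for illustration.
rows = [
    (1000, "10.0.0.1", "pagex", "/index.php"),
    (1010, "10.0.0.2", "pagex", "/index.php"),
    (1020, "10.0.0.3", "pagex", "/columns.php"),
    (1030, "10.0.0.4", "pagey", "/index.php"),
]
conn.executemany("INSERT INTO logging VALUES (?, ?, ?, ?)", rows)


def referrers_for(conn, page):
    """Return (hits, refering_page) rows for one page, most common referrer first."""
    return conn.execute(
        """SELECT COUNT(refering_page) AS hits, refering_page
           FROM logging
           WHERE page = ?
           GROUP BY refering_page
           ORDER BY hits DESC""",
        (page,),
    ).fetchall()


result = referrers_for(conn, "pagex")
print(result)  # [(2, '/index.php'), (1, '/columns.php')]
```

Passing the page name as a bound parameter rather than pasting it into the SQL string keeps the query safe if the page name ever comes from user input.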
Instantly you have an ordered list of how most people get to a page. Now this is all well and good, I hear you say, but that's hardly a route!
I use sessions on my site. Anyone looking at a page is instantly given a session ID, and this is passed from page to page. Sessions are timed out after 300 seconds. By storing the session ID in the logs as well, you suddenly have the route every user took when looking at the site. So my logging table now looks like this:
By adding this one piece of data we suddenly have a whole new way of looking at the logs. Before we use it, we need to understand better how it is stored:
1. A user enters the site.
2. They are given a session ID.
3. The logging table is updated (with timestamp, remote_ip, page, refering_page and session ID).
4. The user clicks a link.
5. A different page is displayed.
6. Back to step 3.
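The logging loop in the steps above could be sketched roughly as follows, again in Python with SQLite standing in for MySQL. The function names open_log, get_session and log_hit, the session_id column name and the in-memory sessions dictionary are all hypothetical. I also assume the 300 seconds is an idle timeout, measured from the session's last logged hit.

```python
import sqlite3
import time
import uuid

SESSION_TIMEOUT = 300  # seconds; assumed here to mean idle time since the last hit


def open_log():
    """Create an in-memory logging table (hypothetical schema and column types)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE logging (
               timestamp     INTEGER,
               remote_ip     TEXT,
               page          TEXT,
               refering_page TEXT,
               session_id    TEXT
           )"""
    )
    return conn


def get_session(sessions, session_id, now):
    """Reuse a live session ID, or issue a new one if it is unknown or timed out."""
    last_seen = sessions.get(session_id)
    if last_seen is None or now - last_seen > SESSION_TIMEOUT:
        session_id = uuid.uuid4().hex
    sessions[session_id] = now  # remember when this session was last active
    return session_id


def log_hit(conn, sessions, session_id, remote_ip, page, refering_page, now=None):
    """Log one page view and return the session ID to pass on to the next page."""
    now = int(time.time()) if now is None else now
    session_id = get_session(sessions, session_id, now)
    conn.execute(
        "INSERT INTO logging VALUES (?, ?, ?, ?, ?)",
        (now, remote_ip, page, refering_page, session_id),
    )
    return session_id


conn = open_log()
sessions = {}
# First hit: no session yet, so a new ID is issued.
sid = log_hit(conn, sessions, None, "10.0.0.1", "/index.php", "", now=1000)
# Second hit 100 seconds later: the same session continues.
sid2 = log_hit(conn, sessions, sid, "10.0.0.1", "/pagex.php", "/index.php", now=1100)
# Third hit 900 seconds later: the old session has timed out, so a new ID is issued.
sid3 = log_hit(conn, sessions, sid, "10.0.0.1", "/index.php", "", now=2000)
```

On a real PHP site the session handling would normally come from the session mechanism already in use, so only the extra session ID column and the insert would be needed.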
This is what I do, though I'm not saying it is the best way:
1. Select a start and end timestamp (e.g. the beginning and the end of the day).
2. Select the page you want to see how users got to. We will call this "pagex".
3. Select all session IDs from the logging table where page=pagex and timestamp>timestamp_start and timestamp<timestamp_end.
4. We now have a list of session IDs of people who looked at pagex.
5. Then, for each session ID, get the pages that they looked at where the timestamp is before that of them looking at pagex.
6. Order this information by the timestamp and that is the user's route.
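The steps above can be sketched as two queries, shown here in Python with SQLite standing in for MySQL. The function name routes_to, the session_id column name and the sample rows are assumptions for illustration. Where a session looked at pagex more than once inside the time window, this sketch takes the first visit as the cut-off, which is one possible reading of the steps.

```python
import sqlite3


def routes_to(conn, page, ts_start, ts_end):
    """Map each session ID to the pages it viewed before first reaching `page`."""
    # Sessions that viewed the page inside the window, with the first arrival time.
    arrivals = conn.execute(
        """SELECT session_id, MIN(timestamp)
           FROM logging
           WHERE page = ? AND timestamp > ? AND timestamp < ?
           GROUP BY session_id""",
        (page, ts_start, ts_end),
    ).fetchall()

    routes = {}
    for session_id, arrived in arrivals:
        # Every page the session viewed before arriving, oldest first.
        rows = conn.execute(
            """SELECT page FROM logging
               WHERE session_id = ? AND timestamp < ?
               ORDER BY timestamp""",
            (session_id, arrived),
        ).fetchall()
        routes[session_id] = [row[0] for row in rows]
    return routes


# Hypothetical table and made-up hits, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE logging (
           timestamp INTEGER, remote_ip TEXT, page TEXT,
           refering_page TEXT, session_id TEXT
       )"""
)
conn.executemany(
    "INSERT INTO logging VALUES (?, ?, ?, ?, ?)",
    [
        (100, "10.0.0.1", "/index", "", "a"),
        (110, "10.0.0.1", "/columns", "/index", "a"),
        (120, "10.0.0.1", "pagex", "/columns", "a"),
        (130, "10.0.0.1", "/other", "pagex", "a"),
        (200, "10.0.0.2", "/search", "", "b"),
        (210, "10.0.0.2", "pagex", "/search", "b"),
        (300, "10.0.0.3", "/index", "", "c"),  # this session never reaches pagex
    ],
)

routes = routes_to(conn, "pagex", 0, 1000)
# Session "a" came via /index then /columns; session "b" came via /search.
```

Running one query per session is easy to follow but slow on a busy site; a single self-join on session_id would do the same work in one pass.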
Unfortunately, this gives us their whole route around the site before getting to pagex. I overcome this by grouping the pages that they have looked at and ordering by the most popular in a kind of backstepping way, e.g. look at one page back, two pages back, three pages back and so on.
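One way to read the backstepping idea is to count, for each distance back from pagex, which pages appear most often at that position. The sketch below assumes each route is already available as a list of pages in timestamp order, ending just before pagex. The function name backstep and the sample routes are made up for illustration.

```python
from collections import Counter


def backstep(routes, max_steps=3):
    """For n = 1..max_steps, rank the pages seen n pages before the target page."""
    steps = []
    for n in range(1, max_steps + 1):
        # route[-n] is the page viewed n steps before the target;
        # routes shorter than n simply have nothing at that distance.
        counts = Counter(route[-n] for route in routes if len(route) >= n)
        steps.append(counts.most_common())  # most popular page first
    return steps


# Made-up routes: each list holds the pages one session viewed before pagex.
routes = [
    ["/index", "/columns"],
    ["/index", "/columns"],
    ["/search", "/columns"],
    ["/index"],
]

ranking = backstep(routes, max_steps=2)
# ranking[0] covers one page back, ranking[1] covers two pages back.
print(ranking)
```

With these sample routes, /columns is the most common page one step back and /index the most common two steps back, which is the kind of summary that shows which links users actually follow.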
There are many more ways of looking at this data. I'm not sure what they all are, but I know that they are there. Looking at this data, it may become apparent that some links you have put in place are never used, or that certain pages are accessed from other sites more than they are from your own site. One good use is to see how spiders/bots navigate your site. If used correctly, this will give you a better understanding of how users navigate your site, where the obvious links are and where there should be links.