By Tim Perdue
I've been wanting to write this article for a long time, but never really had the time to do it right. So rather than calling this a how-to, I'm hoping it will be the seed of a discussion about how to build great, scalable web applications.
Certainly over the past 2-3 years, my web development skills have changed dramatically. I look back at the source code for Geocrawler and can't believe I wrote that. The source code for PHPBuilder is also far from exemplary, as it's really just a hodge-podge of various GPL'd software packages cobbled together.
SourceForge was really the first serious app I helped write as an experienced PHP developer and I think it shows to an extent in the end result. The code is pretty much broken up into decent libraries and sensible function calls. The database structure is very clean. The various subsections of the site are generally not dependent on the other parts of the site.
But it's not perfect. If I had it to do again, I would make a point of keeping the HTML layer more clearly separated from the database layer, either through objects or a cleaner function library.

Pretty Pictures

I've found that managers love to have pretty pictures and diagrams drawn up for them, so here's one that will impress the best of them. The idea behind this structure is that you separate your logic from the "presentation", meaning anything complicated goes on down in the "API/Data Access Layer".
Rather than coding security checks, update statements, etc. throughout your HTML layer, you should, in theory, code the bulk of that in the API layer. The HTML layer then makes simple function calls that return either arrays, objects, or (my favorite) database result sets.
If you do this right, the topmost layer will be very thin, so it is easy to create and maintain.
[Figure: breakdown of the layers of a web-based application]
Drawing from this example, the HTML interface depends on some direct calls to the API layer, some calls to an HTML utility library (which could, for example, generate pop-up boxes, or whatever), and those libraries make calls to the database using a database abstraction layer (again so you aren't tied to any particular database).
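To make the layering concrete, here is a minimal PHP sketch. The function names (bug_data_get_open(), html_build_table(), page_open_bugs()) are hypothetical, and the data function returns a hard-coded array where a real one would go through the database abstraction layer. The point is that the page function at the top does nothing but stitch together calls to the layers beneath it.

```php
<?php
// Hypothetical three-layer sketch; all function names are illustrative.

// --- API / data access layer ---
// A real version would call db_query() through the abstraction layer;
// a hard-coded array stands in so the sketch runs without a database.
function bug_data_get_open(): array {
    return [
        ['id' => 1, 'summary' => 'Login page times out'],
        ['id' => 2, 'summary' => 'Typo in footer'],
    ];
}

// --- HTML utility library ---
// Turns rows into an HTML table, escaping every value.
function html_build_table(array $rows): string {
    $out = "<table>\n";
    foreach ($rows as $row) {
        $out .= '<tr>';
        foreach ($row as $value) {
            $out .= '<td>' . htmlspecialchars((string) $value) . '</td>';
        }
        $out .= "</tr>\n";
    }
    return $out . "</table>\n";
}

// --- HTML interface (the thin top layer) ---
// No SQL, no permission checks: just calls into the layers below.
function page_open_bugs(): string {
    return "<h4>Open Bugs</h4>\n" . html_build_table(bug_data_get_open());
}
```

A page would then be little more than `echo page_open_bugs();`.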

The Basics

The fundamentals of a smart architecture include:
  1. Database independence
  2. Presentation independence
  3. Portability
  4. Object-Oriented or at least broken into function call libraries
  5. What else?
There are certainly more items for that list, but those are the biggest points I can think of. Maybe you can point out others.
Let's examine each of those in detail.
1. Database Independence
Well, you never know where your site is going. Certainly when you build it, you hope it's going to get huge and highly trafficked. So with that in mind, you don't want to tie yourself irreparably to MS Access or some other cheesy, lightweight database. You will never be able to instantly plug in a different database, but you can make the transition as smooth as possible.
Several options have cropped up to help you abstract your database calls. One of the odd things about PHP is that you have to code for a specific database, since the function calls are different for each one. To get around this, you can use a database abstraction layer, like those found in PHPLib, the forthcoming PEAR, and the simple library we developed for SourceForge.
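A minimal sketch of such a layer, assuming PHP 8 and the mysqli extension for the real driver. The DbDriver interface, the driver classes, and db_connect() are hypothetical names for illustration, not the PHPLib or PEAR API; only the db_query() name matches the call used in the SourceForge example later in this article. Pages call db_query() and never touch mysqli functions directly, so moving to another database means writing one new driver class.

```php
<?php
// Hypothetical database abstraction sketch. These names are illustrative,
// not the PHPLib, PEAR, or SourceForge API.

interface DbDriver {
    public function query(string $sql): mixed;
}

// MySQL-backed driver using PHP's mysqli extension.
class MysqliDriver implements DbDriver {
    public function __construct(private mysqli $link) {}

    public function query(string $sql): mixed {
        return mysqli_query($this->link, $sql);
    }
}

// Stand-in driver that records SQL instead of executing it,
// so the example runs without a database server.
class RecordingDriver implements DbDriver {
    public array $log = [];

    public function query(string $sql): mixed {
        $this->log[] = $sql;
        return true;
    }
}

// The only functions the rest of the application calls.
// Swapping databases means passing a different driver to db_connect().
function db_connect(DbDriver $driver): void {
    $GLOBALS['db_driver'] = $driver;
}

function db_query(string $sql): mixed {
    return $GLOBALS['db_driver']->query($sql);
}
```

In production you might call `db_connect(new MysqliDriver(new mysqli($host, $user, $pass, $dbname)));` once inside database.php; the RecordingDriver exists only so the sketch can run without a server.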
2. Presentation Independence
Once again, you don't really know where your web site is headed or where technology is headed. I was never a big believer in this - HTML is really the standard, especially in web apps. If that ever changes, I figured I could always rewrite.
But if you get to where you have a truly huge, complex app, then you need to start thinking about alternative interfaces to your database. What you don't want to do is start copying and pasting logic, permission checks, etc around your site. Let's say you need to make your site WAP-enabled so cell-phone users can surf. If you designed your app right, you can just write a thin WAP presentation layer that calls all your data-access objects. If you didn't design your app right, suddenly you have to maintain both an HTML version of your site and a WAP version.
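A rough sketch of that idea, with hypothetical names and a hard-coded row in place of a real query: one data-access function returns a plain array, and two thin front ends format it, one as HTML and one as XML (a WAP/WML page would be built the same way). Neither front end repeats any lookup or permission logic.

```php
<?php
// Hypothetical sketch: one data-access function, two thin front ends.
// Names and the XML element layout are illustrative only.

// Data layer: returns plain arrays and knows nothing about HTML or XML.
// A hard-coded row stands in for a database query.
function bug_data_get(int $id): ?array {
    $bugs = [42 => ['id' => 42, 'summary' => 'Crash on <save>']];
    return $bugs[$id] ?? null;
}

// HTML front end.
function bug_html_show(int $id): string {
    $bug = bug_data_get($id);
    if ($bug === null) {
        return '<p>No such bug</p>';
    }
    return '<h3>Bug #' . $bug['id'] . '</h3><p>'
        . htmlspecialchars($bug['summary']) . '</p>';
}

// XML front end reusing the same data call.
function bug_xml_show(int $id): string {
    $bug = bug_data_get($id);
    if ($bug === null) {
        return '<error>No such bug</error>';
    }
    return '<bug id="' . $bug['id'] . '"><summary>'
        . htmlspecialchars($bug['summary'], ENT_XML1) . '</summary></bug>';
}
```

Adding a third interface later means writing one more small formatting function, not copying the data logic again.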
I'm running into this on SourceForge. We have a huge user base of people who want to submit and fetch their bugs, tasks, etc. At first, we figured it would all go through our web interface. Then, under some pressure from people like Eric Raymond, we decided to expose the database through an XML interface.
Fortunately, we undertook an effort back in April to separate out the core logic of the site from its presentation. I'll try to explain how we did this, and hopefully others will chime in with their own methods.
The bug tracker and other tools on SourceForge are now split into two distinct libraries: the HTML library and the data library. The data library checks that the right values were passed in, handles security checks, and returns only true on success or false on failure.
For simplicity's sake, this example will not be based on a perfect object model, as I'd have to explain the base classes and how the other objects extend those base objects. I think this example will still give you the general idea.
Example HTML Lib


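<?php
//connect to database
require ("database.php");

//common utils like header/footer HTML
require ("html.php");

//data access library
require ("bug_data.php");

//form values; older PHP set these as globals automatically
//(register_globals), so read them explicitly here
$field1 = $_POST['field1'] ?? '';
$field2 = $_POST['field2'] ?? '';
$field3 = $_POST['field3'] ?? '';

site_header("Page Title");

echo "<H4>Updating A Bug</H4>";

//bug_data_update() returns false on failure
if (!bug_data_update($field1,$field2,$field3)) {

    echo "<H3>Update Failed!</H3>";
    //echo the global error string
    echo $feedback;

} else {

    echo "<H3>Updated Bug Successfully</H3>";

}

//assumed counterpart to site_header() in html.php
site_footer();
?>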

Example Data Access Lib

<?php

/*
 *  controls access to updating a bug in the
 *  database. Validates data and checks security
 *  Returns true on success, false on failure
 */
function bug_data_update ($field1,$field2,$field3) {
    //global string to report back errors
    global $feedback;

    //$field1 and $field2 are required
    if (!$field1 || !$field2) {
        $feedback="Field 1 And Field 2 Are Required";
        return false;
    }

    //make sure this user has permission to update
    if (!user_isadmin()) {
        $feedback="You Must Be An Admin To Update a Bug";
        return false;
    }

    //escape user input before building SQL; addslashes() is a
    //minimal stand-in for the database's own escaping function
    $field1=addslashes($field1);
    $field2=addslashes($field2);
    $field3=addslashes($field3);

    //now let's update the bug
    $result=db_query("UPDATE bug ".
        "SET field2='$field2',".
        "field3='$field3' ".
        "WHERE id='$field1'");

    //now check your query for success/failure
    if (!$result) {
        //update failed
        $feedback="Database Update Failed";
        return false;
    } else {
        return true;
    }
}
?>

3. Portability
Certainly you don't want to hard-code absolute URLs throughout your application, but I'm going to take it further and say that color choices, element names, fonts, and perhaps other options should be set in a config file that is included on every page. The broad look and feel of your site should be separated out as well. Obviously you would not copy an HTML file and paste it all over the place; rather, I tend to wrap that HTML in a function and then call the function wherever it's needed.
The same goes for the database password, connection strings, etc.; those should live in the db abstraction layer.
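A minimal sketch of such a config file plus header/footer functions. The constant names, their values, and the exact markup are assumptions for illustration, and the bodies of site_header() and site_footer() are hypothetical. The idea is that pages call these functions and never hard-code colors, fonts, or URLs themselves.

```php
<?php
// Hypothetical config.php, included on every page.
// Setting names and values are illustrative.
define('SITE_NAME', 'Example Site');
define('BASE_URL', '/');          // pages link relative to this, never absolute
define('BG_COLOR', '#FFFFFF');
define('FONT_FACE', 'verdana, arial, helvetica');

// Look and feel wrapped in functions rather than pasted into each page;
// changing a color or font means editing only the constants above.
function site_header(string $title): void {
    echo '<html><head><title>', htmlspecialchars(SITE_NAME . ': ' . $title), '</title></head>',
        '<body bgcolor="', BG_COLOR, '"><font face="', FONT_FACE, '">',
        '<a href="', BASE_URL, '">Home</a>';
}

function site_footer(): void {
    echo '</font></body></html>';
}
```

Each page then starts with `require("config.php");` and `site_header("Page Title");`, and ends with `site_footer();`.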
4. Object Orientation / Functionalization
We're not working in COBOL here (God help you if you are), so we can break processes up into function calls. Each call performs one atomic action, sometimes just calling a handful of other functions and returning the result.
A good example is checking whether a user is logged in on each page. You could check for a cookie and query the database easily, but what if you want to change your auth system? You'd have to go through every page of code and change it, rather than changing one common function call in a library. Think about that anytime you write a piece of code - if it's something that is going to happen more than once on your site, you should move it into a library.
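A sketch of that kind of library function, assuming cookie-based logins. The cookie name session_hash and the session_lookup() helper are hypothetical, and session_lookup() uses a hard-coded array where a real version would query a sessions table through the database layer.

```php
<?php
// Hypothetical auth library (e.g. auth.php), required by every page.
// Pages only call user_isloggedin(); only this file knows how login works.

function user_isloggedin(): bool {
    $hash = $_COOKIE['session_hash'] ?? '';
    if ($hash === '') {
        return false;
    }
    return session_lookup($hash) !== null;
}

// Stand-in for a query against a sessions table:
// maps a session hash to a user id, or null if unknown.
function session_lookup(string $hash): ?int {
    $sessions = ['abc123' => 101];
    return $sessions[$hash] ?? null;
}
```

Every page just does `if (!user_isloggedin()) { ... }`, so switching to a different auth system means rewriting these two functions rather than every page.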

What else?

There are certainly a lot of things I'm not thinking of, so pass along your ideas and I'll post a follow-up story in the future. In particular, if you have written a large, complex application, I'd like to hear how you architected it and what you'd do differently if you had to start over.