I've been wanting to write this article for a long time, but
never really had the time to do it right. So rather than say
this is going to be a how-to, I'm hoping that this article
will be the seed of discussion about how to build great, scalable
web applications.
Certainly over the past 2-3 years, my web development skills have
changed dramatically. I look back at the source code for Geocrawler
and can't believe I wrote that. The source code for PHPBuilder is
also far from exemplary, as it's really just a hodge-podge of
various GPL'd software packages cobbled together.
But it's not perfect. If I had it to do again, I would try to
make more of a point of keeping the HTML layer more clearly
separated from the database layer, either through objects or
a cleaner function library.
I've found that managers love to have pretty pictures and diagrams
drawn up for them, so here's one that will impress the best of them.
The idea behind this structure is that you are separating your logic
from the "presentation", meaning anything complicated is going on
down there in the "API/Data Access Layer".
Rather than coding
security checks, update statements, and so on throughout your HTML
layer, you should code the bulk of that in the API layer.
The HTML layer will then make simple function calls that return
either arrays, objects, or (my favorite) database result sets.
If you do this right, the topmost layer will be very thin so you
can easily create/maintain it.
Drawing from this example, the HTML interface depends on some direct
calls to the API layer, some calls to an HTML utility library (which
could, for example, generate pop-up boxes, or whatever), and those
libraries make calls to the database using a database abstraction
layer (again so you aren't tied to any particular database).
The Basics
The fundamentals of a smart architecture include:
- Database independence
- Presentation independence
- Portability
- Object-Oriented or at least broken into function call libraries
- What else?
There are certainly more items for that list, but those are the
biggest points I can think of. Maybe you can point out others.
Let's examine each of those in detail.
1. Database Independence
Well you never know where your site is
going. Certainly when you build it, you hope it's going to get
huge and highly-trafficked. So with that in mind, you
don't want to tie yourself irreparably to MS Access or some other
cheesy, lightweight database. You will never be able to instantly
plug in different databases, but you can make the transition as
smooth as possible.
There are several different options that have cropped up to help
you abstract your database calls. One of the odd things about PHP
is that you have to code for a specific database, as all the function
calls are different for each database. To get around this, you can
use a database abstraction layer, like those found in PHPLib, the
forthcoming PEAR, and the simple library we developed for SourceForge.
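To make the idea concrete, here is a minimal sketch of a thin abstraction layer. It is not the SourceForge, PHPLib, or PEAR library; it assumes PHP's PDO extension (which postdates those libraries) with the SQLite driver, and the function names db_connect() and db_fetch_all(), the extra parameters on db_query(), and the bug table layout are all hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical wrapper functions: the rest of the application calls only
// these and never touches a driver-specific API directly.

// The DSN is the single place that names the database product.
// 'sqlite::memory:' is used below only so the sketch runs without a server.
function db_connect(string $dsn, ?string $user = null, ?string $pass = null): PDO {
    $conn = new PDO($dsn, $user, $pass);
    // Throw exceptions on errors instead of failing silently.
    $conn->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    return $conn;
}

// Run a statement with bound parameters and return the statement handle.
function db_query(PDO $conn, string $sql, array $params = []): PDOStatement {
    $stmt = $conn->prepare($sql);
    $stmt->execute($params);
    return $stmt;
}

// Fetch every row of a result as an associative array.
function db_fetch_all(PDOStatement $result): array {
    return $result->fetchAll(PDO::FETCH_ASSOC);
}

// Usage: an illustrative table with one row.
$conn = db_connect('sqlite::memory:');
db_query($conn, 'CREATE TABLE bug (id INTEGER PRIMARY KEY, summary TEXT)');
db_query($conn, 'INSERT INTO bug (summary) VALUES (?)', ['Login page is slow']);
$rows = db_fetch_all(db_query($conn, 'SELECT id, summary FROM bug'));
```

Moving to another database would then mostly mean changing the DSN passed to db_connect(), plus whatever SQL dialect differences your queries happen to rely on.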
2. Presentation Independence
Once again, you don't really know
where your web site is headed or where technology is headed. I was
never a big believer in this - HTML is really the standard, especially
in web apps. If that ever changes, I figured I could always rewrite.
But if you get to where you have a truly huge, complex app, then
you need to start thinking about alternative interfaces to your
database. What you don't want to do is start copying and pasting
logic, permission checks, etc around your site. Let's say you need to
make your site WAP-enabled so cell-phone users can surf. If you
designed your app right, you can just write a thin WAP presentation
layer that calls all your data-access objects. If you didn't design
your app right, suddenly you have to maintain both an HTML version
of your site and a WAP version.
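As a rough sketch of what "thin" means here: one data-access call feeds two small presentation functions. Everything below is hypothetical (bug_data_get_open(), bug_list_html(), bug_list_wml(), and the hard-coded rows that stand in for a real query), and the WML markup is only a simplified illustration.

```php
<?php
// Hypothetical data-access function: the only place that knows how bugs
// are loaded. A real version would query the database and check permissions.
function bug_data_get_open(): array {
    return [
        ['id' => 1, 'summary' => 'Login fails & hangs'],
        ['id' => 2, 'summary' => 'Typo on <home> page'],
    ];
}

// Thin HTML presentation layer: formatting only, no logic.
function bug_list_html(array $bugs): string {
    $out = "<UL>\n";
    foreach ($bugs as $bug) {
        $out .= '<LI>#' . $bug['id'] . ' ' . htmlspecialchars($bug['summary']) . "</LI>\n";
    }
    return $out . "</UL>\n";
}

// Thin WAP (WML) presentation layer over the same data call.
function bug_list_wml(array $bugs): string {
    $out = "<wml><card id=\"bugs\" title=\"Open Bugs\"><p>\n";
    foreach ($bugs as $bug) {
        $out .= '#' . $bug['id'] . ' ' . htmlspecialchars($bug['summary']) . "<br/>\n";
    }
    return $out . "</p></card></wml>\n";
}

$bugs = bug_data_get_open();
echo bug_list_html($bugs);
echo bug_list_wml($bugs);
```

Adding a third interface, such as XML, would mean writing one more small function like these rather than copying the permission checks and queries.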
I'm running into this on SourceForge. We have this huge user
base of people who want to submit/fetch their bugs, tasks, etc.
At first, we figured it would all be through our web interface.
Then with some pressure from people like Eric Raymond and others,
we've decided to expose the database using an
XML interface.
Fortunately, we undertook an effort back in April to separate out
the core logic of the site from its presentation. I'll try to explain
how we did this, and hopefully others will chime in with their own
methods.
The bug tracker and other tools on SourceForge are now split into
two distinct libraries - the HTML library and the data library.
The data library checks to make sure the right values were passed in,
handles security checks, and basically returns only true or false on
success/failure.
For simplicity's sake, this example will not be based on a perfect
object model, as I'd have to explain the base classes and how the
other objects extend those base objects. I think this example will still
give you the general idea.
Example HTML Lib
<?php
//connect to database
require ("database.php");
//common utils like header/footer HTML
require ("html.php");
//data access library
require ("bug_data.php");
echo site_header("Page Title");
echo "<H4>Updating A Bug</H4>
<P>";
//$field1, $field2 and $field3 are assumed to come from the submitted form
if (!bug_data_update($field1,$field2,$field3)) {
    echo "<H3>Update Failed!</H3>";
    //echo the global error string set by the data library
    echo $feedback;
} else {
    echo "<H3>Updated Bug Successfully</H3>";
}
echo site_footer();
?>
Example Data Access Lib
<?php
/**
*
* controls access to updating a bug in the
* database. Validates data and checks security
* Returns true on success, false on failure
*
*/
function bug_data_update ($field1,$field2,$field3) {
    //global string to report back errors
    global $feedback;
    //$field1 and $field2 are required
    if (!$field1 || !$field2) {
        $feedback="Field 1 And Field 2 Are Required";
        return false;
    }
    //make sure this user has permission to update
    if (!user_isadmin()) {
        $feedback="You Must Be An Admin To Update a Bug";
        return false;
    }
    //now let's update the bug
    //NOTE: the values must already be escaped for your database;
    //never interpolate raw user input into SQL
    $result=db_query("UPDATE bug ".
        "SET field2='$field2',".
        "field3='$field3' ".
        "WHERE id='$field1'");
    //now check your query for success/failure
    if (!$result) {
        //update failed - report it like the other errors
        $feedback="Database Error: Bug Was Not Updated";
        return false;
    } else {
        return true;
    }
}
?>
3. Portability
Certainly you don't want to hard-code absolute URLs throughout your
application, but I'm going to take it further and say that
color choices, element names, fonts, and perhaps other options should
be set in a config file which is included on every page. The broad
look and feel of your site should be separated out as well - obviously
you would not copy an HTML file and paste it all over the place - rather
I tend to wrap that HTML in a function call and then call the function
wherever needed.
The same goes for database passwords, connection strings, etc - those
should live in the database abstraction layer.
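Here is one way that could look, as a minimal sketch. The file name config.php, the site_config() and site_url() helpers, and every key and value in the array are assumptions made up for illustration; site_header() is the name used in the example above, but this body is only a guess at what such a function might contain.

```php
<?php
// config.php (hypothetical name): included at the top of every page.
// All keys and values below are illustrative assumptions.
function site_config(): array {
    return [
        'base_url'   => '/myapp',
        'body_color' => '#FFFFFF',
        'font_face'  => 'Verdana, Arial, Helvetica',
    ];
}

// Build a site-relative URL so no page hard-codes an absolute path.
function site_url(string $path): string {
    $config = site_config();
    return rtrim($config['base_url'], '/') . '/' . ltrim($path, '/');
}

// Wrap the shared page header in one function instead of pasting HTML.
function site_header(string $title): string {
    $config = site_config();
    return '<HTML><HEAD><TITLE>' . htmlspecialchars($title) . "</TITLE></HEAD>\n"
        . '<BODY BGCOLOR="' . $config['body_color'] . "\">\n"
        . '<FONT FACE="' . $config['font_face'] . "\">\n";
}

echo site_header('Bug Tracker');
echo '<A HREF="' . site_url('/bugs/index.php') . '">Bugs</A>' . "\n";
```

Moving the site to a different directory, or changing its colors, then becomes a one-line change in the config rather than a search through every page.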
4. Object Orientation / Functionalization
We're not working in COBOL here - god help you if you are. So that means
we can break up processes into function calls. Each call does an atomic
action, sometimes just calling a handful of other functions and returning
the result.
A good example is checking whether a user is logged in on each page. You
could check for a cookie and query the database easily, but what if you
want to change your auth system? You'd have to go through every page of
code and change it, rather than changing one common function call in a
library. Think about that anytime you write a piece of code - if it's something
that is going to happen more than once on your site, you should move it
into a library.
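A minimal sketch of that login check, under stated assumptions: user_isloggedin(), session_lookup(), the session_hash cookie name, and the hard-coded session table are all hypothetical stand-ins for whatever your real auth system does.

```php
<?php
// Hypothetical stand-in for the real lookup (a cookie hash checked against
// the database). Swapping the auth system means changing only this function.
function session_lookup(string $hash): ?array {
    $sessions = ['abc123' => ['user_id' => 42]];
    return $sessions[$hash] ?? null;
}

// The one function every page calls to ask "is this user logged in?".
function user_isloggedin(array $cookies): bool {
    $hash = $cookies['session_hash'] ?? '';
    return is_string($hash) && $hash !== '' && session_lookup($hash) !== null;
}

// In a real page you would pass $_COOKIE; an array literal is used here
// so the sketch runs standalone.
if (user_isloggedin(['session_hash' => 'abc123'])) {
    echo "Welcome back\n";
} else {
    echo "Please log in\n";
}
```

No page needs to know whether the check uses cookies, a database, or something else entirely; each one just calls the function.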
What else?
There are certainly a lot of things I'm not thinking of, so pass along
your ideas and I'll post a follow-up story in the future. In particular,
if you have written a large, complex application, I'd like to hear how
you architected it and what you'd do differently if you had to start over
again.
--Tim