In a previous job when I was responsible for hiring PHP developers, I always used to ask questions about database abstraction in my interviews. It's amazing how often candidates with the best-looking CV's, and impressive looking project lists, were tripped up by their lack of knowledge of what a database abstraction layer is, and were unfamiliar with any of the main database abstraction layers.
I can't really blame them. It's not obvious to newbies learning PHP development from the PHP website, many books, or other online resources, that abstraction layers even exist, or why one should use them. To many who've put in the time of learning the PHP functions, suddenly being faced with a whole new set of functions that don't (on the surface) provide much benefit can be daunting.
Needless to say, none of the candidates who were clueless about abstraction layers got the job. But, to make things easier for others in a similar situation, this month's article covers reasons for using abstraction layers, and takes a look at some of the more popular ones in this rapidily changing field.
Why use a Database Abstraction Layer
Let's start with the reason you'll usually hear trumpeted from the rooftops. It allows one to switch from one DBMS to another with a minimum of pain. That's not entirely true, and certainly not the main reason for using an abstraction layer. Using the PHP MySQL functions means that one would have to change each instance inside your code. If your code is neat, you shouldn't have these littered everywhere, but there still may be more to change than if you'd used an abstraction layer. However, it's highly unlikely that you'll be able to switch one line of code, switching mysql to oracle, for example, inside a config file. There're usually at least some queries that use features unique to a particular DBMS, or that differ from the alternative you're switching to, and there is also likely to be performance tuning unique to that DBMS. There's no other way to migrate these than to manually go through and rewrite each one.
Some claim that developers should never use non-standard SQL for this reason, but that's an unreasonable restriction. I was once involved in a highly painful project to move from MySQL to Informix. The code had been written with a degree of database abstraction (PHPLib), and it was still a painful process. Luckily I managed to convince everyone that MySQL hadn't been properly optimized, and was in fact suitable for the task. Nevertheless, using a database abstraction layer does help portability, just don't make the mistake of thinking it will be easy to switch to another DBMS. The choice of a DBMS should never be lightly taken, and once committed to, not easily backed away from.
The other advantage usually cited is that, by its very nature, abstraction simplifies a complex task. So, instead of having to learn an entire new set of functions, one can simply use the standard abstraction layer's code. However, this too is an overstated advantage. A competent developer should be comfortable with all the PHP database functions - they really don't differ that much. The many abstraction layers however do differ from each other quite substantially. Familiarity, though, is an advantage - it just makes a developer that much more efficient. Java developers coding in PHP may be more efficient using Creole (based on JDBC), as could Microsoft developers using ADOdb. Developers working in an environment with multiple DBMS' will appreciate the single API, and it undoubtedly improved developer productivity.
So there are advantages, but what about disadvantages? Speed is the main one. An abstraction layer is an extra layer of code, and this inevitably comes with a performance penalty. Some are slower than others - there are benchmarks out there (such as http://marc.theaimsgroup.com/?l=pear-dev&m=108239153527834&w=2 http://freshmeat.net/screenshots/30313/
), but remember that the underlying code is always developing, which could affect performance, and you also need to read between the lines, and understand exactly what is being benchmarked, and why.
Metabase was one of the slowest abstraction layers, mainly because it was designed with maximum portability in mind. PDO, written in C and accessing the native API's, is one of the fastest. If you're concerned about performance, the best way is to perform your own benchmarks, set up to simulate your environment and needs as closely as possible.
Popular PHP database abstraction layers
There are three granddaddies out there: Metabase, PEAR:DB and AdoDB. Most code out there at the moment makes use of one of these, with PEAR being the most popular (being bundled with PHP), ADOdb perhaps next, followed by Metabase. However, there are some new ones worth looking at. PEAR:DB and Metabase are in the process of being superceded with other abstraction layers, PEAR:MDB being a merge of PEAR:DB and Metabase, and apparently having a better performance than both. Let's look at some of these.
In existence since 2001, part of the PEAR repository, and bundled with PHP, PEAR:DB is for these reasons the most popular database abstraction library. PEAR stands for PHP Extension and Application Repository, and describes itself as a framework and distribution system for reusable PHP components. It's not known as particularly zippy, and is also being usurped by more recent and better performing abstaction layers. PEAR:DB supports:
- MS SQL
PEAR:DB can be downloaded at http://pear.php.net/package/DB/download
. Read more about it at http://pear.php.net/package/DB
. There's also an old tutorial available at http://phpbuilder.com/columns/allan20010115.php3?aid=202
. PEAR:DB requires at least PHP 4.
In 2003 I wrote an introductory tutorial to ADOdb (part 1
and part 2
). Of course a lot has changed since then, but ADOdb is still a popular abstraction layer. It's particularly easy to learn for Microsoft developers, as it uses many of the ADO conventions. It's been around since 2000, and is well-supported, and is also used by numerous popular web applications, such as Mambo and eGroupWare.
ADOdb has quite a few methods that go beyond what's normally in an abstraction layer, such as formatting tables and caching and it supports the most DBMS'. The list currently includes:
- MS SQL
- SAP DB