My July article comparing PostgreSQL and MySQL caused a stir that really surprised me. Of course, users of both databases called the tests rigged one way or the other, even though I noted that both databases had their strong points under various circumstances.
All that aside, we've been running into some serious MySQL problems on SourceForge - most of them directly related to MySQL's table-level locking mechanism. To give us some breathing room, we've been running a beta version of MySQL with pretty good success.
Those problems and other MySQL limitations have caused me to evaluate database alternatives, first Interbase (apparently a dead end), then PostgreSQL.
It didn't take long to rule out Interbase after discovering that it lacked replication, full text indexing, and had serious community and corporate issues around its "Open Sourceness".
Postgres also lacks replication and full-text indexing (MySQL's latest betas include both features); however, the Postgres team has issued a press release promising replication soon, though for now it remains vaporware.
I've worked with Postgres quite a bit in the past and had occasional but serious reliability problems (Geocrawler.com runs on Postgres and had a tendency to melt down every few months, losing some data in its 10GB database). So I began evaluating Postgres 7.1 with some trepidation. My benchmarks in July showed PG 7.0 to be clearly slower than MySQL, and I outlined several limitations that I found annoying. Still, Postgres has dozens of advanced features that MySQL lacks (triggers, foreign keys, rules, subselects, views...) and was well worth investigating.
As it turns out, the PostgreSQL team has been working incredibly hard on the 7.1 release. Not only did they solve my pet peeve (the 8k row limit), but they've apparently made massive strides in performance and reliability. My install of Postgres 7.02 on Geocrawler has not failed in several months now; in fact, Geocrawler was recently noted as one of the most reliable sites on the Internet in Netcraft's monthly survey. I think that says a lot about the PG 7.x series.
With that in mind, I decided to test out a full port of SourceForge.net to Postgres. The site was written with a database abstraction layer and it turned out to be a cinch to get it up and running on Postgres, including a full import of all production data from MySQL.
Not only did the site come up on the first attempt, but it ran fine! In fact, our very first benchmarks showed Postgres running 6x faster than MySQL on a very database-intensive page (the "My Personal Page" for logged-in users).
To make things interesting, we ran the exact same code base on MySQL 3.23.26beta and Postgres 7.1 (CVS pre-beta version). The SQL structures were identical, except that on Postgres I had added foreign-key constraints to the schema (MySQL doesn't really support foreign keys). If anything, this should have tilted performance in MySQL's favor, since MySQL does no data validation on inserts and updates, while Postgres was enforcing those constraints in this test.
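For illustration, the kind of foreign-key constraint added on the Postgres side looks like this. The table and column names here are hypothetical sketches, not SourceForge's actual schema:

```sql
-- Hypothetical schema fragment: require that every bug row
-- reference an existing group. Postgres validates this on every
-- INSERT/UPDATE; MySQL 3.23 parses the clause but ignores it.
CREATE TABLE groups (
    group_id   integer PRIMARY KEY,
    group_name varchar(40) NOT NULL
);

CREATE TABLE bugs (
    bug_id   integer PRIMARY KEY,
    group_id integer NOT NULL REFERENCES groups (group_id),
    summary  varchar(80)
);
```

With this in place, an insert into "bugs" with a nonexistent group_id fails on Postgres but silently succeeds on MySQL - which is exactly the extra validation work Postgres was doing in these tests.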


Now let's be clear - these tests were run with all processes on the same quad-Xeon 400 MHz machine with 1GB of RAM. The Apache "ab" client, Apache/PHP, and the database all ran on the same box. In a real-world scenario, you would be wise to separate your web server and database server for maximum performance.
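For reference, the load in these runs was generated with Apache's "ab" utility. A typical 30-client invocation would look something like this (the URL is illustrative, not the actual test target):

```shell
# 1000 total requests, 30 concurrent, against the page under test
ab -n 1000 -c 30 http://localhost/my_personal_page.php
```

"ab" reports requests per second, which is where the pages-per-second figures below come from.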
Further, the "worst" database-intensive pages were chosen; it wouldn't make much sense to test pages that barely touch the database. Most pages on SourceForge will not hit the database this hard or run this slowly.
The data in both databases was identical and was real life production data.

Pleasantly Surprised

With 30 clients pounding the test machine, Postgres chugged along at 3.76 pages per second. That sounded really bad at first, until we ran the same test on MySQL. MySQL did so poorly that we eventually cancelled the test and reduced the concurrency to just 5 users, and it still managed only 0.77 pages per second.
To be clear, this is an unusually intense page and requires 16 queries and joins a dozen tables together in interesting ways. If you're wise, you won't make your entire web application this complex.
To spice up this test, we created a second PHP page that did inserts, updates, deletes, and used transactions. We had 30 clients hit this new page, and 30 clients hit the "My Personal Page" simultaneously. Since Postgres supports transactions, we decided to have 25% of the transactions "rollback" to see if that causes any performance problems.
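In Postgres terms, each hit on that write-heavy page wrapped its statements in a transaction along these lines. This is a sketch with hypothetical table names, not the actual SourceForge code:

```sql
BEGIN;

UPDATE groups
   SET group_name = 'Renamed Project'
 WHERE group_id = 42;

INSERT INTO bugs (bug_id, group_id, summary)
VALUES (1001, 42, 'test entry');

DELETE FROM bugs WHERE bug_id = 1000;

-- In the benchmark, 25% of requests ended here, undoing all
-- three statements atomically:
ROLLBACK;
-- ...while the other 75% issued COMMIT instead.
```

On MySQL 3.23's standard tables there is no equivalent: each statement is applied immediately and cannot be rolled back.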
To be fair, this is a test where we fully expected MySQL would fail - because of its table-level locking. The "My Personal Page" joins several times against our "Groups" table, which was being updated frequently in this test. While it was being updated, of course MySQL would have to wait to get a table-level lock, while PostgreSQL would simply move along using its "better than row level" locking.
Postgres chugged along at 2.05 pages/second, while MySQL simply failed the test and locked itself up (again, table-level locking is the major pitfall of using MySQL). MySQL didn't crash, but our benchmarking software (Apache's "ab" utility) timed out when it got no response from MySQL.
The numbers here are telling. MySQL was very slow even at 5 concurrent users, and failed at 15 concurrent users (the graph actually flatters MySQL's performance here).
Read/Update Concurrency Test graph

The Bug Tracker Test

Like most bug trackers, ours has suffered from feature bloat over time, and requires about 16 queries to display the browse page.
None of the queries is particularly difficult, except that one requires a double join against the "users" table to get the names of the submitter and the assignee.
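That double join means pulling the "users" table into the query twice under different aliases, once per role. Roughly, and with hypothetical column names rather than the real schema:

```sql
-- Join users twice: u1 resolves the submitter's name,
-- u2 resolves the assignee's name.
SELECT b.bug_id,
       b.summary,
       u1.user_name AS submitted_by,
       u2.user_name AS assigned_to
  FROM bugs b, users u1, users u2
 WHERE b.submitted_by = u1.user_id
   AND b.assigned_to  = u2.user_id;
```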
Frankly, I expected that MySQL would easily win this test - selects and joins are its bread and butter. So again, I was pleasantly surprised to see that Postgres won this test handily.
Bug Browser Test graph

The Forum Summary Test

This was another simple test that I again expected MySQL to win handily. It basically involves joining two tables and grouping to get a count of matching items in the second table. Nothing mind-boggling.
Performance of count(*) has always been a dog on Postgres in the past - so I was really surprised by this test, and by some others I ran. Clearly, this is an area where the Postgres team has recently done some optimization work.
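The shape of the forum summary query is roughly the following; as before, the table and column names are hypothetical stand-ins for the real schema:

```sql
-- One row per forum, with a count of its messages.
SELECT f.forum_id,
       f.forum_name,
       count(*) AS num_messages
  FROM forums f, forum_messages m
 WHERE m.forum_id = f.forum_id
 GROUP BY f.forum_id, f.forum_name;
```

It is exactly this join-then-group-then-count pattern that used to be slow on Postgres.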
Forum Summary Test graph

What Does It Mean?

Well, it means you can no longer simply cross PostgreSQL off your list as a dog. Not only does Postgres have dozens of very advanced features that MySQL lacks, but it no longer appears to be weighed down by those features as it was in the past. And that's good for everyone.
Expect to read more about Postgres here in the future - most notably, I'm going to write a more detailed article on how to properly use transactions and foreign-key constraints in your database. Both features will be used throughout the SourceForge code base in the near future.