My
July Article comparing PostgreSQL and MySQL
caused a stir that really surprised me. Of course, users of both databases
called the tests rigged one way or the other, even though I noted that
both databases had their strong points under various circumstances.
All that aside, we've been running into some rather grim and serious
MySQL problems on
SourceForge -
most of which are directly related to MySQL's table-level locking mechanism.
To give us some breathing room, we've been running a Beta version of MySQL
with pretty good success.
Those problems and other MySQL limitations have caused me to evaluate
database alternatives, first Interbase (apparently a dead end), then PostgreSQL.
It didn't take long to rule out Interbase after discovering that it
lacked replication and full-text indexing, and had serious community
and corporate issues around its "Open Sourceness".
Postgres also lacks replication and full-text indexing (MySQL's latest
betas include both features); however, the Postgres team has issued a press
release about replication (currently vaporware) promising that it is coming
soon.
I've worked with Postgres quite a bit in the past and had occasional but
serious problems with reliability under some circumstances (Geocrawler.com
runs on Postgres and had a tendency to melt down every few months, losing
some data in its 10GB database). So I began evaluation of Postgres 7.1 with
some trepidation. My benchmarks in July showed PG 7.0 to be clearly slower
than MySQL and I outlined several limitations that I found annoying. Still,
Postgres has dozens of advanced features that MySQL lacks (triggers, foreign
keys, rules, subselects, views...) and was well worth investigating.
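For anyone who hasn't seen those features in action, here's a minimal sketch of a
view built on a subselect (the table and column names are invented for illustration):

    -- A view wrapping a subselect: groups that have at least one open bug.
    CREATE VIEW groups_with_open_bugs AS
        SELECT group_id, group_name
        FROM groups
        WHERE group_id IN (SELECT group_id FROM bug WHERE status = 'open');

On MySQL 3.23 there are no views and no subselects, so you'd run the inner query
separately and stitch the results together in your application code.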
As it turns out, the PostgreSQL team has been working incredibly hard
on the 7.1 release. Not only did they solve my pet peeve (the 8k row limit), but
apparently they've made massive strides in performance and reliability.
My install of Postgres 7.0.2 on Geocrawler has not failed in several months now;
in fact, Geocrawler was recently noted as one of the most reliable sites on the
Internet in Netcraft's monthly survey. So I think that says a lot about the PG 7.x
series.
With that in mind, I decided to test out a full port of SourceForge.net
to Postgres. The site was written with a database abstraction layer and it
turned out to be a cinch to get it up and running on Postgres, including
a full import of all production data from MySQL.
Not only did the site come up on the first attempt, but it ran fine! In fact,
our very first benchmarks showed Postgres running 6x faster than MySQL on a
very database-intensive page (the
"My Personal Page" for logged-in users).
To make things interesting, we ran the exact same
code base
on MySQL 3.23.26beta
and Postgres 7.1 (CVS pre-beta version). The SQL structures were identical,
except that I had added Foreign Key constraints to the database schema on
Postgres (MySQL doesn't really support foreign keys). This should have tilted
the performance in favor of MySQL: MySQL does no data validation on
inserts/updates, while Postgres had to check every foreign key in this test.
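Roughly speaking, the Postgres schema carried constraints like the following
(a simplified sketch with invented table names, not the actual SourceForge schema):

    CREATE TABLE groups (
        group_id   int PRIMARY KEY,
        group_name text NOT NULL
    );

    -- Every artifact must point at an existing group. Postgres verifies
    -- the reference on every insert/update - work that MySQL never does.
    CREATE TABLE artifact (
        artifact_id int PRIMARY KEY,
        group_id    int NOT NULL REFERENCES groups (group_id)
    );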
Methodology
Now let's be clear: these tests were run with all processes on the same quad-Xeon
400 MHz machine with 1GB of RAM. The Apache "ab" benchmarking client, Apache/PHP,
and the database all ran on the same box. In a real-world scenario, you would be
wise to separate your web server and database server for maximum performance.
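For reference, the load was generated with an invocation along these lines (the
request count and URL are placeholders; the concurrency matches the 30-client
tests below):

    ab -c 30 -n 1000 http://localhost/my_personal_page.php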
Further, the "worst" database-intensive pages were chosen; it wouldn't make
much sense to test pages that barely touch the database. Most pages on
SourceForge will not be this slow or saturate the database this heavily.
The data in both databases was identical and was real life production data.
Pleasantly Surprised
With 30 clients pounding the test machine, Postgres chugged along at 3.76
pages per second. That sounded really bad at first, until we ran the same test
on MySQL. MySQL did so poorly that we eventually cancelled the test and reduced
the concurrency to just 5 users, and it still managed only 0.77 pages per second.
To be clear, this is an unusually intense page: it requires 16 queries and
joins a dozen tables together in interesting ways. If you're wise, you won't
make your entire web application this complex.
To spice up this test, we created a second PHP page that did inserts, updates,
deletes, and used transactions. We had 30 clients hit this
new page, and 30 clients hit the "My Personal Page" simultaneously. Since
Postgres supports transactions, we decided to have 25% of the transactions
roll back, to see whether that caused any performance problems.
To be fair, this is a test where we fully expected MySQL would fail - because
of its table-level locking. The "My Personal Page" joins several times against
our "Groups" table, which was being updated frequently in this test. While it
was being updated, of course MySQL would have to wait to get a table-level lock,
while PostgreSQL would simply move along using its "better than row level" locking.
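To picture the difference, imagine the two pages hitting "Groups" at the same
instant (schematic SQL - the statements are invented, the behavior is not):

    -- Connection 1, the write page:
    UPDATE groups SET post_count = post_count + 1 WHERE group_id = 42;

    -- Connection 2, the "My Personal Page", at the same moment:
    SELECT g.group_name
    FROM user_group ug, groups g
    WHERE g.group_id = ug.group_id AND ug.user_id = 1001;

Under MySQL's table-level locking, the SELECT blocks until the write lock on
"groups" is released. Under Postgres, the SELECT reads a consistent snapshot
and never waits for the writer.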
Postgres chugged along at 2.05 pages/second, and MySQL simply failed the
test and locked itself up (again, table-level locking is the major pitfall
of using MySQL). MySQL didn't crash, but our benchmarking software (Apache's "ab" utility)
timed out when it didn't get any result from MySQL.
The numbers here are telling. MySQL was very slow even at 5 concurrent users,
and failed at 15 concurrent users (the graph actually flatters MySQL's performance
here).
The Bug Tracker Test
Like most bug trackers, ours has suffered from feature bloat over time, and requires
about 16 queries to display the
browse page.
None of the queries is particularly difficult, except that one requires a double
join against the "users" table to get both the submitter's name and the assignee's name.
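The shape of that query is roughly this (a sketch; the real SourceForge tables
and columns differ):

    SELECT b.bug_id, b.summary,
           submitter.user_name AS submitted_by,
           assignee.user_name  AS assigned_to
    FROM bug b, users submitter, users assignee
    WHERE submitter.user_id = b.submitted_by
      AND assignee.user_id = b.assigned_to;

The point is that "users" enters the join twice under different aliases, once
for each name being looked up.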
Frankly, I expected that MySQL would easily win this test - selects and joins are its
bread and butter. So again, I was pleasantly surprised to see that Postgres won this
test handily.
The Forum Summary Test
This was another simple test that I again expected MySQL would win handily.
It basically involves joining two tables and grouping to get a count of matching
items in the second table. Nothing mind-boggling.
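Concretely, it's the classic pattern below (simplified, with invented table
names - not the actual query):

    SELECT f.forum_id, f.forum_name, count(m.msg_id) AS num_messages
    FROM forum f LEFT JOIN forum_message m ON m.forum_id = f.forum_id
    GROUP BY f.forum_id, f.forum_name;

The LEFT JOIN keeps forums with no messages in the result, where count(m.msg_id)
correctly comes out as zero.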
Performance of count(*) has always been a dog on Postgres in the past, so I was
really surprised by this test, and some others I ran. Clearly, this is an
area where Postgres has recently done some optimization work.
What Does It Mean?
Well, it means you can no longer simply cross PostgreSQL off your list because
it's a dog. Now, not only does Postgres have dozens of very advanced features that MySQL
lacks, but it doesn't appear to be weighed down by those features as it was in the
past. And that's good for everyone.
Expect to read more about Postgres here in the future - most notably, I'm going to
write up a more detailed article on how to properly use transactions and foreign
key constraints in your database. Both features are going to be used throughout
the SourceForge code base in the near future.
--Tim