Which database do I use: Postgres or MySQL? This age-old question has plagued
developers for, what, at least a couple years now. I've used both databases
extensively (MySQL for about one year and Postgres for about 2 years) and was
curious if the performance differences between the two were as stark as the
MySQL website suggests.
I had actually benchmarked the two databases back in September 1999 when we
were starting to lay the groundwork for SourceForge. At the time, the
performance difference was so stark that we had to go with MySQL even though I
had always used Postgres for all my work. The rest of the developers were used
to MySQL and that pretty much cinched the decision.
This time around, rather than using some contrived benchmarking scheme, I
wanted to grab a "real life" web page from a "real" web site and see how it
performed on the two databases. The page in question was the discussion forum
on SourceForge. It involves some relatively straightforward joins of three
tables, each with 20-30,000 rows of data. It also involves some recursive
queries to show nested messages, so the database is the true bottleneck on this
page, not PHP.
To get started, I dumped real data out of the production database, modified the
table SQL and imported it all into MySQL 3.22.30 and PostgreSQL 7.0.2 on Red
Hat Linux 6.2 and a VA Linux quad-xeon 4100 server with 1GB RAM.
The first problem I ran into was that Postgres has an arcane limit of 8k of data
per row. In a message board, you're going to occasionally surpass 8k of data
in a row, and so postgres choked on the import. To get around this, I just
dropped out the "body" of the message and re-imported the data. The Postgres
development team is aware of this limitation and are fixing it in v7.1, and
they also noted that you can recompile Postgres to support up to 32k,
although at a possible detriment to overall performance.
At this point, I ran into another small issue with Postgres - its "serial" data type (the
equivalent of MySQL's auto_increment) creates a "sequence" which does not get
dropped when its parent table is dropped. So if you try to re-create the table,
you'll get a name conflict for this sequence. A lot of new users would be
confused by this, so Postgres loses a couple points for that. Also, MySQL is
"smart" enough to increment its auto_increment value when you import data,
whereas Postgres' sequence does not get reset when you import data, causing all
new inserts to fail.