Which database do I use: Postgres or MySQL? This age-old question has plagued
developers for, what, at least a couple years now. I've used both databases
extensively (MySQL for about one year and Postgres for about 2 years) and was
curious if the performance differences between the two were as stark as the
MySQL website suggests.
I had actually benchmarked the two databases back in September 1999 when we
were starting to lay the groundwork for SourceForge. At the time, the
performance difference was so stark that we had to go with MySQL even though I
had always used Postgres for all my work. The rest of the developers were used
to MySQL and that pretty much cinched the decision.
This time around, rather than using some contrived benchmarking scheme, I
wanted to grab a "real life" web page from a "real" web site and see how it
performed on the two databases. The page in question was the discussion forum
on SourceForge. It involves some relatively straightforward joins of three
tables, each with 20-30,000 rows of data. It also involves some recursive
queries to show nested messages, so the database is the true bottleneck on this
page, not PHP.
To get started, I dumped real data out of the production database, modified the
table SQL and imported it all into MySQL 3.22.30 and PostgreSQL 7.0.2 on Red
Hat Linux 6.2 and a VA Linux quad-xeon 4100 server with 1GB RAM.
The first problem I ran into was that Postgres has an arcane limit of 8k of data
per row. In a message board, you're going to occasionally surpass 8k of data
in a row, and so postgres choked on the import. To get around this, I just
dropped out the "body" of the message and re-imported the data. The Postgres
development team is aware of this limitation and are fixing it in v7.1, and
they also noted that you can recompile Postgres to support up to 32k,
although at a possible detriment to overall performance.
At this point, I ran into another small issue with Postgres - its "serial" data type (the
equivalent of MySQL's auto_increment) creates a "sequence" which does not get
dropped when its parent table is dropped. So if you try to re-create the table,
you'll get a name conflict for this sequence. A lot of new users would be
confused by this, so Postgres loses a couple points for that. Also, MySQL is
"smart" enough to increment its auto_increment value when you import data,
whereas Postgres' sequence does not get reset when you import data, causing all
new inserts to fail.
Methodology
To try to make this as realistic as possible, I took an actual page from a
website and made it portable across both MySQL and Postgres. This basically
meant replacing all mysql_query() calls with pg_exec(). This page involves
a lot of selects and joins, as probably most pages on a typical website do.
Once the test page was up and debugged, I then ran "ab", the "Apache Benchmarking"
utility, from my workstation across my 100-mbit LAN to the quad-xeon machine. To
get an idea of scalability under load, I varied the "concurrent connections"
on ab from 10-120, while leaving the number of page views steady at 1000.
To more closely simulate real-world use, I set up a random-number generator in
the script that inserts a row into the database on 10% of the page views. My
own numbers on PHPBuilder show that about 10% of all pages in the discussion
forums are for posting new messages.
Further, as mentioned above, I used real data from a production database. You
can't get a whole lot more realistic than this scenario.
The Numbers
The most interesting thing about my test results was to see how much of a load
Postgres could withstand before giving any errors. In fact, Postgres seemed to
scale 3 times higher than MySQL before giving any errors at all. MySQL
begins collapsing at about 40-50 concurrent connections, whereas Postgres
handily scaled to 120 before balking. My guess is, that Postgres could have gone
far past 120 connections with enough memory and CPU.
On the surface, this can appear to be a huge win for Postgres, but if you
look at the results in more detail, you'll see that Postgres took up to 2-3
times longer to generate each page, so it needs to scale 2-3 times higher
just to break even with MySQL. So in terms of max numbers of pages generated
concurrently without giving errors, it's pretty much a dead heat between
the two databases. In terms of generating one page at a time, MySQL does it
up to 2-3 times faster.
Another interesting point was that MySQL crumbles faster in the "10%
insert" test described above. Research reveals that MySQL locks the entire
table when an insert occurs, while Postgres has a pretty nifty "better than
row-level locking" feature. This difference quickly causes MySQL to pile up concurrent
connections and thus collapse. The same is true if you are doing a large select
out of a database while another process is inserting into that table. Postgres
is completely unfazed, while MySQL piles up connections until it falls apart
like a house of cards.
For those of you wondering about persistent connections in PHP, they don't
appear to benefit MySQL that much, whereas they are a clear boon for Postgres.
In fact, Postgres benchmarked as much as 30% faster just by using persistent
connections. That tells me that Postgres has a tremendous amount of overhead in
its connection-opening and authentication process. Some of this may be the
fault of Linux and its relatively lame process scheduler. Still, MySQL on the
same box beat it handily no matter how you look at it.
MySQL
The numbers for MySQL ring true with what most people already know: it's a
fast, although lightweight database that will probably serve well for the vast
majority of web sites. However, if you plan on having a high-traffic site (say,
greater than 500,000 pages per day), then forget MySQL as it can tend to fold
up and die under load. Anyone who has ever visited slashdot can attest to the
fragility of its setup (mod_perl and MySQL).
But again, the vast, vast majority of web sites fall well under the 15-pages
per second demonstrated by MySQL here. If you ever surpass a sustained 15
pages per second,
you'll be delighted to fork over the cash for a bigger server or an Oracle
license.
Wins
Obviously, the advantage MySQL has over Postgres is performance. It also has
some more powerful admin tools included in the distribution (mysqladmin allows
you to watch processes and queries in-progress), like hot backup, a file
corruption recovery tool and a couple others.
I'm also a fan of MySQL's command-line tools. You can see database and table
structures using describe and show commands. Postgres' commands are less
obvious ( \d to show a list of tables for instance).
Limitations
The first thing you hear from hard-core database gurus is that MySQL lacks
transactions, rollbacks, and subselects. You'll really miss transactions if
you're trying to write a banking application, accounting application, or trying
to maintain some sort of counter that needs to increment linearly over time.
Forget attempting any of those with released versions of MySQL (it should be
noted that the unstable 3.23.x series of MySQL now includes transaction
support).
For many, if not most, web sites out there, MySQL's limitations can be overcome
with a little elbow grease on the part of the developer. The primary feature
you'll miss in MySQL is powerful subselect syntax that is present in almost
every other production database. If I had a nickle for every time I could've
used subselects in MySQL, I'd be able to buy a case or two of beer. In other
words, this missing feature can be a pain in the neck, but it can be overcome.
Stability
MySQL loses points in the long-term stability department. Simply put, MySQL
gives up the ghost randomly and for no obvious reason after running for
semi-long periods of time (say 30-60 days). Many developers will compile MySQL
"statically" for just that reason, and doing so has helped some people.
That problem again can be overcome with a good pager or a simple crontab entry
that kills and restarts MySQL monthly. Not that I find that at all acceptable,
but it is a solution.
Where MySQL loses points in the daemon robustness department, it makes up for
it by apparently never corrupting its data files. The last thing you want is
your precious data files fouled randomly, and MySQL does well here. In over a
year of running MySQL, I haven't ever seen a single case of database
or index corruption. In the same timeframe, I have done 2 or 3 recoveries of a
couple different Postgres databases. (Regardless, backups are always your
best friend, as shown by the database fiasco here on PHPBuilder.)
PostgreSQL
The results for Postgres might surprise a few people, as Postgres has somewhat
of a negative reputation among some web developers (initial releases
of Postgres had widely-rumored issues in addition to laggard performance).
According to my experience, and these benchmarks, most of that reputation is
unfounded. In fact, it appears that PostgreSQL withstands up to 3 times the
load that MySQL can before throwing any errors -- on the same hardware/OS
combination.
Postgres happily chugs along at roughly 10 pages/second, enough to serve about
400,000 pages/day, assuming a regular traffic curve with the peak at 2x the
bottom. That's an awful lot of pages and is far beyond what most people will
see on their websites. In addition, most of the pages on your site will
not be as complex as the one in this test. As with MySQL, you'll be happy
to pay for a hardware upgrade if you pass this ceiling. Because of Postgres'
architecture, it could probably continue to scale up the more processors and
RAM you give it.
Wins
Well, postgres has some extremely advances features when shown next to MySQL.
While I don't use most of the features myself, they are available for the
truly-hardcore developers out there. Many developers don't even realize what
they're missing by not having some of these features available.
An example of where you should be using a transaction is if you are doing
more than one update/insert/delete in a sequence. For instance, your script
inserts a new user into your user table, then also inserts a row in another
table, and you update a flag somewhere else. In this case, if the first insert
succeeds, but the second fails, what do you do? With Postgres, you could
Rollback the entire operation and show an appropriate error. With MySQL,
you would wind up in an invalid state, unless you program in a bunch of
logic to handle the situation. In real-world use, most queries don't fail
unless you're a lousy programmer, and if the second query did fail, the
results may not be dire (unless we're talking about an accounting/banking/critical
application where there can be no risk of incorrect data).
Anyway, foreign-key support is now in Postgres 7.0+, which means that when you insert
a row, the database can do some fairly impressive validation checks. Same if
you delete a row - it just plain won't let you delete a row if another table is
depending on it. I love this idea and can envision rewriting entire websites
just to take advantage of this feature.
Triggers and views are interesting and powerful tools that can be used in
Postgres, but not MySQL. I haven't used either one, but I can think of a
hundred uses for Views if I were to redesign SourceForge from the ground up on
Postgres.
Limitations
The primary limitation with Postgres is not its performance (as most web sites
will never run into that barrier), but hard-coded limits like the
8k row size limit (which probably dates back to its earliest days). When I
designed Geocrawler.com on Postgres, I had to segment large emails into 8k
chunks to work around this lame limitation. Also, by default, Postgres is
compiled to only support 32 connections, which is not enough for a high-traffic
web site, especially when you consider that postgres delivers each page much
more slowly than MySQL.
One other limitation may bug a lot of PHP users - Postgres has no equivalent to
MySQL's mysql_insertid() function call. That is, if you insert a row into a
MySQL database, MySQL will hand you back the primary key ID for that row. There
is an extremely round-about way of doing this in Postgres, but it's a headache
and is probably slow if used a lot.
Stability
Postgres will run smoothly for extended periods of time without trouble. My
Postgres 6.5.3 install has run for 90 days without blinking on my tired old
PowerMac 8500, while getting about 50-100,000 pages per day. And when postgres
gets loaded, it just bogs down, it doesn't quit and give up the ghost under
stress.
The problem with Postgres is that when you do have a problem with it,
it's usually really bad. Like a fubar database file or, more commonly, a
corrupted index (which can frequently be dropped/rebuilt). I have encountered
other serious problems with older versions of postgres (6.4.x) where multiple
same numbers were inserted into a primary key (something that should be
impossible under any circumstance).
There have also been problems with Postgres where you can wind up with
"half-baked" indexes, tables, etc that you cannot drop or get rid of. I have
not seen these yet on Postgres 7, but I haven't used it enough to know.
Conclusion
These tests pretty much confirmed what I already knew - both databases serve
quite well for the vast majority of web sites out there. Both are actually
extremely fast when compared to desktop databases like FileMaker and MS Access.
Both are now free and supported by an active developer community.
To choose between the two databases, you first need to understand your
scalability limits and whether you need the transaction support of Postgres or
the large-text-area support in MySQL. You may need both, in which case you have
to wait for future stable releases of both databases.
It's interesting to note that the two databases appear to be converging to meet
in the middle somewhere. While MySQL is working on adding transaction support
and slowly adding features like subselects, Postgres is making headway in the
performance and stability departments.
Finally, for the hardest-core developers, Postgres could be pretty slick.
Foreign keys, views, subselects, and transactions can all be pretty cool -- if
you need them and you will make any use of them. If you don't need them or
won't use them, then you're probably better off with MySQL and its superior
performance.
--Tim