Ericson Smith
Following
Tim
Perdue's excellent article on the comparison between MySQL
and Postgresql, I decided to take a shot at installing and using this
database. For most of our work I use MySQL and will continue to do so,
because of its ease of use and unrivaled select query speed, and also
because there is no point in trying to mess around with production
systems that already work fine.
But some new projects suffered greatly from MySQL's table-level locking
when I needed to update data (which I do a lot). Here are my
adventures in setting up a Postgresql database server.
Our configuration for a dedicated Postgresql server was:
- Redhat 7.1
- Dual PIII 650MHz system
- 512MB RAM
- 18Gig SCSI drive for the postgresql data partition
Downloading and Installing
I downloaded and installed the 7.1.2 RPMs from http://postgres.org
without any trouble. For a server installation, I only installed:
postgresql-server and postgresql-7.1.2 (base).
I then started the server by executing:
/etc/init.d/postgresql start
A small database was ported from MySQL (three tables totaling about
5000 records). I created enough indexes for postgresql's optimizer
to use, and modified our C application to use the postgresql C client
interface in a small CGI program that brutally queries these tables.
This small CGI program receives thousands of queries per minute.
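For reference, here is a minimal sketch of what that query path looks
like through the C client interface (libpq). The connection string,
table and column names are made up for illustration; the real CGI
program is of course more involved.

#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn;
    PGresult *res;
    int i;

    /* Connect to the server; the connection string is illustrative. */
    conn = PQconnectdb("dbname=mydb user=postgres");
    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "Connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    /* Query a hypothetical table and print the first column of each row. */
    res = PQexec(conn, "SELECT name FROM customers WHERE id = 42");
    if (PQresultStatus(res) != PGRES_TUPLES_OK) {
        fprintf(stderr, "Query failed: %s", PQerrorMessage(conn));
    } else {
        for (i = 0; i < PQntuples(res); i++)
            printf("%s\n", PQgetvalue(res, i, 0));
    }

    PQclear(res);
    PQfinish(conn);
    return 0;
}

Compile it against libpq with something like gcc test.c -lpq (header and
library paths vary by distribution).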
Optimizing
One of the first things I noticed after turning on the CGI program was
that although queries were returned almost as fast as from the previous
MySQL-based system, the load on the server was much higher -- in fact
almost 90-percent! Then I started to go down into the
nitty-gritty of things. I had optimized MySQL before by greatly
increasing cache and buffer sizes and by throwing more RAM at the
problem.
The single biggest thing that you have to do before running Postgresql
is to provide enough shared buffer space. Let me repeat: provide
enough buffer space! Let's say you have about 512MB of RAM on a
dedicated database server; then you need to turn over about 75-percent
of it to this shared buffer. Postgresql does best when it can load most
or -- even better -- all of a table into its shared memory space. In
our case, since our database was fairly small, I decided to allocate
128MB of RAM to the shared buffer space.
The file /var/lib/pgsql/data/postgresql.conf contains settings for the
database server. Postgresql uses System V shared memory for its buffer
cache. On a Linux system, you can see the maximum shared memory segment
size your kernel allows by running the command:
cat /proc/sys/kernel/shmmax
And to view shared memory use on the system:
ipcs
The shmmax value is in bytes. By default RedHat 7.1 sets this limit to
32MB, hardly enough for postgresql. I raised it to 128MB with the
command:
echo 128000000 > /proc/sys/kernel/shmmax
Be aware that this setting will disappear once you reboot the server.
To make it permanent, place this line in your postgresql startup file
or set it in /etc/sysctl.conf.
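For a permanent setting, the /etc/sysctl.conf entry is a single line
(using the same value as above):

kernel.shmmax = 128000000

and running sysctl -p applies it right away, without waiting for a
reboot.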
Then in our postgresql.conf I set shared_buffers to 15200. Postgresql
allocates its buffers in 8K pages, so I worked from 128,000,000 / 8192
(roughly 15,600 buffers) and settled on 15200 to leave headroom for
Postgresql's other shared-memory overhead. I also set sort_mem to 32168
(sort_mem is in kilobytes, so about 32MB per sort). Since connection
pooling was in effect, I set max_connections to 64. And fsync was set
to false, trading crash safety for speed.
shared_buffers = 15200
sort_mem = 32168
max_connections = 64
fsync = false
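After restarting postgresql you can check that the larger segment was
really allocated: ipcs -m lists the shared memory segments on the box,
and a segment roughly the size of the shared_buffers allocation, owned
by the postgres user, should show up.

/etc/init.d/postgresql restart
ipcs -m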
You can read the manual to tweak other settings, but I never had the
need to do so. Note that if you set shared_buffers higher than your
shared memory limit allows, postgresql will refuse to start. This
confused us for a while, since nothing was being logged. You can tweak
the startup file in /etc/init.d so that the postmaster writes its
output to a log file.
Change the fragment from:
/postmaster start > /dev/null 2>&1
to
/postmaster start > /var/lib/pgsql.log 2>&1
(or wherever you want to store the log).
Tailing the log file clearly explained what the problem was.
All sorts of sexy debugging info will show up in this file, including
SQL syntax errors, the output of EXPLAIN statements, connection
problems, authentication attempts, and so forth.
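To watch it live while debugging, something like this does the job
(using whatever log path you chose in the init script above):

tail -f /var/lib/pgsql.log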
I restarted postgresql and brought our CGI online. Our jaws collectively
dropped to the floor as postgresql literally flew as soon as it started
to use the buffer. The load postgresql put on the server dropped to
just under 10-percent.
One hitch I found with an early version of the system was that it had to
build up and tear down a postgresql connection with each request. This
was intolerable, so I started to use the connection pooling features of
the C library. Server load dropped another few notches with this option.
With PHP you will want to use persistent connections (pg_pconnect
instead of pg_connect) to fully take advantage of this effect.
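One simple way to get this effect in a C program is to keep the
connection open across requests and only reconnect when it drops. A
rough sketch of that pattern (get_connection is my own helper name,
not part of libpq):

#include <libpq-fe.h>

/* Hold one connection for the life of the process and hand it out
 * on each request, re-establishing it only when it has gone bad. */
static PGconn *conn = NULL;

PGconn *get_connection(void)
{
    if (conn == NULL)
        conn = PQconnectdb("dbname=mydb user=postgres");
    else if (PQstatus(conn) != CONNECTION_OK)
        PQreset(conn);  /* try to re-establish a dropped connection */
    return conn;
}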
Indexes
I cannot emphasize enough the need for proper indexing in
postgresql. One early mistake I made was to index BIGINT columns.
The columns were indexed fine, but postgresql refused to make use of them.
After two days of tearing out my hair, it came to me that the
architecture of the system was 32 bits. Could it be that postgresql
refuses to make use of a 64 bit (BIGINT) index? Changing the type to
INTEGER quickly solved that problem. Maybe if I had one of those
new-fangled 64 bit Itanium processors.
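Whatever the column type, it pays to confirm that the planner really
uses an index. A quick check with a hypothetical table (the names are
made up) is to create the index and then look for an Index Scan rather
than a Seq Scan in the EXPLAIN output:

CREATE INDEX hits_userid_idx ON hits (userid);
EXPLAIN SELECT * FROM hits WHERE userid = 42;

For what it's worth, with a BIGINT column on 7.x the planner would
typically only consider the index when the literal matched the column
type, so quoting or casting the value (WHERE userid = '42', or
WHERE userid = 42::int8) is another way around the problem.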
Conclusion
There are also many things you can do in your SQL statements themselves
to improve query response, but these are adequately covered in the
interactive postgresql documentation.
Ericson Smith is a web developer at http://did-it.com.