Last week I talked about how to best use indices in your database queries. Now let's talk about transactions. Like indexing, transactions may sound reasonably straightforward, but there are a lot of things you can do wrong that will come back to haunt you in certain circumstances.
The general idea behind a transaction is that several steps can be performed in series, with the capability to undo all of the steps at once if needed. In addition, the transaction should happen inside an isolated world where other transactions cannot change data while your transaction is running.

Transactions

If you're one of the 6 or 7 people who didn't buy Linux Journal just to read my article ;-), then you may not know how to properly use transactions or why you should use them.
When you are used to building sites without transactions, you work around their absence in creative ways. For instance, I have a lot of random cron jobs that run on SourceForge to munge data in interesting ways. Some of that munging can take quite a while, and you don't want your site to be out of business while it happens. So with MySQL, I would create temp tables to munge data in, then swap the temp table for the permanent one when I'm done. This did the trick - mostly. But for the brief instant while I was swapping tables, the site could be hosed.
If you live in a world where you have transactions, you can wrap the entire process in a begin...commit block and your site won't be out of business for even one microsecond.
So rather than munging data in a temp table, I start a transaction, delete all the data from the live table, recreate it, then commit my changes. The entire process happens in its own little universe right up until I commit them, and then the rest of the world can see the changes.
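This delete-and-rebuild pattern works with any transactional database. Here's a minimal sketch using Python's built-in sqlite3 module standing in for Postgres (the table and data are made up for illustration); it shows that if the rebuild dies partway through, a rollback leaves the live data untouched:

```python
import sqlite3

# autocommit mode: we issue BEGIN/COMMIT/ROLLBACK ourselves
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE stats (project TEXT, rank INTEGER)")
conn.execute("INSERT INTO stats VALUES ('sourceforge', 1)")

try:
    conn.execute("BEGIN")
    conn.execute("DELETE FROM stats")            # wipe the live table...
    raise RuntimeError("munging died halfway")   # ...simulate a mid-rebuild failure
    conn.execute("INSERT INTO stats VALUES ('sourceforge', 2)")
    conn.execute("COMMIT")
except RuntimeError:
    conn.execute("ROLLBACK")                     # undo the DELETE

# the original row survived the failed rebuild
print(conn.execute("SELECT rank FROM stats").fetchone()[0])  # -> 1
```

Until the COMMIT, other connections see only the old data; on failure, the ROLLBACK means they never see the half-munged state at all.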
Like proper indexing, proper transaction use is one or two degrees more complex than you may have bargained for. To do things 100% correctly, you may have to wrap entire blocks of selects and updates inside the same transaction.
And if you're updating a row of data, you should first get a lock on that row so it doesn't change while you're in the middle of a transaction.
A very simple example follows:
BEGIN;

SELECT * FROM users WHERE user_id=1 FOR UPDATE;

--
--do some time-consuming work in here
--
--set up $calculation variable based on work
--

UPDATE users SET field1='$calculation' WHERE user_id=1;

--send email update to user

COMMIT;
At first glance, you might be tempted to simply wrap the calculations and update inside a transaction, but to be 100% kosher, you should also include the email sending and any other page display code within your transaction.
Why? Well, what if another transaction ran after your update, but before you sent out the email? The email you send could be tainted by the other transaction's changes, and the recipient could wind up with two emails that both reflect the second transaction.
Sound petty? Well, it might very well be, but we're talking about best practices here and you might as well know the best possible way to do it. Is the world going to end in the extremely rare occasion where the inaccurate email was sent? Probably not, unless you're talking about a financial application and/or a very high-traffic web site.
Here's a piece of code from the Linux Journal article that shows how to handle a transaction in the event it fails:

<?php

function cart_new() {
	//make the database connection handle available
	global $conn,$customer_id,$feedback;

	//start a transaction
	query("BEGIN WORK");

	//query postgres for the next value in our sequence
	$res=query("SELECT nextval('seq_customer_id')");

	//check for errors
	if (!$res || pg_numrows($res)<1) {
		$feedback .= pg_errormessage($conn);
		$feedback .= ' Error - Database didn\'t return next value ';
		query("ROLLBACK");
		return false;
	} else {
		//set that value in a local var
		$customer_id=pg_result($res,0,0);

		//register the id with PHP4
		session_register('customer_id');

		//insert the new customer row
		$res=query("INSERT INTO customers (customer_id) VALUES ('$customer_id')");

		//check for errors
		if (!$res || pg_cmdtuples($res)<1) {
			$feedback .= pg_errormessage($conn);
			$feedback .= ' Error - couldn\'t insert new customer row ';
			query("ROLLBACK");
			return false;
		} else {
			//commit this transaction
			query("COMMIT");
			return true;
		}
	}
}

?>
Here's some basic pseudocode for what's happening there:
1. BEGIN transaction;

2. Do first insert

	--check for errors and rollback if necessary

3. Do second insert

	--check for errors and rollback if necessary

4. Display results after update

5. COMMIT;
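That pattern - check each statement, roll back on any failure, commit only at the very end - isn't PHP-specific. Here's a rough Python/SQLite equivalent of cart_new(), sketched with a made-up schema (SQLite has no sequences, so it fakes nextval() with MAX + 1):

```python
import sqlite3

def cart_new(conn):
    """Insert a new customer inside a transaction; return the new
    id on success, or None after rolling everything back."""
    try:
        conn.execute("BEGIN")
        # grab the next id (Postgres would use
        # SELECT nextval('seq_customer_id') here)
        row = conn.execute(
            "SELECT COALESCE(MAX(customer_id), 0) + 1 FROM customers"
        ).fetchone()
        customer_id = row[0]
        conn.execute(
            "INSERT INTO customers (customer_id) VALUES (?)", (customer_id,)
        )
        conn.execute("COMMIT")
        return customer_id
    except sqlite3.Error:
        conn.execute("ROLLBACK")   # undo every step at once
        return None

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY)")
print(cart_new(conn))  # -> 1
print(cart_new(conn))  # -> 2
```

Note that the rollback path undoes everything back to the BEGIN, so a failure on the second statement also discards the first.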
I haven't figured out the perfect solution yet to a problem that I've been running into lately. How do you handle the situation where you are inside of a transaction and you want to call a function that has a transaction of its own coded in? Postgres chokes when trying to start a transaction within a transaction, and it's a vexing problem really. Do you just avoid coding begin/commit blocks inside of function calls, or do you pass extra params to the function to skip the transaction code? Or am I missing something obvious?
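One mechanism worth knowing about here is SAVEPOINT, which some databases support (SQLite does; PostgreSQL did not at the time of this writing but added it in version 8.0). A function can establish a savepoint rather than a second BEGIN, and on error roll back only its own work without killing the caller's transaction. A sketch in Python/SQLite, with made-up table and function names:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE log (msg TEXT)")

def add_log(conn, msg, fail=False):
    # a savepoint nests safely inside the caller's open transaction
    conn.execute("SAVEPOINT add_log")
    try:
        conn.execute("INSERT INTO log (msg) VALUES (?)", (msg,))
        if fail:
            raise RuntimeError("simulated failure")
        conn.execute("RELEASE SAVEPOINT add_log")
    except RuntimeError:
        # undo only this function's work; the outer transaction lives on
        conn.execute("ROLLBACK TO SAVEPOINT add_log")
        conn.execute("RELEASE SAVEPOINT add_log")

conn.execute("BEGIN")                # outer transaction
add_log(conn, "first")               # succeeds inside the outer tx
add_log(conn, "second", fail=True)   # rolls back only itself
conn.execute("COMMIT")

print([r[0] for r in conn.execute("SELECT msg FROM log")])  # -> ['first']
```

Without savepoints, the extra-parameter approach (telling the function whether to open its own transaction) is about the best you can do.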
Also, what do you do when you are mixing database code with file I/O or other code that cannot be rolled back? On SourceForge, we are using LDAP modules, and LDAP does not have a rollback feature. So if something screws up, you may be able to roll back your database changes, but not your LDAP directory, causing inconsistencies. Thoughts and opinions are welcome.
Now you should have the fundamental building blocks for working with transactions. Let me know if you have questions or comments - I'd be interested in knowing how you make use of transactional databases in your PHP applications.
--Tim