mod_gzip is an Apache module that compresses static HTML pages with gzip, following IETF standards, for browsers that accept gzip encoding (IE, Netscape, etc.). mod_gzip can cut page download time by a factor of four or five, and I strongly suggest you use it on your web server. However, because Apache 1.x.x lacks a filtering mechanism between modules, mod_gzip cannot compress PHP-generated output. Therefore, we have to build our own compression engine in PHP. In this article, I will explain how to use the PHP output control functions to make your pages load FAST!

Introducing PHP output control functions

One of the best things in PHP4 is that you can tell PHP to buffer all of the output generated by the script, so that no content is sent to the browser until you decide to send it. Because of this buffering, you can call the header and setcookie functions wherever you want in your script. However, this is only a small advantage of the powerful output functions.

<?php

void ob_start(void);

?>
This is used to tell the PHP processor to redirect all the output to an internal buffer. No output will be sent to the browser after a call to ob_start.

<?php

string ob_get_contents(void);

?>
This returns the output buffer in a string that you can echo to send the accumulated output to the browser (after turning buffering off!).

<?php

int ob_get_length(void);

?>
This returns the length of the output buffer.

<?php

void ob_end_clean(void);

?>
Cleans the output buffer and also turns output buffering off. You have to call this function before outputting content to the browser.

void ob_implicit_flush([int flag]);
Turns implicit flushing on or off (default: off). When it is on, a flush is executed after every print, echo, or other output command, and the output is sent to the browser immediately.
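
As a minimal sketch of how these functions fit together, the following captures everything a piece of code prints into a string instead of sending it. The helper name capture_output and the say_hello function are hypothetical, made up for this illustration only:

```php
<?php

// Hypothetical helper: run a callback and return what it printed,
// together with the number of bytes that were buffered.
function capture_output($callback) {
    ob_start();                     // start buffering; nothing reaches the browser
    call_user_func($callback);      // the callback prints as usual
    $length   = ob_get_length();    // bytes currently held in the buffer
    $contents = ob_get_contents();  // copy the buffer into a string
    ob_end_clean();                 // discard the buffer and stop buffering
    return array($contents, $length);
}

function say_hello() {
    echo "Hello, buffered world!";
}

list($text, $length) = capture_output('say_hello');

// Buffering is off again, so this echo goes straight to the browser.
echo $text;
```

Because nothing is sent while the buffer is active, calls to header or setcookie placed after say_hello would still work in this sketch.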

Using Output Control to compress PHP output

You need the Zlib extension compiled into PHP4 to compress output. If needed, see the Zlib extension in the PHP documentation for installation instructions.
First of all, initialize output buffering:

<?php

ob_start();
ob_implicit_flush(0);

?>
Then, generate all the content using print, echo, or whatever you want. For example:

<?php

print("Hey this is a compressed output!");

?>
After the page is generated, we retrieve the buffered output using:

<?php

$contents = ob_get_contents();
ob_end_clean();

?>
Then, we have to check whether the browser supports compressed data. If it does, the browser sends an Accept-Encoding HTTP header to the web server in the request. We can read the variable $HTTP_ACCEPT_ENCODING and check it for "gzip, deflate":

<?php

if (ereg('gzip, deflate', $HTTP_ACCEPT_ENCODING)) {
    // Generation of Gzipped content
} else {
    echo $contents;
}

?>
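
The literal match on "gzip, deflate" misses browsers that list the encodings in another order or send only one of them. As a hedged alternative, here is a sketch that splits the header into tokens. The helper name accepted_gzip_token is hypothetical, the sketch ignores quality values such as "gzip;q=0", and it reads $_SERVER on the assumption of a newer PHP release, where ereg and the global $HTTP_ACCEPT_ENCODING variable are no longer available:

```php
<?php

// Hypothetical helper: return "gzip", "x-gzip", or "" depending on what
// the Accept-Encoding header value lists. Quality values are ignored.
function accepted_gzip_token($accept_encoding) {
    $tokens = array();
    foreach (explode(',', strtolower($accept_encoding)) as $part) {
        // Drop any ";q=..." parameter and the surrounding whitespace.
        $pieces   = explode(';', $part);
        $tokens[] = trim($pieces[0]);
    }
    if (in_array('gzip', $tokens)) {
        return 'gzip';
    }
    if (in_array('x-gzip', $tokens)) {
        return 'x-gzip';
    }
    return '';
}

// Read the request header if the web server provided one.
$header = isset($_SERVER['HTTP_ACCEPT_ENCODING']) ? $_SERVER['HTTP_ACCEPT_ENCODING'] : '';
$token  = accepted_gzip_token($header);
```

An empty $token means the contents should be echoed uncompressed. Otherwise the token is the value to put in the Content-Encoding header.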
That's simple, structured and clean enough to use. Let's see how we generate gzipped output:
(Taken from php.net)

<?php

// Tell the browser that they are going to get gzip data 
// Of course, you already checked if they support gzip or x-gzip 
// and if they support x-gzip, you'd change the header to say 
// x-gzip instead, right? 
header("Content-Encoding: gzip");

// Display the header of the gzip file 
// Thanks ck@medienkombinat.de! 
// Only display this once 
echo "\x1f\x8b\x08\x00\x00\x00\x00\x00";

// Figure out the size and CRC of the original for later 
$Size = strlen($contents);
$Crc = crc32($contents);

// Compress the data 
$contents = gzcompress($contents, 9);

// We can't just output it here, since the CRC is messed up. 
// If I try to "echo $contents" at this point, the compressed 
// data is sent, but not completely. There are four bytes at 
// the end that are a CRC. Three are sent. The last one is 
// left in limbo. Also, if we "echo $contents", then the next 
// byte we echo will not be sent to the client. I am not sure 
// if this is a bug in 4.0.2 or not, but the best way to avoid 
// this is to put the correct CRC at the end of the compressed 
// data. (The one generated by gzcompress looks WAY wrong.) 
// This will stop Opera from crashing, gunzip will work, and 
// other browsers won't keep loading indefinitely. 
// 
// Strip off the old CRC (it's there, but it won't be displayed 
// all the way -- very odd) 
$contents = substr($contents, 0, strlen($contents) - 4);

// Show only the compressed data 
echo $contents;

// Output the CRC, then the size of the original 
gzip_PrintFourChars($Crc);
gzip_PrintFourChars($Size);

// Done. You can append further data by gzcompressing 
// another string and reworking the CRC and Size stuff for 
// it too. Repeat until done. 

function gzip_PrintFourChars($Val) {
    for ($i = 0; $i < 4; $i++) {
        echo chr($Val % 256);
        $Val = floor($Val / 256);
    }
}

?>
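
The two gzip_PrintFourChars calls write the gzip trailer: the CRC-32 of the original data followed by its length, each as four bytes with the least significant byte first. As a sketch of the same idea, pack() with the "V" format produces that byte order directly. The helper name gzip_trailer is hypothetical and returns the bytes instead of echoing them:

```php
<?php

// Hypothetical helper: build the 8-byte gzip trailer for $data.
function gzip_trailer($data) {
    // "V" packs an unsigned 32-bit integer in little-endian byte order.
    return pack('V', crc32($data)) . pack('V', strlen($data));
}

$trailer = gzip_trailer("I'm compressed!\n");
```

Here $trailer is eight bytes long, and its last four bytes encode the length of the original string.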
If you want to test it as a working example, the whole script is:

<?php

// Start the output buffer 
ob_start();
ob_implicit_flush(0);

// Output stuff here... 
print("I'm compressed!\n");

$contents = ob_get_contents();
ob_end_clean();

// Tell the browser that they are going to get gzip data 
// Of course, you already checked if they support gzip or x-gzip 
// and if they support x-gzip, you'd change the header to say 
// x-gzip instead, right? 
header("Content-Encoding: gzip");

// Display the header of the gzip file 
// Thanks ck@medienkombinat.de! 
// Only display this once 
echo "\x1f\x8b\x08\x00\x00\x00\x00\x00";

// Figure out the size and CRC of the original for later 
$Size = strlen($contents);
$Crc = crc32($contents);

// Compress the data 
$contents = gzcompress($contents, 9);

// We can't just output it here, since the CRC is messed up. 
// If I try to "echo $contents" at this point, the compressed 
// data is sent, but not completely. There are four bytes at 
// the end that are a CRC. Three are sent. The last one is 
// left in limbo. Also, if we "echo $contents", then the next 
// byte we echo will not be sent to the client. I am not sure 
// if this is a bug in 4.0.2 or not, but the best way to avoid 
// this is to put the correct CRC at the end of the compressed 
// data. (The one generated by gzcompress looks WAY wrong.) 
// This will stop Opera from crashing, gunzip will work, and 
// other browsers won't keep loading indefinitely. 
// 
// Strip off the old CRC (it's there, but it won't be displayed 
// all the way -- very odd) 
$contents = substr($contents, 0, strlen($contents) - 4);

// Show only the compressed data 
echo $contents;

// Output the CRC, then the size of the original 
gzip_PrintFourChars($Crc);
gzip_PrintFourChars($Size);

// Done. You can append further data by gzcompressing 
// another string and reworking the CRC and Size stuff for 
// it too. Repeat until done. 


function gzip_PrintFourChars($Val) {
    for ($i = 0; $i < 4; $i++) {
        echo chr($Val % 256);
        $Val = floor($Val / 256);
    }
}

?>
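
Later PHP releases with the Zlib extension also provide gzencode(), which returns a complete gzip stream (header, compressed data, CRC and size), so the header and trailer do not have to be assembled by hand. The following is only a sketch under that assumption, not a replacement for the script above. The helper name encode_body and the $client_accepts_gzip flag are hypothetical:

```php
<?php

// Hypothetical helper: return the body to send and the Content-Encoding
// value to announce. An empty encoding means "send uncompressed".
function encode_body($contents, $client_accepts_gzip) {
    if ($client_accepts_gzip && function_exists('gzencode')) {
        return array(gzencode($contents, 9), 'gzip');
    }
    return array($contents, '');
}

// Buffer the page exactly as before.
ob_start();
print("I'm compressed!\n");
$contents = ob_get_contents();
ob_end_clean();

// Assumption: a real page sets this flag from the Accept-Encoding check.
$client_accepts_gzip = true;
list($body, $encoding) = encode_body($contents, $client_accepts_gzip);

// Only a web request has headers and a browser to send the body to.
if (PHP_SAPI !== 'cli') {
    if ($encoding !== '') {
        header("Content-Encoding: $encoding");
    }
    echo $body;
}
```

If gzencode() is missing, the helper falls back to the uncompressed contents, so the page still loads.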

Caching PHP output

When PHP4 didn't exist and I had to use PHP3, I was very interested in developing some sort of caching mechanism for the output of PHP scripts, to reduce the load on the database, accesses to the filesystem, etc. There was no good way to do that in PHP3, but with output buffering it is easy in PHP4.
This is a simple example:

<?php

// Construct a filename for the requested URI
$cached_file = "/cache/" . md5($REQUEST_URI);

if ((!file_exists($cached_file)) || (!is_valid($cached_file))) {
    // is_valid validates the cache, you can check for expiration
    // or particular conditions in that function.
    // If there's no file or it's invalid we generate the output
    ob_start();
    ob_implicit_flush(0);

    // Output stuff here... 

    $contents = ob_get_contents();
    ob_end_clean();

    // Store the generated output in the cache file
    $fil = fopen($cached_file, "w+");
    fwrite($fil, $contents, strlen($contents));
    fclose($fil);
}

// Output the file here; we are sure the file exists.
readfile($cached_file);

?>
By using output buffering, you can build a very advanced content-generation system, with different caching mechanisms for different blocks or applications, etc. It is up to you.
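
The caching example calls is_valid without defining it. One possible implementation, shown here purely as an assumption, treats a cache file as valid while it is younger than a maximum age. The $max_age parameter and its default of 300 seconds are made up for this sketch:

```php
<?php

// Hypothetical is_valid(): a cache file is valid while it is younger
// than $max_age seconds (the 300-second default is an arbitrary choice).
function is_valid($file, $max_age = 300) {
    clearstatcache();               // avoid stale file information
    if (!file_exists($file)) {
        return false;
    }
    $modified = filemtime($file);   // last modification time of the file
    if ($modified === false) {
        return false;
    }
    return (time() - $modified) < $max_age;
}
```

Other conditions, such as a change in the underlying database rows, could be checked in the same function.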

Conclusion

PHP output control functions are very useful for redirecting script output to a buffer and then manipulating it. Compressing the buffer for compliant browsers can reduce the load time of the output by a factor of four or five. Output buffering can also serve as a caching mechanism that reduces access to data sources (databases or files), and it may be significant if we use XML. Think about this:
What if we build an engine in PHP4 that uses caching, takes data from data sources (XML documents and databases), and dynamically builds content in XML (without presentation)? We can then take the XML output and use XSLT to convert it to any kind of presentation we want (HTML, WAP, Palm, PDF, etc.). PHP4, with output control and the Sablotron XSLT extension, is perfect for this architecture. I wrote a follow-up to "Smart architectures" describing this XML-based architecture, which has a lot to do with the functions described in this article. Check it out if it is published. I will also write an article about the Sablotron XSLT extension as soon as it's documented and fully usable under PHP4. Send me your thoughts and feedback about anything you want.
We are in control using PHP!
-- Luis.