mod_gzip is an Apache module that compresses static HTML pages
using gzip, following the IETF standards, for browsers that accept
gzip encoding (IE, Netscape, etc.). mod_gzip can make pages
download up to four or five times faster,
and I strongly suggest you use it on your webserver.
However, because Apache 1.x.x has no filtering mechanism between
modules, mod_gzip cannot compress PHP-generated output.
Therefore, we have to build our own compression
engine in PHP. In this article, I will explain how to use the PHP output
control functions to make your pages load FAST!
Introducing PHP output control functions
One of the best things in PHP4 is that you can tell PHP to buffer
all of the output generated by the script, so that no content is sent
to the browser until you decide to send it. Because nothing has been sent yet,
you can call the header and setcookie functions wherever you want in
your script. However, this is only a small advantage of the powerful
output functions.
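As a minimal sketch of that point, the snippet below produces output first and sets a header afterwards. The function name render_greeting and the X-Example header are made up for illustration, and the headers_sent() guard is there only so that the sketch also runs cleanly from the command line.

```php
<?php
// Hypothetical helper: writes page content, then sets a header afterwards.
function render_greeting($name)
{
    ob_start();                  // output now goes to the buffer
    echo "Hello, $name!\n";      // nothing reaches the browser yet

    // Without buffering, the echo above would already have sent the
    // headers and this call would fail with "headers already sent".
    if (!headers_sent()) {
        header('X-Example: set-after-output');   // illustrative header
    }

    $body = ob_get_contents();   // copy the buffered output
    ob_end_clean();              // discard the buffer and stop buffering
    return $body;
}

echo render_greeting('world');
```

The same pattern applies to setcookie, since cookies are also sent as HTTP headers.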
<?php
void ob_start(void);
?>
This tells the PHP processor to redirect all output to
an internal buffer. After a call to ob_start, no output is sent
to the browser until the buffer is flushed or the script ends.
<?php
string ob_get_contents(void);
?>
This returns the contents of the output buffer as a string, which you can echo
to send the accumulated output to the browser (after turning buffering off!).
<?php
int ob_get_length(void);
?>
This returns the length of the output buffer.
<?php
void ob_end_clean(void);
?>
Discards the contents of the output buffer and turns output buffering off.
Save the buffer with ob_get_contents first, then call this function before
sending your own content to the browser.
<?php
void ob_implicit_flush([int flag]);
?>
Turns implicit flushing on or off (default: off). When it is on,
a "flush" is executed after every print/echo or other output command, and
the output is sent to the browser immediately.
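To see how these calls fit together, here is a small sketch that buffers some output, inspects it, and only then decides what to send. The helper name capture_report is hypothetical.

```php
<?php
// Hypothetical helper showing the buffer functions side by side.
function capture_report()
{
    ob_start();                      // begin buffering
    echo "line one\n";
    print("line two\n");

    $length   = ob_get_length();     // bytes currently in the buffer
    $contents = ob_get_contents();   // the buffered text itself
    ob_end_clean();                  // discard the buffer, stop buffering

    return array('length' => $length, 'contents' => $contents);
}

$report = capture_report();
// The buffered text was never sent, so we are free to send it (or not) now.
echo $report['contents'];
```

Note the order: the contents are copied out before ob_end_clean is called, because that call throws the buffer away.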
Using Output Control to compress PHP output
You need the Zlib extension compiled into PHP4 to compress output. If needed,
see the Zlib section of the PHP documentation for installation instructions.
First of all, initialize output buffering:
<?php
ob_start();
ob_implicit_flush(0);
?>
Then, generate all the content using print, echo, or whatever
you want. For example:
<?php
print("Hey this is a compressed output!");
?>
After the page is generated, we retrieve the buffered output using:
<?php
$contents = ob_get_contents();
ob_end_clean();
?>
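Then, we have to check whether the browser supports compressed
data. If it does, it sends an Accept-Encoding HTTP header to the
webserver with the request. PHP exposes this header as $_SERVER['HTTP_ACCEPT_ENCODING']
(or as the global $HTTP_ACCEPT_ENCODING when register_globals is on),
and we check whether it mentions "gzip":
<?php
$accept_encoding = isset($_SERVER['HTTP_ACCEPT_ENCODING']) ? $_SERVER['HTTP_ACCEPT_ENCODING'] : '';
// Look for "gzip" anywhere in the header; an exact match on
// "gzip, deflate" would miss browsers that list encodings differently.
if (strpos($accept_encoding, 'gzip') !== false) {
    // Generation of Gzipped content
} else {
    echo $contents;
}
?>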
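A plain string test is simple but coarse. A slightly more careful sketch splits the header value into its comma-separated tokens and compares each one. The function name client_accepts_gzip is invented for this example, and as a stated simplification it ignores q-values (so "gzip;q=0" still counts as accepted).

```php
<?php
// Hypothetical helper: true if the Accept-Encoding value lists gzip.
// Simplification: q-values such as "gzip;q=0" are not interpreted.
function client_accepts_gzip($accept_encoding)
{
    // Split "gzip, deflate" into individual tokens.
    foreach (explode(',', $accept_encoding) as $token) {
        // Drop any ";q=..." parameter and the surrounding whitespace.
        $parts = explode(';', $token);
        $name  = strtolower(trim($parts[0]));
        if ($name === 'gzip' || $name === 'x-gzip') {
            return true;
        }
    }
    return false;
}

$accept = isset($_SERVER['HTTP_ACCEPT_ENCODING']) ? $_SERVER['HTTP_ACCEPT_ENCODING'] : '';
echo client_accepts_gzip($accept) ? "gzip is accepted\n" : "sending plain output\n";
```

Comparing whole tokens avoids depending on the order in which a browser lists its encodings.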
That's simple, structured and clean enough to use. Let's see
how we generate gzipped output:
(Taken from php.net)
<?php
// Tell the browser that they are going to get gzip data
// Of course, you already checked if they support gzip or x-gzip
// and if they support x-gzip, you'd change the header to say
// x-gzip instead, right?
header("Content-Encoding: gzip");
// Display the header of the gzip file
// Thanks ck@medienkombinat.de!
// Only display this once
echo "\x1f\x8b\x08\x00\x00\x00\x00\x00";
// Figure out the size and CRC of the original for later
$Size = strlen($contents);
$Crc = crc32($contents);
// Compress the data
$contents = gzcompress($contents, 9);
// We can't just output $contents here: gzcompress() produces a zlib
// stream, whose last four bytes are an Adler-32 checksum, while a gzip
// file must end with a CRC-32 of the original data followed by its
// size. So we strip the zlib checksum and append the gzip trailer
// ourselves. With the correct trailer, Opera stops crashing, gunzip
// works, and other browsers won't keep loading indefinitely.
//
// Strip off the zlib checksum (the last four bytes)
$contents = substr($contents, 0, strlen($contents) - 4);
// Show only the compressed data
echo $contents;
// Output the CRC, then the size of the original
gzip_PrintFourChars($Crc);
gzip_PrintFourChars($Size);
// Done. You can append further data by gzcompressing
// another string and reworking the CRC and Size stuff for
// it too. Repeat until done.
function gzip_PrintFourChars($Val) {
    for ($i = 0; $i < 4; $i ++) {
        echo chr($Val % 256);
        $Val = floor($Val / 256);
    }
}
?>
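For reference, a gzip stream (RFC 1952) consists of a 10-byte header, the deflate data, a CRC-32 of the original data, and the original size, the last two as 4-byte little-endian values. The sketch below builds the same stream as in the listing above, but as a string, using pack('V', ...) in place of the byte-by-byte loop. It then compares the result with gzencode(), which later PHP versions provide for producing a complete gzip stream in one call. The function name build_gzip_stream is hypothetical, and the sample HTML is made up.

```php
<?php
// Hypothetical helper: assemble a gzip stream by hand, as in the listing above.
function build_gzip_stream($data)
{
    // gzcompress() returns a zlib stream: 2-byte header, deflate data,
    // 4-byte Adler-32 checksum. Drop the checksum; the two zlib header
    // bytes fill the last two slots (XFL and OS) of the 10-byte gzip header.
    $zlib = gzcompress($data, 9);
    $body = substr($zlib, 0, strlen($zlib) - 4);

    return "\x1f\x8b\x08\x00\x00\x00\x00\x00"   // magic, method, flags, mtime
         . $body
         . pack('V', crc32($data))              // CRC-32, little-endian
         . pack('V', strlen($data));            // original size, little-endian
}

$html = str_repeat("<p>Hello, compressed world!</p>\n", 50);

$by_hand  = build_gzip_stream($html);
$built_in = gzencode($html, 9);   // complete gzip stream in one call

printf("original: %d bytes, gzip: %d bytes\n", strlen($html), strlen($built_in));
```

If gzencode() is available in your PHP build, it is the less error-prone choice, since the header and trailer are generated for you.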
If you want to test it as a working example, the whole script is:
<?php
// Start the output buffer
ob_start();
ob_implicit_flush(0);
// Output stuff here...
print("I'm compressed!\n");
$contents = ob_get_contents();
ob_end_clean();
// Tell the browser that they are going to get gzip data
// Of course, you already checked if they support gzip or x-gzip
// and if they support x-gzip, you'd change the header to say
// x-gzip instead, right?
header("Content-Encoding: gzip");
// Display the header of the gzip file
// Thanks ck@medienkombinat.de!
// Only display this once
echo "\x1f\x8b\x08\x00\x00\x00\x00\x00";
// Figure out the size and CRC of the original for later
$Size = strlen($contents);
$Crc = crc32($contents);
// Compress the data
$contents = gzcompress($contents, 9);
// We can't just output $contents here: gzcompress() produces a zlib
// stream, whose last four bytes are an Adler-32 checksum, while a gzip
// file must end with a CRC-32 of the original data followed by its
// size. So we strip the zlib checksum and append the gzip trailer
// ourselves. With the correct trailer, Opera stops crashing, gunzip
// works, and other browsers won't keep loading indefinitely.
//
// Strip off the zlib checksum (the last four bytes)
$contents = substr($contents, 0, strlen($contents) - 4);
// Show only the compressed data
echo $contents;
// Output the CRC, then the size of the original
gzip_PrintFourChars($Crc);
gzip_PrintFourChars($Size);
// Done. You can append further data by gzcompressing
// another string and reworking the CRC and Size stuff for
// it too. Repeat until done.
function gzip_PrintFourChars($Val) {
    for ($i = 0; $i < 4; $i ++) {
        echo chr($Val % 256);
        $Val = floor($Val / 256);
    }
}
?>
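The complete script above always sends gzip, so it assumes the Accept-Encoding check has already passed. One possible way to fold the check and the plain-text fallback into a single step is sketched below. The function name prepare_response and its return shape are invented for this example, and it uses gzencode() from the Zlib extension instead of assembling the stream by hand.

```php
<?php
// Hypothetical helper: decide how to deliver already-buffered page content.
// Returns array('headers' => list of header lines, 'body' => bytes to echo).
function prepare_response($contents, $accept_encoding)
{
    // Fall back to plain output when gzip is not offered or Zlib is missing.
    $wants_gzip = stripos($accept_encoding, 'gzip') !== false;
    if (!$wants_gzip || !function_exists('gzencode')) {
        return array('headers' => array(), 'body' => $contents);
    }

    $body = gzencode($contents, 9);
    return array(
        'headers' => array(
            'Content-Encoding: gzip',
            'Vary: Accept-Encoding',   // caches must keep the two variants apart
            'Content-Length: ' . strlen($body),
        ),
        'body' => $body,
    );
}

// Usage: buffer the page, then hand it to the helper.
ob_start();
print("I'm compressed!\n");
$contents = ob_get_contents();
ob_end_clean();

$accept   = isset($_SERVER['HTTP_ACCEPT_ENCODING']) ? $_SERVER['HTTP_ACCEPT_ENCODING'] : '';
$response = prepare_response($contents, $accept);
foreach ($response['headers'] as $line) {
    if (!headers_sent()) {
        header($line);
    }
}
echo $response['body'];
```

Keeping the decision in a function that returns data, rather than echoing directly, makes it easy to try both branches without a browser.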
Caching PHP output
Back when PHP4 didn't exist and I had to use PHP3, I was
very interested in developing some sort of caching mechanism
for the output of PHP scripts, to reduce the load on the database,
access to the filesystem, etc. There was no good way to do that in PHP3,
but with output buffering it is easy in PHP4.
This is a simple example:
<?php
// Construct a filename for the requested URI
$cached_file = "/cache/" . md5($_SERVER['REQUEST_URI']);
if ((!file_exists($cached_file)) || (!is_valid($cached_file))) {
    // is_valid validates the cache; you write it yourself, and can check
    // for expiration or particular conditions in that function.
    // If there's no file or it's invalid, we generate the output
    ob_start();
    ob_implicit_flush(0);
    // Output stuff here...
    $contents = ob_get_contents();
    ob_end_clean();
    // Write to the same path that we checked above
    $fil = fopen($cached_file, "w");
    fwrite($fil, $contents, strlen($contents));
    fclose($fil);
}
// Output the file; here we are sure the file exists.
readfile($cached_file);
?>
This is deliberately simple. With output buffering, you can build
a much more advanced content-generation system, using different caching
mechanisms for different blocks or applications, etc. It is up to you.
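The example leaves is_valid for you to write. One simple possibility, sketched here as an assumption rather than a recommendation, is to treat a cache file as valid while it is younger than a fixed age. The $max_age parameter and its 300-second default are arbitrary choices made for this sketch.

```php
<?php
// Hypothetical is_valid(): a cache file is valid while it is younger
// than $max_age seconds. The 300-second default is an arbitrary choice.
function is_valid($path, $max_age = 300)
{
    clearstatcache();                 // avoid a stale cached modification time
    $modified = @filemtime($path);    // false if the file is missing
    if ($modified === false) {
        return false;
    }
    return (time() - $modified) < $max_age;
}

// Small demonstration with a temporary file.
$demo = tempnam(sys_get_temp_dir(), 'cache');
file_put_contents($demo, 'cached page');
var_dump(is_valid($demo));        // freshly written file
touch($demo, time() - 3600);      // pretend it was written an hour ago
var_dump(is_valid($demo));        // now older than the 300-second limit
unlink($demo);
```

Other conditions, such as comparing against the modification time of the underlying data, could be checked in the same function.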
Conclusion
PHP output control functions are very useful for redirecting the script output
to a buffer and then manipulating it. Compressing the buffer for compliant browsers
can cut the download time by a factor of four or five. Buffering can
also be used as a caching mechanism to reduce access to data sources (databases
or files), and it becomes even more significant if we use XML. Think about this:
what if we build an engine in PHP4, with caching, that takes data from data sources
(XML documents and databases) and dynamically builds content in XML (without
presentation)? We can then take the XML output and use XSLT to convert it to any
kind of presentation we want (HTML, WAP, Palm, PDF, etc.). PHP4, with output control
and the Sablotron XSLT extension, is perfect for this architecture. I wrote a follow-up
to "Smart architectures" describing this XML-based architecture, which has a lot
to do with the functions described in this article. Check it out if it is published.
I will also write an article about the Sablotron XSLT extension as soon as it's
documented and fully usable under PHP4. Send me your thoughts and feedback about anything
you want.
We are in control using PHP!
-- Luis.