This article describes an alternative way of converting XML to HTML using the SAX parser.
For each tag you want to convert, you write a conversion function. This function is called
with two arguments: contents and attributes. The return value of the function will replace
the tag and its contents in the finished document.
Introduction
We all know it: XML is great. (If you don't, look at some of the great articles at this site.)
But why is it so complicated to use? You have to learn about DTDs, XSLT, DOM, XPath, XPointer...
This is a lot of work, and most of these techniques are not really necessary to build a website.
In this article you will learn how to build a simple converter for your XML.
The idea is this: You have a web page in clean XHTML. The biggest part of the HTML will be
tables, menu images and other design stuff. So why not replace the 30 kilobytes of code for the
menu with a nice little "<menu />"?
When the document is requested from a web server, a small PHP script replaces the <menu /> tag with
the correct table HTML. So you have a much cleaner document, and the end result is the same. You
can compose your whole page using these meta-tags, and even dynamic tags like <uppercase> or
<showdate /> are easy to program.
These are the steps we will take:
- Think up a set of tags that can be used to build an example website.
- Write functions that convert these tags into HTML.
- Write a script that takes input from the SAX XML parser and calls the conversion functions with the required arguments.
- Send the output of these functions to the browser.
We will use the SAX parser to build the converter. To learn more about SAX and EXPAT
(which is the name of the SAX implementation that PHP uses), please read Justin Grant's article
"PHP and XML: using expat functions".
Thinking Up Some XML Tags
First you have to identify repeating elements of your web page. These can be menus, headlines,
links, shopping cart products and so on. Then look at the parameters you want to assign to your
elements. Look at this example XML, and you will get the idea:
file test.pxml:
<doc title="Pizza menu" bgcolor="lightblue">
<bigheadline>
Pizza Palace - Our Menu for <dayofweek />
</bigheadline>
<br /><br />
<b>Buon appetito!!!</b>
<br /><br />
<nicebox bordercolor="green">
<product id="0" /><br />
<product id="1" /><br />
<product id="2" /><br />
<product id="3" /><br />
<product id="4" /><br />
</nicebox>
</doc>
Dynamically Constructing XML
PHP doesn't care if it is embedded in HTML or XML.
So if we use a little trick, we are able to use PHP to construct our XML.
This function creates an output buffer, opens and executes a file using the include-function,
and returns the contents of the output buffer.
<?php
function LoadAndExec($filename) {
ob_start();
include($filename);
$content=ob_get_contents();
ob_end_clean();
return $content;
}
?>
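A quick way to see the effect is the minimal sketch below. The temporary file and its contents are made up for illustration, and the sketch uses ob_get_clean() as a shorthand for the ob_get_contents() / ob_end_clean() pair shown above.

```php
<?php
// Same idea as LoadAndExec() above; ob_get_clean() returns the buffer
// contents and discards the buffer in one call.
function LoadAndExecSketch($filename) {
    ob_start();
    include($filename);
    return ob_get_clean();
}

// Hypothetical input file, written to a temporary location for this demo.
$tmpfile = tempnam(sys_get_temp_dir(), 'pxml');
file_put_contents(
    $tmpfile,
    '<list><?php for ($x = 0; $x < 3; $x++) { ?><item id="<?php echo $x; ?>" /><?php } ?></list>'
);

// The embedded PHP runs during the include; only its output ends up in $xml.
$xml = LoadAndExecSketch($tmpfile);
unlink($tmpfile);

echo $xml, "\n";
```

The returned string contains plain XML only; the PHP loop has already been executed and replaced by its output.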
So now we can create the XML on the fly:
file test.pxml:
<doc title="Pizza menu" bgcolor="lightblue">
<bigheadline>
Pizza Palace - Our Menu for <dayofweek />
</bigheadline>
<br /><br />
<b>Buon appetito!!!</b>
<br /><br />
<nicebox bordercolor="green">
<?php for($x=0;$x<5;$x++){ ?>
<product id="<?php echo $x;?>" /><br />
<?php } ?>
</nicebox>
</doc>
Creating the Conversion Functions
Each conversion function has two arguments:
$contents: The character data between the opening and closing tag.
This can include text, HTML tags and the output of conversion functions for tags nested inside the
tag currently being processed. The function should include this in its return value.
$attrs: An array containing the attributes of the tag. Note that the EXPAT parser
automatically changes the attribute names to uppercase.
So the function to handle this tag: <font face="bold">blah</font>
receives these attributes:
<?php
$contents='blah';
$attrs=array('FACE'=>'bold');
?>
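The uppercase folding can be checked with a few lines of code. This is only a sketch: the $seen array and the anonymous handler functions are names introduced here for the demonstration, not part of the converter.

```php
<?php
// Collect what the parser hands to the start-tag handler.
$seen = array();

$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$seen) {
        // Record the tag name and attribute array exactly as received.
        $seen[] = array('name' => $name, 'attrs' => $attrs);
    },
    function ($p, $name) {
        // Closing tags are not interesting for this demo.
    }
);
xml_parse($parser, '<font face="bold">blah</font>', true);

print_r($seen);
```

Both the tag name and the attribute name arrive in uppercase, while the attribute value is left alone. The behaviour is controlled by the XML_OPTION_CASE_FOLDING parser option, which is enabled by default; the code in this article relies on that default.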
Here is an example conversion function:
<?php
function handle_bigheadline($contents,$attrs) {
return('<b><font size="5" face="Verdana">'.$contents.'</font></b>');
}
?>
The function wraps the contents it receives into some HTML to make it look like a headline.
It then returns this modified content to the conversion framework, which uses it as part of the
contents parameter for the function that converts the tag surrounding it.
Here is the function to convert the <dayofweek /> tag:
<?php
function handle_dayofweek($contents,$attrs) {
return(date("l"));
}
?>
The function ignores the contents and attributes which are passed and just returns the name of the current
weekday. So if you put some stuff between <dayofweek> and </dayofweek>, it would never show up in the converted HTML.
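Other dynamic tags follow the same pattern. As a hedged sketch, here is how the <uppercase> and <showdate /> tags mentioned in the introduction might look. The FORMAT attribute and its default value are assumptions made for this example.

```php
<?php
// Hypothetical handler for <uppercase>...</uppercase>.
// Note: strtoupper() also uppercases any markup nested inside the tag.
function handle_uppercase($contents, $attrs) {
    return strtoupper($contents);
}

// Hypothetical handler for <showdate format="..." />.
// The FORMAT attribute is an assumption; attribute names arrive in uppercase.
function handle_showdate($contents, $attrs) {
    $format = isset($attrs['FORMAT']) ? $attrs['FORMAT'] : 'Y-m-d';
    return date($format);
}

echo handle_uppercase('buon appetito', array()), "\n";
echo handle_showdate('', array('FORMAT' => 'Y')), "\n";
```

The first function uses its contents and ignores the attributes; the second does the opposite. Both only have to return a string.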
Building the Conversion Framework
As mentioned before, we will use the SAX parser to build the conversion framework.
SAX is an event based parser. It works like this:
It chops up the document into elements. There are three element types: Start tags, end tags and character data. (Actually there are some more, but we won't need them in this example.)
Then for each element it encounters in the document, the parser calls a function assigned to the element type.
The element handling functions receive these parameters:
| Function | Parameters |
| Opening Tag | Name of the tag, array containing its attributes |
| Character data | A string containing the characters |
| Closing Tag | Name of the tag |
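To watch the three handler types fire, a small logging script can help. It is a sketch for illustration only; the $events array and the anonymous handlers are not part of the converter.

```php
<?php
// List of array(type, value) entries in the order the parser reports them.
$events = array();

$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$events) {
        $events[] = array('open', $name);
    },
    function ($p, $name) use (&$events) {
        $events[] = array('close', $name);
    }
);
xml_set_character_data_handler(
    $parser,
    function ($p, $string) use (&$events) {
        $events[] = array('cdata', $string);
    }
);
xml_parse($parser, '<b>Hi <i>there</i></b>', true);

foreach ($events as $event) {
    echo $event[0], ': ', $event[1], "\n";
}
```

The parser is free to deliver one run of text in several pieces, so a character data handler should append to what it already has rather than assume it sees the whole text at once.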
We want to combine these three functions, so that we can use the attribute data to handle the contents data.
Our conversion functions will be called when the parser encounters a closing tag. At that point they need two pieces of data:
- An array containing the attributes of the tag (the SAX parser passes this to the opening tag function)
- The character data between the opening and closing tag (can contain the output of conversion functions for other tags nested between these tags)
We will have to think up a way to store these values when we receive them so that we have them at hand when the
parser encounters the closing tag.
What Is A Stack?
A stack is a simple data structure.
It has two operations: put data onto the stack ("push") and take data from the stack ("pop").
Imagine a stack of pizza boxes: You can put pizza Nr 1 on the stack, then pizza Nr 2, pizza Nr 3.
When you now take the pizzas from the top of the stack, you get them in reverse order: Pizza 3, Pizza 2, Pizza 1.
Here is the code for the stack:
<?php
$stack=array();
function push($data) {
global $stack;
array_push($stack,$data);
}
function pop() {
global $stack;
if(count($stack)==0) {
die("Error: Buffer Underflow!");
}
return array_pop($stack);
}
?>
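The pizza example can be replayed directly. To keep the sketch self-contained it calls PHP's array_push() and array_pop() on a local array instead of going through the push() and pop() wrappers above.

```php
<?php
$pizzas = array();

// Put three pizzas onto the stack, one after another.
array_push($pizzas, 'Pizza 1');
array_push($pizzas, 'Pizza 2');
array_push($pizzas, 'Pizza 3');

// Take them off again; the last one pushed comes off first.
$order = array();
while (count($pizzas) > 0) {
    $order[] = array_pop($pizzas);
}

echo implode(', ', $order), "\n"; // Pizza 3, Pizza 2, Pizza 1
```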
In well-formed XML, tags must not overlap, and for every opening tag there is a closing tag. The SAX parser walks through
the document, and for every opening tag it reaches, our script will put its attributes onto the stack. Then, when it
reaches a closing tag, it takes one level from the stack.
So when the parser is converting a document, and it has already processed 13 opening tags and 8 closing tags,
there will be 5 elements on the stack.
As in XML the number of opening tags has to equal the number of closing tags, the stack will be empty when
the parser reaches the end of the document. And as there are no overlapping tags, the data sets are always
fetched in the correct order.
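The bookkeeping can be illustrated by counting instead of storing data. In this sketch, $depth stands in for the number of entries on the stack; the sample document and the variable names are made up for the example.

```php
<?php
$depth = 0;        // number of entries that would currently sit on the stack
$trace = array();  // depth after each tag event

$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$depth, &$trace) {
        $depth++;                       // an opening tag pushes one entry
        $trace[] = "<$name> $depth";
    },
    function ($p, $name) use (&$depth, &$trace) {
        $depth--;                       // a closing tag pops one entry
        $trace[] = "</$name> $depth";
    }
);
// An empty tag such as <c/> triggers both the opening and the closing handler.
xml_parse($parser, '<a><b><c/></b><d/></a>', true);

echo implode("\n", $trace), "\n";
echo "Depth at end of document: $depth\n";
```

At any point the depth equals the number of opening tags minus the number of closing tags seen so far, and it is back at zero when the document ends.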
Here is a list of the steps our script will take to walk through a short piece of XML (the XML file contains
no character data, so only the opening and closing functions are called by SAX).
<doc>
<tag1 parameter="Param 1">
<tag2 parameter="Param 2">
<tag3 parameter="Param 3">
</tag3>
</tag2>
</tag1>
</doc>
| Parsed element | Action |
| <doc> | push(array()); |
| <tag1 parameter="Param 1"> | push(array('PARAMETER'=>'Param 1')); |
| <tag2 parameter="Param 2"> | push(array('PARAMETER'=>'Param 2')); |
| <tag3 parameter="Param 3"> | push(array('PARAMETER'=>'Param 3')); |
| </tag3> | handle_tag3(pop()); //receives array('PARAMETER'=>'Param 3') |
| </tag2> | handle_tag2(pop()); //receives array('PARAMETER'=>'Param 2') |
| </tag1> | handle_tag1(pop()); //receives array('PARAMETER'=>'Param 1') |
| </doc> | handle_doc(pop()); //receives array() |
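The walk-through can be reproduced with a short script. This is a simplified sketch: it stores only the attribute arrays, uses a local $stack with array_push() and array_pop(), and records what each closing tag receives in a $received array introduced for the demonstration.

```php
<?php
$xml = '<doc><tag1 parameter="Param 1"><tag2 parameter="Param 2">'
     . '<tag3 parameter="Param 3"></tag3></tag2></tag1></doc>';

$stack    = array();
$received = array(); // tag name => attributes popped at its closing tag

$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$stack) {
        // Opening tag: remember its attributes.
        array_push($stack, $attrs);
    },
    function ($p, $name) use (&$stack, &$received) {
        // Closing tag: the top of the stack belongs to this tag.
        $received[$name] = array_pop($stack);
    }
);
xml_parse($parser, $xml, true);

print_r($received);
```

Because the tags are properly nested, every closing tag finds its own attributes on top of the stack.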
The Element Handling Functions
These functions are called by the SAX parser.
These are the handling functions for opening tags and character data:
<?php
function tag_open($parser, $name, $attrs)
{
push(array("attrs"=>$attrs, "contents"=>""));
}
function cdata($parser,$string) {
// Fetch from Stack, insert the character data, and put it back
$data=pop();
$data["contents"].=$string;
push($data);
}
?>
In addition to the attribute list, we also store the character data found inside the tag on the stack.
The tag closing function is the most complicated part of the script. It works like this:
- Pop the parameters and content from the stack
- If a conversion function exists for this tag, call it and pass it the attributes and contents. Save its return value. If no conversion function is assigned to the current tag, rebuild the original tag
- Pop the next set of data from the stack (the data of the tag surrounding the current one)
- Add the saved return value to its contents data
- Put it back onto the stack
<?php
function tag_close($parser,$name) {
$function="handle_".strtolower($name);
$data=pop();
if(function_exists($function)) {
$buffer=call_user_func($function,$data["contents"],$data["attrs"]);
}
else {
$buffer=create_xml_tag($name, $data["attrs"], $data["contents"]);
}
$sublevel=pop();
$sublevel["contents"].=$buffer;
push($sublevel);
}
?>
Character data is passed through, as are tags that are not handled by a conversion function.
So you don't have to write a handling function for every tag in your document.
You can take a valid XHTML document as input for the converter, and the output
will be the same document except for the tags replaced by your conversion functions.
One caveat: because the parser folds names to uppercase by default, passed-through tags and
attributes come out in uppercase (<br /> becomes <BR />).
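The helper that rebuilds unhandled tags, create_xml_tag(), is part of the full script. One detail worth knowing is that the parser hands over attribute values with their entities already decoded, so a rebuilt tag may need them escaped again. The variant below is a hedged sketch of that idea, not the article's original helper; the name create_xml_tag_escaped() is made up for this example.

```php
<?php
// Variant of the rebuild helper that re-escapes attribute values.
// Assumption: values arrive decoded (e.g. "&amp;" has become "&").
function create_xml_tag_escaped($name, $attrs, $contents) {
    $buffer = "<$name";
    foreach ($attrs as $attr => $value) {
        $buffer .= ' ' . $attr . '="' . htmlspecialchars($value, ENT_QUOTES) . '"';
    }
    // A tag without contents is written in the short form.
    if (strlen($contents) == 0) {
        return $buffer . ' />';
    }
    return $buffer . '>' . $contents . '</' . $name . '>';
}

echo create_xml_tag_escaped('BR', array(), ''), "\n";
echo create_xml_tag_escaped('A', array('HREF' => 'p.php?a=1&b=2'), 'link'), "\n";
```

The contents are deliberately left unescaped here, because they may already contain HTML produced by conversion functions for nested tags.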
Here's the whole script:
file test.php:
<?php
//--------------- The Stack --------------------//
$stack=array();
function push($data) {
// This function puts the data in the argument onto the stack
global $stack;
array_push($stack,$data);
}
function pop() {
// This function takes the uppermost data
// from the stack and returns it
global $stack;
// Error checking
if(count($stack)==0) {
die("Error: Buffer Underflow!");
}
return array_pop($stack);
}
//--------------- XML converter-----------------//
function create_xml_tag($name,$attrs, $contents) {
//returns a valid XML tag
$buffer="<$name";
foreach($attrs as $attr=>$value) {
$buffer.=' '.$attr.'="'.$value.'"';
}
// Is this tag empty?
if(strlen($contents)==0) {
$buffer.=' />';
}
else {
$buffer.='>'.$contents.'</'.$name.'>';
}
return $buffer;
}
function LoadAndExec($filename) {
// This function opens a file, executes its PHP functions,
// and returns the output.
// start an output buffer
ob_start();
// include the file and execute its PHP code
include($filename);
// Stop buffer and return its contents
$content=ob_get_contents();
ob_end_clean();
return $content;
}
function doParse($xml)
{
// This function initializes the stack,
// starts the SAX parser
// and returns the bottom stack element
// Put an empty element onto the stack - this will contain the
// output of the parsing process.
push(array("contents"=>""));
// Initialize SAX parser
$xmlparser=xml_parser_create("ISO-8859-1");
// Assign element handling functions
xml_set_element_handler($xmlparser,"tag_open","tag_close");
xml_set_character_data_handler($xmlparser,"cdata");
xml_set_default_handler($xmlparser,"cdata");
// Start the parsing process
if (!xml_parse($xmlparser,$xml)) {
die("Error parsing XML!");
}
// Destroy the parser
xml_parser_free($xmlparser);
// return contents of the bottom stack element
$first=pop();
return $first["contents"];
}
function tag_open($parser, $name, $attrs)
{
//Push the attribute list onto the stack
push(array("attrs"=>$attrs, "contents"=>""));
}
function cdata($parser,$string) {
// Fetch from Stack, insert the character data, and put it back
$data=pop();
$data["contents"].=$string;
push($data);
}
function tag_close($parser,$name) {
// This function first looks for a handling function for the
// current tag. If there is none, the tag gets passed through.
// If a handling function exists, execute it and add its return data
// to the contents of the stack element under it
// The name of the tag handling function
$function="handle_".strtolower($name);
// Fetch the content and attributes of the current tag from the stack
$data=pop();
if(function_exists($function)) {
// The tag handling function exists. Execute it!
$buffer=call_user_func($function,$data["contents"],$data["attrs"]);
}
else {
// No handling function for this tag. Pass it through by
// rebuilding the tag with its attributes in XML format.
$buffer=create_xml_tag($name, $data["attrs"], $data["contents"]);
}
// Take the converted tag and add it to the contents of the tag around it
$sublevel=pop();
$sublevel["contents"].=$buffer;
push($sublevel);
}
//--------------- Tag handling functions ------------------//
function handle_doc($contents,$attrs) {
$buffer='';
$buffer.='<html>';
$buffer.='<head><title>'.$attrs["TITLE"].'</title></head>';
$buffer.='<table border="1" bgcolor="'.$attrs["BGCOLOR"].'" width="90%" align="center" cellpadding="15">';
$buffer.='<tr><td>';
$buffer.=$contents;
$buffer.='</td></tr></table></html>';
return $buffer;
}
function handle_dayofweek($contents,$attrs) {
return(date("l"));
}
function handle_bigheadline($contents,$attrs) {
return('<b><font size="5" face="Verdana">'.$contents.'</font></b>');
}
function handle_nicebox($contents,$attrs) {
$buffer="";
$buffer.='<table border="0" cellspacing="0" cellpadding="1"';
$buffer.=' bgcolor="'.$attrs["BORDERCOLOR"].'"><tr><td>';
$buffer.='<table border="0" cellspacing="5" cellpadding="0"';
$buffer.=' bgcolor="white">';
$buffer.='<tr><td>';
$buffer.=$contents;
$buffer.='</td></tr></table></td></tr></table>';
//uncomment this to get a dump of the stack in action
//echo "<pre>";var_dump($GLOBALS["stack"]);echo "</pre>";
return $buffer;
}
function handle_product($contents,$attrs) {
$names=array( 0=>"Quattro Stagioni",
1=>"Diavola",
2=>"Margherita",
3=>"Tonno",
4=>"Capricciosa"
);
$buffer="";
$buffer.='<a href="products.php?id='.$attrs["ID"].'">';
$buffer.=$names[$attrs["ID"]].' - click here to buy!';
$buffer.='</a>';
return $buffer;
}
//--------------- Main program------------------//
// Read in the XML file and execute its PHP code
$xml=LoadAndExec("test.pxml");
//Convert the XML to HTML using the tag handling functions
$html=doParse($xml);
// Output the converted XML to the browser
echo $html;
?>
Please save the files test.php and test.pxml to an accessible directory on your webserver and open
test.php in a browser. Note that the PHP version on the server has to include the expat parser (most do).
Closing Note
The conversion process is pretty fast, so you can do it on-the-fly.
If you are concerned about web server loads, you can put the converted output into a cache-file.
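A cache of this kind could look like the sketch below. Everything in it is an assumption made for illustration: the helper name cached_convert(), the rule of comparing file modification times, and the stand-in converter used in the demo.

```php
<?php
// Hypothetical helper: return the cached output if it is at least as new
// as the source file, otherwise run the converter and store its result.
function cached_convert($source, $cachefile, $convert) {
    clearstatcache();
    if (is_file($cachefile) && filemtime($cachefile) >= filemtime($source)) {
        return file_get_contents($cachefile);
    }
    $html = call_user_func($convert, $source);
    file_put_contents($cachefile, $html, LOCK_EX);
    return $html;
}

// Demo with temporary files and a stand-in converter that counts its calls.
$source = tempnam(sys_get_temp_dir(), 'src');
$cache  = $source . '.html';
file_put_contents($source, '<doc />');

$calls   = 0;
$convert = function ($file) use (&$calls) {
    $calls++;
    return '<html>' . htmlspecialchars(file_get_contents($file)) . '</html>';
};

$first  = cached_convert($source, $cache, $convert); // runs the converter
$second = cached_convert($source, $cache, $convert); // served from the cache

unlink($source);
unlink($cache);

echo "Converter calls: $calls\n";
```

Keep in mind that a cached page freezes the output of dynamic tags such as <dayofweek />, so pages using them would need a cache lifetime or no cache at all.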
I have used this technique myself for quite a few sites. I was able to build some tag libraries for form
processing, shopping carts and database tables. With some add-ons, this stack processing is quite
powerful, but the code for the processing functions can become rather complicated when you add
intelligence to them and give them access to other levels of the stack (the parent tags).
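As a rough sketch of what access to a parent tag might look like: when tag_close() calls a conversion function, the data of the current tag has already been popped, so the top of the stack belongs to the surrounding tag. The helper parent_attrs() below is hypothetical, and the demo builds a stack by hand in the shape the script uses.

```php
<?php
// Hypothetical helper: return the attributes of the enclosing tag,
// given a stack whose top entry belongs to that tag.
function parent_attrs($stack) {
    $top = end($stack);
    // The bottom element created by doParse() has no "attrs" key.
    if ($top === false || !isset($top['attrs'])) {
        return array();
    }
    return $top['attrs'];
}

// Hand-built stack as it would look while a <product /> inside
// <nicebox> inside <doc> is being converted.
$stack = array(
    array('contents' => ''),
    array('attrs' => array('TITLE' => 'Pizza menu'), 'contents' => ''),
    array('attrs' => array('BORDERCOLOR' => 'green'), 'contents' => ''),
);

$parent = parent_attrs($stack);
echo $parent['BORDERCOLOR'], "\n";
```

A conversion function could use such a lookup to adapt its output to its surroundings, for example by reading $GLOBALS["stack"] in the full script.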
Some other ideas for tags:
<image databaseid="3901" />
<openinpopup size="big"> Doh </openinpopup>
<showflash src="bla.swf" />
© 2003 Martin Scheffler