This article describes an alternative way of converting XML to HTML using the SAX parser. For each tag you want to convert, you write a conversion function. This function is called with two arguments: contents and attributes. The return value of the function will replace the tag and its contents in the finished document.

Introduction

We all know it: XML is great. (If you don't, look at some of the great articles at this site.) But why is it so complicated to use? You have to learn about DTDs, XSLT, DOM, XPath, XPointer... This is a lot of work, and most of these techniques are not really necessary to build a website.
In this article you will learn how to build a simple converter for your XML.
The idea is this: You have a web page in clean XHTML. The biggest part of the HTML will be tables, menu images and other design stuff. So why not replace the 30 kilobytes of code for the menu with a nice little "<menu />"?
When the document is requested from the web server, a small PHP script replaces the <menu /> tag with the correct table HTML. So you have a much cleaner document, and the end result is the same. You can compose your whole page using these meta-tags, and even dynamic tags like <uppercase> or <showdate /> are easy to program.
These are the steps we will take:
Think up a set of tags that can be used to build an example website.
We will use the SAX parser to build the converter. To learn more about SAX and EXPAT (which is the name of the SAX implementation that PHP uses), please read Justin Grant's article "PHP and XML: using expat functions".

Thinking Up Some XML Tags

First you have to identify repeating elements of your web page. This can be menus, headlines, links, shopping cart products and so on. Then look at the parameters you want to assign to your elements. Look at this example XML, and you will get the idea:
file test.pxml:
<doc title="Pizza menu" bgcolor="lightblue">
	<bigheadline>
		Pizza Palace - Our Menu for  <dayofweek />
	</bigheadline>
	<br /><br />
	<b>Buon appetito!!!</b>
	<br /><br />
	
	<nicebox bordercolor="green">
		<product id="0" /><br />
		<product id="1" /><br />
		<product id="2" /><br />
		<product id="3" /><br />
		<product id="4" /><br />
	</nicebox>
</doc>

Dynamically Constructing XML

PHP doesn't care if it is embedded in HTML or XML. So if we use a little trick, we are able to use PHP to construct our XML.
This function creates an output buffer, opens and executes a file using the include-function, and returns the contents of the output buffer.

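<?php
function LoadAndExec($filename) {
    // Start an output buffer
    ob_start();
    // Include the file and execute its PHP code
    include($filename);
    // Fetch the buffered output and discard the buffer
    $content = ob_get_contents();
    ob_end_clean();
    return $content;
}
?>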
So now we can create the XML on the fly:
file test.pxml:
<doc title="Pizza menu" bgcolor="lightblue">
	<bigheadline>
		Pizza Palace - Our Menu for  <dayofweek />
	</bigheadline>
	<br /><br />
	<b>Buon appetito!!!</b>
	<br /><br />
	
	<nicebox bordercolor="green">


        <?php for($x=0;$x<5;$x++){ ?>
            <product id="<?php echo $x;?>" /><br />
        <?php } ?>
	</nicebox>
</doc>
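To check what the parser will actually see, the following sketch runs a small template through LoadAndExec. The template text and the temporary file are made up for this demonstration; only LoadAndExec comes from the article.

```php
<?php
// Same helper as in the article: include a file and capture its output.
function LoadAndExec($filename) {
    ob_start();
    include($filename);
    $content = ob_get_contents();
    ob_end_clean();
    return $content;
}

// Hypothetical template, written to a temporary file just for this demo.
$template = '<list><?php for ($x = 0; $x < 3; $x++) { ?>'
          . '<product id="<?php echo $x; ?>" />'
          . '<?php } ?></list>';
$file = tempnam(sys_get_temp_dir(), 'pxml');
file_put_contents($file, $template);

// The loop is executed, so the result is plain XML without any PHP in it.
$xml = LoadAndExec($file);
unlink($file);

echo $xml, "\n";
// prints: <list><product id="0" /><product id="1" /><product id="2" /></list>
```

One thing to watch out for, depending on the server configuration: if short_open_tag is enabled in php.ini, an XML declaration at the top of the template is treated as the start of a PHP block and causes a parse error, so it would have to be echoed from PHP instead.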

Creating the conversion functions

Each conversion function has two arguments:
  1. $contents: The character data between the opening and closing tag. This can include text, HTML tags and the output of the conversion functions for any tags nested inside the tag currently being processed. The function should include this in its return value.
  2. $attrs: An array containing the attributes of the tag. Note that the EXPAT parser automatically changes the attribute names to uppercase.
So the function that handles this tag: <font face="bold">blah</font> receives these arguments:

<?php
$contents = 'blah';
$attrs = array('FACE' => 'bold');
?>
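The uppercase conversion comes from the parser's "case folding" option, which PHP switches on by default. The sketch below shows the attribute array with folding on and off. collect_attrs is a helper name invented for this example and is not part of the article's script.

```php
<?php
// Hypothetical helper: parse $xml and return the attribute arrays
// that the opening-tag handler receives, keyed by tag name.
function collect_attrs($xml, $caseFolding) {
    $seen = array();
    $parser = xml_parser_create();
    // 1 = fold names to uppercase (the default), 0 = leave them as written
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding);
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$seen) { $seen[$name] = $attrs; },
        function ($parser, $name) { /* closing tags are not needed here */ }
    );
    xml_parse($parser, $xml, true);
    return $seen;
}

$xml = '<font face="bold">blah</font>';
print_r(collect_attrs($xml, 1)); // tag and attribute names in uppercase
print_r(collect_attrs($xml, 0)); // names exactly as written in the XML
```

The rest of this article relies on the default behaviour, which is why the handlers read keys such as $attrs["TITLE"] rather than $attrs["title"].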
Here is an example conversion function:

<?php
function handle_bigheadline($contents, $attrs) {
    return '<b><font size="5" face="Verdana">' . $contents . '</font></b>';
}
?>
The function wraps the contents it receives into some HTML to make it look like a headline. It then returns this modified content to the conversion framework, which uses it as part of the contents parameter for the function that converts the tag surrounding it.
Here is the function to convert the <dayofweek /> - tag:

<?php
function handle_dayofweek($contents, $attrs) {
    return date("l");
}
?>
The function ignores the contents and attributes passed to it and just returns the name of the current weekday. So if you put some text between <dayofweek> and </dayofweek> tags, it would never show up in the converted HTML.
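The introduction mentioned <uppercase> and <showdate /> as further dynamic tags. A possible sketch of their conversion functions follows. The FORMAT attribute and its default value are assumptions made for this example.

```php
<?php
// <uppercase>some text</uppercase>: returns its contents in upper case.
function handle_uppercase($contents, $attrs) {
    return strtoupper($contents);
}

// <showdate format="Y-m-d" />: returns the current date.
// The FORMAT attribute is hypothetical; without it a default is used.
function handle_showdate($contents, $attrs) {
    $format = isset($attrs['FORMAT']) ? $attrs['FORMAT'] : 'Y-m-d';
    return date($format);
}

echo handle_uppercase('buon appetito', array()), "\n"; // BUON APPETITO
echo handle_showdate('', array('FORMAT' => 'Y')), "\n"; // the current year
```

Keep in mind that handle_uppercase receives the already converted output of any nested tags, so HTML produced by inner conversion functions is uppercased as well. strtoupper also works byte by byte, so characters outside ASCII may need a multibyte-aware function instead.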

Building the Conversion Framework

As mentioned before, we will use the SAX parser to build the conversion framework.
SAX is an event based parser. It works like this:
It chops up the document into elements. There are three element types: Start tags, end tags and character data. (Actually there are some more, but we won't need them in this example.)
Then for each element it encounters in the document, the parser calls a function assigned to the element type.
The element handling functions receive these parameters:
Function          Parameters
Opening tag       Name of the tag, array containing its attributes
Character data    A string containing the characters
Closing tag       Name of the tag
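To see these events in action, here is a small sketch that only records what the parser reports. The sample document and the $events array are made up for this example. In PHP each handler additionally receives the parser itself as its first argument.

```php
<?php
$events = array();

$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    // Opening tag: tag name and attribute array
    function ($p, $name, $attrs) use (&$events) { $events[] = "open $name"; },
    // Closing tag: tag name only
    function ($p, $name) use (&$events) { $events[] = "close $name"; }
);
xml_set_character_data_handler(
    $parser,
    // Character data: a string with the characters
    function ($p, $string) use (&$events) { $events[] = "cdata '$string'"; }
);

xml_parse($parser, '<b>Hello <i>world</i></b>', true);
print_r($events);
```

The opening and closing events arrive in document order: open B, open I, close I, close B, with the character data events in between. The parser is free to deliver one run of text in several pieces, so a handler should always append to what it has collected so far instead of overwriting it.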
We want to combine these three functions, so that we can use the attribute data to handle the contents data.
Our conversion functions will be called when the parser encounters a closing tag. At that point they need two pieces of data:
  1. An array containing the attributes of the tag (the SAX parser passes this to the opening tag function)
  2. The character data between the opening and closing tag (can contain the output of conversion functions for other tags nested between these tags)
We will have to think up a way to store these values when we receive them so that we have them at hand when the parser encounters the closing tag.

What Is A Stack?

A stack is a simple data structure. It has two operations: put data onto the stack ("push") and take data from the stack ("pop").
Imagine a stack of pizza boxes: You can put pizza Nr 1 on the stack, then pizza Nr 2, pizza Nr 3.
When you now take the pizzas from the top of the stack, you get them in reverse order: Pizza 3, Pizza 2, Pizza 1.
Here is the code for the stack:

<?php
$stack = array();

function push($data) {
    // Put the data onto the top of the stack
    global $stack;
    array_push($stack, $data);
}

function pop() {
    // Take the uppermost data from the stack and return it
    global $stack;
    if (count($stack) == 0) {
        die("Error: Buffer Underflow!");
    }
    return array_pop($stack);
}
?>
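A quick way to convince yourself of the reverse order is to replay the pizza example with these two functions. The sketch repeats push and pop from above so that it runs on its own.

```php
<?php
$stack = array();

function push($data) {
    global $stack;
    array_push($stack, $data);
}

function pop() {
    global $stack;
    if (count($stack) == 0) {
        die("Error: Buffer Underflow!");
    }
    return array_pop($stack);
}

// Put three pizzas onto the stack ...
push("Pizza 1");
push("Pizza 2");
push("Pizza 3");

// ... and take them off again, one at a time.
$first  = pop();
$second = pop();
$third  = pop();

echo "$first, $second, $third\n"; // prints: Pizza 3, Pizza 2, Pizza 1
```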
In well-formed XML, tags must not overlap, and for every opening tag there is a closing tag. The SAX parser walks through the document, and for every opening tag it reaches, our script will put the tag's attributes onto the stack. Then, when it reaches a closing tag, it takes one level from the stack. So when the parser is converting a document, and it has already processed 13 opening tags and 8 closing tags, there will be 5 elements on the stack.
Because the number of opening tags in XML has to equal the number of closing tags, the stack will be empty when the parser reaches the end of the document. And as there are no overlapping tags, the data sets are always fetched in the correct order.
Here is a list of the steps our script will take to walk through a short piece of XML (the XML file contains no character data, so only the opening and closing functions are called by SAX).
<doc>
	<tag1 parameter="Param 1">
		<tag2 parameter="Param 2">
			<tag3 parameter="Param 3">
			</tag3>
		</tag2>
	</tag1>
</doc>
Parsed element                Action
<doc>                         push(array());
<tag1 parameter="Param 1">    push(array('PARAMETER'=>'Param 1'));
<tag2 parameter="Param 2">    push(array('PARAMETER'=>'Param 2'));
<tag3 parameter="Param 3">    push(array('PARAMETER'=>'Param 3'));
</tag3>                       handle_tag3(pop()); //receives array('PARAMETER'=>'Param 3')
</tag2>                       handle_tag2(pop()); //receives array('PARAMETER'=>'Param 2')
</tag1>                       handle_tag1(pop()); //receives array('PARAMETER'=>'Param 1')
</doc>                        handle_doc(pop());  //receives array()
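The same walk-through can be reproduced with a few lines of code. This is a simplified sketch that stores only the attributes and writes a log instead of calling conversion functions. The $log array and its message format are invented for the example.

```php
<?php
$stack = array();
$log = array();

function tag_open($parser, $name, $attrs) {
    // Opening tag: remember the attributes
    global $stack, $log;
    array_push($stack, $attrs);
    $log[] = "open $name: depth " . count($stack);
}

function tag_close($parser, $name) {
    // Closing tag: fetch the attributes of the matching opening tag
    global $stack, $log;
    $attrs = array_pop($stack);
    $param = isset($attrs['PARAMETER']) ? $attrs['PARAMETER'] : '(none)';
    $log[] = "close $name: popped $param, depth " . count($stack);
}

$xml = '<doc><tag1 parameter="Param 1"><tag2 parameter="Param 2">'
     . '<tag3 parameter="Param 3"></tag3></tag2></tag1></doc>';

$parser = xml_parser_create();
xml_set_element_handler($parser, "tag_open", "tag_close");
xml_parse($parser, $xml, true);

echo implode("\n", $log), "\n";
```

The log shows the depth growing to 4 while the tags open and shrinking back to 0 as they close, with every closing tag receiving exactly the attributes of its own opening tag.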
      
The element handling functions

These functions are called by the SAX parser.
These are the handling functions for opening tags and character data:

<?php
function tag_open($parser, $name, $attrs) {
    // Push the attribute list and an empty contents buffer onto the stack
    push(array("attrs" => $attrs, "contents" => ""));
}

function cdata($parser, $string) {
    // Fetch from stack, insert the character data, and put it back
    $data = pop();
    $data["contents"] .= $string;
    push($data);
}
?>
In addition to the attribute list, we also store the character data found inside the tag on the stack.
The tag closing function is the most complicated part of the script. It works like this:
  1. Pop the parameters and content from the stack
  2. If a conversion function exists for this tag, call it, pass it the attributes and contents, and save its return value. If no conversion function is assigned to the current tag, rebuild the original tag
  3. Pop the next set of data from the stack (the data of the tag surrounding the current one)
  4. Add the saved return value to its contents data
  5. Put it back onto the stack

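<?php
function tag_close($parser, $name) {
    // Name of the conversion function for this tag
    $function = "handle_" . strtolower($name);
    // Step 1: attributes and contents of the current tag
    $data = pop();

    // Step 2: convert the tag or rebuild it unchanged
    if (function_exists($function)) {
        $buffer = call_user_func($function, $data["contents"], $data["attrs"]);
    }
    else {
        $buffer = create_xml_tag($name, $data["attrs"], $data["contents"]);
    }

    // Steps 3 to 5: append the result to the contents of the surrounding tag
    $sublevel = pop();
    $sublevel["contents"] .= $buffer;
    push($sublevel);
}
?>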
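Character data is passed through, as are tags that are not handled by a conversion function. So you don't have to write a handling function for every tag in your document; tags without one are simply rebuilt. You can take a valid XHTML document as input for the converter, and the output will be the same document except for the tags replaced by your conversion functions. One caveat: because the parser changes names to uppercase, as noted above for attributes, the rebuilt tags come out with uppercase tag and attribute names (for example <BR />).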
Here's the whole script:
file test.php:

<?php

//--------------- The Stack --------------------//

$stack = array();

function push($data) {
    // This function puts the data in the argument onto the stack
    global $stack;
    array_push($stack, $data);
}

function pop() {
    // This function takes the uppermost data
    // from the stack and returns it
    global $stack;

    // Error checking
    if (count($stack) == 0) {
        die("Error: Buffer Underflow!");
    }
    return array_pop($stack);
}

//--------------- XML converter-----------------//

function create_xml_tag($name, $attrs, $contents) {
    // Returns a valid XML tag
    $buffer = "<$name";

    // Create a string with the attributes in XML format
    foreach ($attrs as $attr => $value) {
        $buffer .= ' ' . $attr . '="' . $value . '"';
    }

    // Is this tag empty?
    if (strlen($contents) == 0) {
        $buffer .= ' />';
    }
    else {
        $buffer .= '>' . $contents . '</' . $name . '>';
    }
    return $buffer;
}

function LoadAndExec($filename) {
    // This function opens a file, executes its PHP functions,
    // and returns the output.

    // Start an output buffer
    ob_start();

    // Include the file and execute its PHP code
    include($filename);

    // Stop buffer and return its contents
    $content = ob_get_contents();
    ob_end_clean();
    return $content;
}

function doParse($xml) {
    // This function initializes the stack,
    // starts the SAX parser
    // and returns the bottom stack element

    // Put an empty element onto the stack - this will contain the
    // output of the parsing process.
    push(array("contents" => ""));

    // Initialize SAX parser
    $xmlparser = xml_parser_create("ISO-8859-1");

    // Assign element handling functions
    xml_set_element_handler($xmlparser, "tag_open", "tag_close");
    xml_set_character_data_handler($xmlparser, "cdata");
    xml_set_default_handler($xmlparser, "cdata");

    // Start the parsing process. The third argument tells the parser
    // that this is the final (and only) piece of data.
    if (!xml_parse($xmlparser, $xml, true)) {
        die("Error parsing XML!");
    }

    // Destroy the parser
    xml_parser_free($xmlparser);

    // Return contents of the bottom stack element
    $first = pop();
    return $first["contents"];
}

function tag_open($parser, $name, $attrs) {
    // Push the attribute list onto the stack
    push(array("attrs" => $attrs, "contents" => ""));
}

function cdata($parser, $string) {
    // Fetch from stack, insert the character data, and put it back
    $data = pop();
    $data["contents"] .= $string;
    push($data);
}

function tag_close($parser, $name) {
    // This function first looks for a handling function for the
    // current tag. If there is none, the tag gets passed through.
    // If a handling function exists, execute it and add its return data
    // to the contents of the stack element under it

    // The name of the tag handling function
    $function = "handle_" . strtolower($name);

    // Fetch the content and attributes of the current tag from the stack
    $data = pop();

    if (function_exists($function)) {
        // The tag handling function exists. Execute it!
        $buffer = call_user_func($function, $data["contents"], $data["attrs"]);
    }
    else {
        // No handling function for this tag. Pass it through.
        $buffer = create_xml_tag($name, $data["attrs"], $data["contents"]);
    }

    // Take the converted tag and add it to the contents of the tag around it
    $sublevel = pop();
    $sublevel["contents"] .= $buffer;
    push($sublevel);
}


//--------------- Tag handling functions ------------------//

function handle_doc($contents, $attrs) {
    $buffer = '';
    $buffer .= '<html>';
    $buffer .= '<head><title>' . $attrs["TITLE"] . '</title></head>';
    $buffer .= '<table border="1" bgcolor="' . $attrs["BGCOLOR"] . '" width="90%" align="center" cellpadding="15">';
    $buffer .= '<tr><td>';
    $buffer .= $contents;
    $buffer .= '</td></tr></table></html>';
    return $buffer;
}

function handle_dayofweek($contents, $attrs) {
    return date("l");
}

function handle_bigheadline($contents, $attrs) {
    return '<b><font size="5" face="Verdana">' . $contents . '</font></b>';
}

function handle_nicebox($contents, $attrs) {
    $buffer = "";
    $buffer .= '<table border="0" cellspacing="0" cellpadding="1"';
    $buffer .= ' bgcolor="' . $attrs["BORDERCOLOR"] . '"><tr><td>';
    $buffer .= '<table border="0" cellspacing="5" cellpadding="0"';
    $buffer .= ' bgcolor="white">';

    $buffer .= '<tr><td>';
    $buffer .= $contents;
    $buffer .= '</td></tr></table></td></tr></table>';

    //uncomment this to get a dump of the stack in action
    //echo "<pre>";var_dump($GLOBALS["stack"]);echo "</pre>";

    return $buffer;
}

function handle_product($contents, $attrs) {
    $names = array(
        0 => "Quattro Stagioni",
        1 => "Diavola",
        2 => "Margherita",
        3 => "Tonno",
        4 => "Capricciosa"
    );

    $buffer = "";
    $buffer .= '<a href="products.php?id=' . $attrs["ID"] . '">';
    $buffer .= $names[$attrs["ID"]] . ' - click here to buy!';
    $buffer .= '</a>';
    return $buffer;
}

//--------------- Main program------------------//

// Read in the XML file and execute its PHP code
$xml=LoadAndExec("test.pxml");

//Convert the XML to HTML using the tag handling functions
$html=doParse($xml);

// Output the converted XML to the browser
echo $html;
?>
Please save the files test.php and test.pxml to an accessible directory on your webserver and open test.php in a browser. Note that the PHP version on the server has to include the expat parser (most do).

Closing Note

The conversion process is pretty fast, so you can do it on-the-fly. If you are concerned about web server loads, you can put the converted output into a cache-file.
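A cache file could be handled along the following lines. This is only a sketch: cached_convert, the file names and the stand-in conversion callback are all invented for this example. It reuses the cache file as long as it is not older than the source file.

```php
<?php
// Hypothetical helper: return the cached HTML if it is still current,
// otherwise run the conversion and store its result.
function cached_convert($sourceFile, $cacheFile, $convert) {
    clearstatcache();
    if (is_file($cacheFile) && filemtime($cacheFile) >= filemtime($sourceFile)) {
        return file_get_contents($cacheFile);
    }
    $html = call_user_func($convert, $sourceFile);
    file_put_contents($cacheFile, $html, LOCK_EX);
    return $html;
}

// Demo with a temporary source file and a stand-in conversion that
// counts how often it is called.
$source = tempnam(sys_get_temp_dir(), 'pxml');
file_put_contents($source, '<doc />');
$cache = $source . '.cache.html';

$calls = 0;
$convert = function ($file) use (&$calls) {
    $calls++;
    return strtoupper(file_get_contents($file));
};

$first  = cached_convert($source, $cache, $convert); // converts and writes the cache
$second = cached_convert($source, $cache, $convert); // served from the cache file

echo "conversions run: $calls\n"; // prints: conversions run: 1

unlink($source);
unlink($cache);
```

With the script from this article, the callback would wrap the real conversion, for example a function that returns doParse(LoadAndExec($file)). Note that a cache freezes dynamic tags: a cached <dayofweek /> would keep showing the day on which the cache file was written, so pages with such tags need an expiry time or no cache at all.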
I have used this technique myself for quite a few sites. I was able to build some tag libraries for form processing, shopping carts and database tables. With some add-ons, this stack processing is quite powerful, but the code for the processing functions can become rather complicated when you add intelligence to them and give them access to other levels of the stack (the parent tags).
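To give an idea of what access to the parent tags can look like: while a conversion function runs, its own entry has already been popped, so the top of the stack belongs to the surrounding tag. The sketch below uses this. peek_parent, the <list> and <item> tags and the COLOR attribute are invented for this example, and the stack is filled by hand to imitate the state inside tag_close.

```php
<?php
$stack = array();

// Hypothetical helper: look at the surrounding tag's entry without removing it.
function peek_parent() {
    global $stack;
    return count($stack) > 0 ? $stack[count($stack) - 1] : null;
}

// Hypothetical handler: an <item> takes its color from the enclosing tag,
// e.g. <list color="red"><item>Tonno</item></list>
function handle_item($contents, $attrs) {
    $parent = peek_parent();
    $color = isset($parent['attrs']['COLOR']) ? $parent['attrs']['COLOR'] : 'black';
    return '<font color="' . $color . '">' . $contents . '</font>';
}

// Imitate the stack at the moment </item> is processed:
// the bottom element (output buffer) and the still open <list>.
$stack[] = array('contents' => '');
$stack[] = array('attrs' => array('COLOR' => 'red'), 'contents' => '');

echo handle_item('Tonno', array()), "\n";
// prints: <font color="red">Tonno</font>
```

Reading from the parent is harmless. Writing to it is where things get complicated, because the parent's own conversion function has not run yet and will see whatever the children changed.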
Some other ideas for tags:
<image databaseid="3901" />
<openinpopup size="big"> Doh </openinpopup>
<showflash src="bla.swf" /> 
© 2003 Martin Scheffler