I've had some text lying around from when we were writing the book (1) last year, and the people at Wrox gave their OK for the material to be published as articles on PHPBuilder.com, so here is the first of a series.


The Extensible Markup Language (XML) is a metalanguage that allows the definition of markup languages. It is not a markup language in itself; you can think of it as providing the construction rules for specialized languages used for data description. XML is not the next generation of HTML; it is oriented more towards data applications in which the data and its rendition are kept separate. HTML mixes data description with data rendering, i.e. it has tags like <TITLE> mixed with tags such as <B>.
XML is more like SGML-lite (I am stretching the comparison here), and, much as SGML relies on DSSSL for document rendering, XML offers the developer several options for generating a displayable document (CSS, XSL, etc.). XML allows us to create our own "tags" to better describe our data, and by using a DTD (Document Type Definition) that describes the structure and valid content of our documents, we can easily validate a document with general-purpose tools.
When we define our own tags (a DTD), we are making an "XML application", i.e. we are applying the rules of the XML specification to define a particular class of documents. In this context the HTML DTD could be considered an XML application, except that it does not fully comply with strict XML. A better example is the DocBook XML DTD, which is used to write documentation in several Open Source projects (the PHP manual, the Linux Documentation Project, etc.).
As a first approximation, we can use XML to create letters, book chapters, articles, etc., or we can use it for data storage and retrieval (an XML-based database, for example). In this article I will focus on using XML documents for data encapsulation. Wrapping our data in XML clothes allows applications running on separate servers to pass complex information back and forth over HTTP connections.
Enter WDDX
We will begin with the WDDX functions that PHP offers for free: there is no need to write our own XML parser or even to know the WDDX DTD; we can just use them and be happy.
The WDDX functions do not need any external library, and they provide methods to create WDDX packets and to serialize data into them and deserialize data out of them.
WDDX, the Web Distributed Data Exchange, is an XML application that:
"... is a mechanism for exchanging complex data structures between application environments. It has been designed with web applications in mind. WDDX consists of a language and platform neutral representation of instantiated data based on XML 1.0 (which is defined using this DTD) and a set of serializer/ deserializer components for every environment that uses WDDX. The process of creating an XML representation of application data is called serialization. The process of instantiating application data from a WDDX XML representation is called deserialization. "
(Quoted from: "WDDX Document Type Definition (DTD)", http://www.wddx.org/DTD.htm)
WDDX has a simple DTD, and can be used to serialize variables of different types. The DTD recognizes variables of type string, numeric, and boolean, as well as arrays (when the indexes are numeric) and structures (when the indexes are strings, also called associative arrays). Complex data structures can also be represented (e.g. arrays of arrays).
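To make the type mapping concrete, here is a rough sketch of how PHP values could be turned into the corresponding WDDX elements. The function sketch_wddx_value() is hypothetical (it is not one of PHP's WDDX functions and is not how they are implemented); it assumes that an array with keys 0..n-1 becomes an <array> and any other array a <struct>, and it covers only the types listed above.

```php
<?php
// Hypothetical helper, NOT part of PHP's WDDX extension: maps a PHP value
// to the WDDX element representing it (string, number, boolean,
// arrays with keys 0..n-1, and associative arrays as structs).
function sketch_wddx_value($value) {
    if (is_bool($value)) {
        return "<boolean value='" . ($value ? "true" : "false") . "'/>";
    }
    if (is_int($value) || is_float($value)) {
        return "<number>" . $value . "</number>";
    }
    if (is_string($value)) {
        return "<string>" . htmlspecialchars($value, ENT_QUOTES) . "</string>";
    }
    if (is_array($value)) {
        // An empty array or one keyed 0, 1, 2, ... becomes <array>.
        if ($value === array() || array_keys($value) === range(0, count($value) - 1)) {
            $out = "<array length='" . count($value) . "'>";
            foreach ($value as $item) {
                $out .= sketch_wddx_value($item);
            }
            return $out . "</array>";
        }
        // Any other array (string keys) becomes a <struct> of named <var>s.
        $out = "<struct>";
        foreach ($value as $key => $item) {
            $out .= "<var name='" . htmlspecialchars((string) $key, ENT_QUOTES) . "'>"
                  . sketch_wddx_value($item) . "</var>";
        }
        return $out . "</struct>";
    }
    throw new InvalidArgumentException("type not covered by this sketch");
}

echo sketch_wddx_value(array("city" => "New York City", "items" => array(1, true))), "\n";
```

The call at the end prints a <struct> holding a <string> and an <array>; the real functions also wrap such a fragment in the <wddxPacket>, <header> and <data> envelope.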
The main steps involved in using WDDX are: data serialization, packet creation, and data reconstruction (deserialization). PHP provides two functions for packet creation, three for variable serialization and one for deserialization. We will discuss them in that order:
wddx_packet_start([string comment])
Starts a new WDDX packet to which data can be added incrementally. It returns a packet identifier to be used with wddx_add_vars and wddx_packet_end. The optional parameter is stored as a comment in the packet header.

wddx_packet_end(int packet_id)
Ends the WDDX packet specified by the identifier and returns its string representation. WDDX packets are designed mainly for application-to-application communication, so there is no special formatting (such as indentation or line breaks) between the XML elements.
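Since the packet arrives as one unbroken line, re-indenting it can help when debugging. Here is a minimal sketch, assuming PHP's DOM extension is available (it is separate from the WDDX functions); pretty_print_packet() is a hypothetical helper name, and the packet string is the one produced by the wddx_serialize_value example in this article.

```php
<?php
// Hypothetical debugging helper: re-indents a WDDX packet for human eyes.
// The whitespace it adds is not needed by WDDX consumers.
function pretty_print_packet($packet) {
    $doc = new DOMDocument();
    $doc->preserveWhiteSpace = false;   // discard existing whitespace-only text nodes
    $doc->formatOutput = true;          // ask libxml to indent on output
    if (!$doc->loadXML($packet)) {
        throw new RuntimeException("not well-formed XML");
    }
    // Serialize only the root element, leaving out the XML declaration.
    return $doc->saveXML($doc->documentElement);
}

$packet = "<wddxPacket version='0.9'><header comment='singleton'/><data>"
        . "<string>I was serialized on 2000-03-05</string></data></wddxPacket>";
echo pretty_print_packet($packet), "\n";
```

The indented copy is only for reading; applications should keep exchanging the compact form.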
wddx_serialize_value(mixed var [, string comment])
Generates the serialized representation of a single value. This function accepts a variable to be serialized and an optional comment, and creates a new packet containing that value. When only one value needs to be serialized, it saves you from starting a packet, adding the variable, and ending the packet.
For example:

$var = "I was serialized on 2000-03-05";
print wddx_serialize_value($var, "singleton");

will output:

<wddxPacket version='0.9'><header comment='singleton'/><data><string>I was serialized on 2000-03-05</string></data></wddxPacket>
wddx_serialize_vars(mixed var_name [, mixed ...])
Generates a WDDX packet containing the listed variables, which are passed by name. It is similar to wddx_serialize_value, but applies to a set of variables, and it returns the string representation of the WDDX packet. For example, suppose we have some variables holding a client's name, contact information, and purchased products:


$name = "John Q. Public";
$info = array(
    "email"   => "...",
    "address" => "234 West 4th St. Apt 543",
    "city"    => "New York City",
    "state"   => "...",
    "zip"     => "..."
);
$products = array("56Kbps modem X-9", "Thunder 11 graphic card");
print wddx_serialize_vars("name", "info", "products");

The serialized packet will be (manually formatted for clarity):

<wddxPacket version='0.9'>
  <header/>
  <data>
    <struct>
      <var name='name'><string>John Q. Public</string></var>
      <var name='info'>
        <struct>
          <var name='email'><string>...</string></var>
          <var name='address'><string>234 West 4th St. Apt 543</string></var>
          <var name='city'><string>New York City</string></var>
          <var name='state'><string>...</string></var>
          <var name='zip'><string>...</string></var>
        </struct>
      </var>
      <var name='products'>
        <array length='2'>
          <string>56Kbps modem X-9</string>
          <string>Thunder 11 graphic card</string>
        </array>
      </var>
    </struct>
  </data>
</wddxPacket>
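Note that wddx_serialize_vars receives the variable names as strings and looks up their values itself. PHP's compact() performs the same name-to-value lookup, so the sketch below (using a subset of the client variables) shows the name/value pairs that end up as <var name='...'> elements in the packet. This only illustrates the idea; it is not how the WDDX function is implemented.

```php
<?php
// compact() builds a name => value array from variable names, the same
// pairing that wddx_serialize_vars() writes out as <var name='...'> elements.
$name = "John Q. Public";
$info = array("address" => "234 West 4th St. Apt 543", "city" => "New York City");
$products = array("56Kbps modem X-9", "Thunder 11 graphic card");

$struct = compact("name", "info", "products");
print_r(array_keys($struct));   // the names that become var name='...' attributes
```

Passing names rather than values is what lets the receiving side get the same variable names back after deserialization.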
wddx_add_vars(int packet_id, mixed name_var [, mixed ...])
Adds one or more variables to a packet created with wddx_packet_start, and requires the packet identifier returned by that function. Use it to build up an existing packet incrementally. See the listing for wddx_generator.php3 below for an example of its usage.
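The start/add/end cycle can be pictured as an object that collects name/value pairs and only produces the XML when the packet is closed. The PacketSketch class below is a hypothetical illustration of that life cycle, not the extension's implementation; to stay short it handles only strings and numbers, and it assumes named variables are wrapped in a top-level struct.

```php
<?php
// Hypothetical illustration of the wddx_packet_start / wddx_add_vars /
// wddx_packet_end life cycle. Only strings and numbers are handled.
class PacketSketch {
    private $comment;
    private $vars = array();

    public function __construct($comment = null) {   // ~ wddx_packet_start()
        $this->comment = $comment;
    }

    public function add($name, $value) {             // ~ wddx_add_vars()
        $this->vars[$name] = $value;
    }

    public function end() {                          // ~ wddx_packet_end()
        $header = ($this->comment === null)
            ? "<header/>"
            : "<header comment='" . htmlspecialchars($this->comment, ENT_QUOTES) . "'/>";
        $body = "";
        foreach ($this->vars as $name => $value) {
            $elem = is_string($value)
                ? "<string>" . htmlspecialchars($value, ENT_QUOTES) . "</string>"
                : "<number>" . $value . "</number>";
            $body .= "<var name='" . htmlspecialchars((string) $name, ENT_QUOTES) . "'>"
                   . $elem . "</var>";
        }
        // Named variables go inside a top-level struct; no whitespace is added.
        return "<wddxPacket version='0.9'>" . $header
             . "<data><struct>" . $body . "</struct></data></wddxPacket>";
    }
}

$p = new PacketSketch("demo");
$p->add("count", 3);
$p->add("label", "Zn-His");
echo $p->end(), "\n";
```

Deferring the XML until end() is what makes incremental construction cheap: adding a variable only stores it.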

wddx_deserialize(string packet)
Deserializes the incoming packet and returns a variable of mixed type. You should always test the type of the value returned by this function. In general, if the WDDX packet contained more than one variable, an associative array is returned whose keys are the names of the serialized variables. See the listing for wddx_consumer.php3 below for an example of its usage.
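Why the type test matters is easy to see with a small reader. The functions below are a hypothetical, much-simplified deserializer built on SimpleXML (assumed to be available), not PHP's wddx_deserialize(); they understand only the string, number, boolean, array and struct elements. A single-value packet comes back as a scalar, while a packet of named variables comes back as an associative array.

```php
<?php
// Hypothetical, simplified WDDX reader (NOT PHP's wddx_deserialize()).
function sketch_wddx_decode(SimpleXMLElement $node) {
    switch ($node->getName()) {
        case "string":
            return (string) $node;
        case "number":
            return 0 + (string) $node;               // int or float, depending on the text
        case "boolean":
            return ((string) $node["value"]) === "true";
        case "array":
            $out = array();
            foreach ($node->children() as $child) {
                $out[] = sketch_wddx_decode($child);
            }
            return $out;
        case "struct":
            $out = array();
            foreach ($node->var as $var) {
                $children = $var->children();
                $out[(string) $var["name"]] = sketch_wddx_decode($children[0]);
            }
            return $out;
    }
    throw new UnexpectedValueException("unsupported element: " . $node->getName());
}

function sketch_deserialize($packet) {
    $xml = simplexml_load_string($packet);
    if ($xml === false) {
        return null;                                 // not well-formed XML
    }
    $children = $xml->data->children();
    return sketch_wddx_decode($children[0]);         // <data> holds exactly one value
}

$single = sketch_deserialize("<wddxPacket version='0.9'><header comment='singleton'/>"
    . "<data><string>I was serialized on 2000-03-05</string></data></wddxPacket>");
$multi = sketch_deserialize("<wddxPacket version='0.9'><header/><data><struct>"
    . "<var name='bins'><array length='2'><number>3</number><number>5</number></array></var>"
    . "<var name='ok'><boolean value='true'/></var></struct></data></wddxPacket>");
echo gettype($single), " / ", gettype($multi), "\n";   // string / array
```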

A WDDX Example

We will demonstrate how to pass complex data structures between two PHP scripts residing on different servers. A simple schema of how these scripts work can be seen in the figure below.
[Figure: Simple schema, Apache/Xitami]
The "consumer" script runs on its own server (a machine running Xitami, with PHP as a regular CGI interpreter). It can be invoked through that web server or directly from the command line (e.g. % php -q wddx_consumer.php3), and either way it connects to another server (Apache with mod_php) on which the "generator" script resides. The "consumer" makes a request, and the "generator" sends back a WDDX packet, which is then parsed and used.

The Data Generator (The WDDX Server)

This script shows how to construct a packet incrementally, and makes use of the MySQL database functions as well as a histogram class (contained in "class.hist"). For the sake of this example, we will search for zinc-histidine distances and then construct an array containing the resulting data set. In your own application you might be collecting daily maximum temperatures, closing stock prices, or any other quantitative value you would like to analyze.



<?php
include("class.hist");

// perform a search on the MDB looking for Zn-His distances
$link = mysql_pconnect();
$query = "select metal_lig_dist from ligand where metal='zn' and lig_symbol='his'";
$result = mysql_query($query, $link);
$data = array();
while ($row = mysql_fetch_row($result)) {
    $data[] = $row[0];
}

After gathering the data, we use the Histogram class (contained in "class.hist") to allocate the data into 15 bins. We then retrieve the bins and the data statistics (both associative arrays) into the variables $bins and $stats.


// generate a histogram with 15 bins
$hist = new Histogram($data, 15);

// get back the bins and statistics
$bins = $hist->getBins();
$stats = $hist->getStats();
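The article does not list class.hist, so here is a hypothetical stand-in with the two methods the generator calls, for readers who want to try the script without it. The class name HistogramSketch, the use of bin centers as keys, and the particular statistics returned are all assumptions; the real class may well differ. It also assumes a non-empty data set.

```php
<?php
// Hypothetical stand-in for the Histogram class in "class.hist" (not shown
// in the article). Assumes a non-empty array of numbers.
class HistogramSketch {
    private $data;
    private $nbins;

    public function __construct(array $data, $nbins) {
        $this->data = array_values($data);
        $this->nbins = max(1, (int) $nbins);
    }

    // Returns bin center => count, with keys as strings such as "1.750".
    public function getBins() {
        $min = min($this->data);
        $width = (max($this->data) - $min) / $this->nbins;
        $counts = array_fill(0, $this->nbins, 0);
        foreach ($this->data as $x) {
            $i = ($width > 0) ? (int) floor(($x - $min) / $width) : 0;
            $counts[min($i, $this->nbins - 1)]++;   // the maximum goes into the last bin
        }
        $bins = array();
        foreach ($counts as $i => $c) {
            $key = sprintf("%.3f", $min + ($i + 0.5) * $width);
            $bins[$key] = (isset($bins[$key]) ? $bins[$key] : 0) + $c;
        }
        return $bins;
    }

    // Returns a few summary statistics (sample standard deviation).
    public function getStats() {
        $n = count($this->data);
        $mean = array_sum($this->data) / $n;
        $ss = 0.0;
        foreach ($this->data as $x) {
            $ss += ($x - $mean) * ($x - $mean);
        }
        return array(
            "n"      => $n,
            "min"    => min($this->data),
            "max"    => max($this->data),
            "mean"   => $mean,
            "stddev" => ($n > 1) ? sqrt($ss / ($n - 1)) : 0.0,
        );
    }
}

$hist = new HistogramSketch(array(1, 2, 3, 4), 2);
print_r($hist->getBins());
print_r($hist->getStats());
```

The bin keys are formatted strings on purpose: PHP truncates float array keys to integers, so using the raw centers as keys would merge neighbouring bins.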

Finally we create the WDDX packet. For this example we have chosen to create the packet incrementally; we could also have used wddx_serialize_vars to create and output the packet in one step.


// create a WDDX packet and serialize the histogram information
$packet_id = wddx_packet_start("Results of histogram calculation of Zn-His distances");
wddx_add_vars($packet_id, "bins", "stats");
$packet_out = wddx_packet_end($packet_id);
print $packet_out;

That is all: we have run an SQL query, processed the data, and generated a packet in fewer than 20 lines!

The Data Consumer (The WDDX Client)

This script invokes the WDDX packet-generating script with an HTTP request (using the file function) and then deserializes the returned packet. Because file() returns the response as an array of lines, we join them with implode(); since the packet is sent as one continuous stream without line breaks, the array normally holds a single element.

<?php
/*
 * Script that "uses" the WDDX packet generated by
 * wddx_generator.php3
 * Jesus M. Castagnetto, 1999-2000
 */

// get data as a WDDX packet
$url_gen = "http://generator.server.com/wddx/wddx_generator.php3";
$packet_in = implode("", file($url_gen));

// deserialize the values
$values = wddx_deserialize($packet_in);

The variable $values will contain an associative array whose keys are the names of the variables serialized by the WDDX generating script (in this example, "bins" and "stats"). We assign these values (themselves arrays) to variables and then process them; in this example, we just display them in a nice table.
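Because a failed request or a modified generator could hand back something other than the expected struct, the consumer can check the shape of $values before using it. A minimal sketch; extract_histogram() is a hypothetical helper, and the check assumes only that "bins" and "stats" are arrays.

```php
<?php
// Hypothetical guard for the consumer: returns array($bins, $stats), or null
// if the deserialized value does not look like the generator's packet.
function extract_histogram($values) {
    if (!is_array($values)) {
        return null;    // e.g. a scalar, or nothing usable came back
    }
    foreach (array("bins", "stats") as $key) {
        if (!isset($values[$key]) || !is_array($values[$key])) {
            return null;
        }
    }
    return array($values["bins"], $values["stats"]);
}

$result = extract_histogram(array("bins" => array("1.750" => 2), "stats" => array("n" => 2)));
if ($result === null) {
    echo "unexpected packet contents\n";
} else {
    list($bins, $stats) = $result;
    echo count($bins), " bin(s), n = ", $stats["n"], "\n";
}
```

Returning null instead of raising an error lets the page print a friendly message when the remote generator misbehaves.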


// extract the bins and stats arrays
$bins = $values["bins"];
$stats = $values["stats"];

// print a table of histogram bins and statistics
echo "<TABLE><TR><TH>Histogram bins</TH><TH>Statistics</TH></TR>";
echo "<TR VALIGN='top'><TD>";
echo "<TABLE BORDER><TR><TH>Bin value</TH><TH>Count</TH></TR>";
foreach ($bins as $k => $v) {
    echo "<TR><TD ALIGN='center'>".sprintf("%.3f",$k)."</TD><TD>".$v."</TD></TR>\n";
}
echo "</TABLE></TD><TD><TABLE BORDER>";
foreach ($stats as $k => $v) {
    echo "<TR><TD ALIGN='right'>".$k." = </TD><TD>".sprintf("%.3f",$v)."</TD></TR>\n";
}
echo "</TABLE></TD></TR></TABLE>";

The output of the script above will then be:
[Figure: Histogram bins and stats]
This shows how easy it is to create scripts that serve WDDX packets to remote applications. These remote applications can be another PHP script (as in the example), a Java servlet or application, or a Python or Perl script that understands WDDX packets. This type of application-to-application communication is the backbone of, for example, EDI (Electronic Data Interchange) systems, which are widely used in business-to-business transactions, although the EDI specifications do not make use of XML, yet.


In this article we have discussed using PHP to create and handle XML documents in the form of WDDX packets. PHP handles WDDX packets well and with little coding, because the support is included in the core language. If WDDX is the only XML application you will use, that is all the XML processing you need.
But because in real life you will need to deal with a variety of XML document types, PHP also provides functions (based on the Expat library) that let you define your own parsing functions, giving you an astonishing degree of flexibility. The examples here should be only the starting point in your quest for processing XML data from different sources, and I hope that, even in their simplicity, they will serve as inspiration for your own creations.
In the following articles we will discuss the creation and use of XML parsers, and the use of XML for something more than data encapsulation: XML-RPC.

Suggested Reading

(1) The book I refer to here is "Professional PHP Programming" (ISBN: 1861002963, December 1999; Copyright © 1999 by Wrox Press Limited), which I co-authored along with other (better) writers.