I've had some text lying around from when we were writing the book
(1) last year, and the people at Wrox gave their OK for
the information to be published as articles on PHPBuilder.com, so here is the
first one of a series.
Introduction
The Extensible Markup Language (XML) is a metalanguage that allows the
definition of markup languages. It is not a markup language in itself, but you
can think of it as providing the construction rules for specialized languages
used for data description. XML is not the next generation of HTML; it is
oriented toward data applications in which the data and its rendition
are separated. HTML mixes data description with data rendering, i.e. it has
tags like <TITLE> mixed with tags such as <B>.
XML is more like SGML-lite (I am stretching the comparison here), and similar
to the DSSSL technology used in SGML for document rendering, it has several
options that the developer can use to generate a displayable document (CSS,
XSL, etc.). XML allows us to create our own "tags" to better
describe our data, and by using a DTD (a Document Type Definition) that
describes the structure and valid content of our documents, we can
easily perform validations on a document using general tools.
When we define our own tags (a DTD), we are making an "XML application", i.e.
we are applying the rules of the XML specification to define a particular class of
documents. The HTML DTD could be considered in this context an XML application,
except that it does not comply in toto with strict XML. A better example would be
the DocBook XML DTD, which is being used to create documentation in several
Open Source projects (the PHP manual, the Linux Documentation Project, etc.)
As a first approximation we can use XML for creating letters, chapters of
books, articles, etc. Or we can use it in the context of data storage and
retrieval (an XML-based database, for example).
In this article I will focus on using XML documents for data encapsulation. Wrapping our data in XML
clothes will allow different applications, running on separate servers,
to pass complex information back and forth over HTTP connections.
A First Approximation: Enter WDDX
We will begin by making use of the
WDDX functions that PHP offers for free. There is no
need to create our own XML parser or even know the WDDX DTD; we can just use
them and be happy.
The WDDX functions do
not need any external library, and implement methods to create WDDX packets
and to serialize and deserialize information.
WDDX, the Web Distributed Data Exchange, is an XML application that:
"... is a mechanism for exchanging complex data structures between application
environments. It has been designed with web applications in mind. WDDX
consists of a language and platform neutral representation of instantiated
data based on XML 1.0 (which is defined using this DTD) and a set of
serializer/deserializer components for every environment that uses WDDX. The
process of creating an XML representation of application data is called
serialization. The process of instantiating application data from a WDDX XML
representation is called deserialization."
(Quoted from: "WDDX Document Type Definition (DTD)",
http://www.wddx.org/DTD.htm)
WDDX has a simple DTD, and can be used to serialize variables of different
types. The DTD recognizes variables of type string, numeric, and boolean, as
well as arrays (when the indexes are numeric) and structures (when the indexes
are strings, also called associative arrays). Complex data structures can
also be represented (e.g. arrays of arrays).
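To make the type mapping concrete, here is a minimal hand-written sketch of how PHP values could be turned into WDDX elements. The function wddx_sketch is a hypothetical helper introduced only for illustration; it is not part of PHP and not how the built-in WDDX functions are implemented. It assumes that an array with sequential keys 0..n-1 becomes an array element and that any other array becomes a struct.

```php
<?php
// Illustrative only: wddx_sketch() is a hypothetical helper, not a PHP built-in.
// It maps booleans, numbers, strings, arrays and structs to WDDX elements.
function wddx_sketch($value)
{
    if (is_bool($value)) {
        return "<boolean value='" . ($value ? 'true' : 'false') . "'/>";
    }
    if (is_int($value) || is_float($value)) {
        return '<number>' . $value . '</number>';
    }
    if (is_array($value)) {
        // Assumption: sequential 0..n-1 keys mean <array>, anything else <struct>.
        $is_list = ($value === array())
            || array_keys($value) === range(0, count($value) - 1);
        $xml = '';
        if ($is_list) {
            foreach ($value as $item) {
                $xml .= wddx_sketch($item);
            }
            return "<array length='" . count($value) . "'>" . $xml . '</array>';
        }
        foreach ($value as $name => $item) {
            $xml .= "<var name='" . htmlspecialchars((string) $name, ENT_QUOTES) . "'>"
                  . wddx_sketch($item) . '</var>';
        }
        return '<struct>' . $xml . '</struct>';
    }
    // Everything else is treated as a string in this sketch.
    return '<string>' . htmlspecialchars((string) $value) . '</string>';
}

echo wddx_sketch(array(
    'name'  => 'John',
    'items' => array('a', 'b'),
    'paid'  => true,
)), "\n";
```

In practice the built-in functions described next do this work for us; the sketch only shows why numerically indexed arrays and associative arrays end up as different elements.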
The main steps involved in using
WDDX are: data serialization, packet creation, and data reconstruction
(deserialization). PHP provides two functions for packet
creation, three for variable serialization and one for deserialization. We will
discuss them in that order:
- wddx_packet_start
- Used to start a new WDDX packet for incremental addition of data.
Returns a packet identifier to be used with wddx_packet_end or any of the
serialization functions. This function takes an optional parameter to be used
as a comment in the packet header.
- wddx_packet_end
- Ends a WDDX packet specified by the identifier and returns the string
representation of it. WDDX packets are designed mainly for application to
application communication; therefore there is no special formatting
(such as indentation or line breaks) between the different XML elements.
- wddx_serialize_value
- Generates the serialized representation of a single value. This function
accepts a variable to be serialized and an optional comment, and creates a
new packet containing the indicated variable. This is the equivalent of
starting a packet, adding to it and then ending it, when only one variable
needs to be serialized.
For example:
<?php
$one_var = "I was serialized on 2000-03-05";
echo wddx_serialize_value($one_var, "singleton");
?>
will output (wrapped here to fit the page; the actual packet contains no line breaks):
<wddxPacket version='0.9'><header comment='singleton'/><data>
<string>I was serialized on 2000-03-05</string></data></wddxPacket>
- wddx_serialize_vars
- Generates a WDDX packet containing the listed variables. It is similar to
wddx_serialize_value, but applied to a set of variables. It returns the string
representation of the WDDX packet. For example, if we have some variables with
the name, contact information and products purchased by a client:
<?php
$name = "John Q. Public";
$info = array (
"email"=>"jqpublic@someplace.nice.com",
"address"=>"234 West 4th St. Apt 543",
"city"=>"New York City",
"state"=>"NY",
"zip"=>"10003"
);
$products = array ("56Kbps modem X-9", "Thunder 11 graphic card");
echo wddx_serialize_vars("name", "info", "products");
?>
The serialized packet will be (manually formatted for clarity):
<wddxPacket version='0.9'>
<header/>
<data>
<struct>
<var name='name'>
<string>John Q. Public</string>
</var>
<var name='info'>
<struct>
<var name='email'>
<string>jqpublic@someplace.nice.com</string>
</var>
<var name='address'>
<string>234 West 4th St. Apt 543</string>
</var>
<var name='city'>
<string>New York City</string>
</var>
<var name='state'>
<string>NY</string>
</var>
<var name='zip'>
<string>10003</string>
</var>
</struct>
</var>
<var name='products'>
<array length='2'>
<string>56Kbps modem X-9</string>
<string>Thunder 11 graphic card</string>
</array>
</var>
</struct>
</data>
</wddxPacket>
- wddx_add_vars
- Used to add one or more variables to a packet created using
wddx_packet_start, and requires the packet identifier returned by that
function. This function is used to incrementally add to an existing packet.
See the listing for wddx_generator.php3 below for an example of the usage of
this function.
- wddx_deserialize
- Deserializes the incoming packet, returning a variable of mixed type. You
should always test the type of the variable created by calling this function.
In general, if the WDDX packet contained more than one variable, an associative
array will be returned in which the array keys are the names of the
serialized variables. See the listing for wddx_consumer.php3 below for an
example of the usage of this function.
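The functions listed above can be combined into a short round trip: build a packet incrementally, close it, then deserialize it and test the result before using it. This is a minimal sketch under the assumption that the WDDX extension is loaded (it was bundled with PHP 4 through 7.3 and removed in 7.4); the variable names and values are made up for illustration.

```php
<?php
// Sketch only: needs PHP's WDDX extension (bundled with PHP 4 through 7.3).
if (function_exists('wddx_packet_start')) {
    // Hypothetical sample data.
    $city = "New York City";
    $zips = array("10003", "10012");

    // Build the packet incrementally; variables are passed by name.
    $packet_id = wddx_packet_start("incremental example");
    wddx_add_vars($packet_id, "city");
    wddx_add_vars($packet_id, "zips");
    $packet = wddx_packet_end($packet_id);

    // Deserialize, then check the type and keys before trusting the contents.
    $values = wddx_deserialize($packet);
    if (is_array($values)
        && isset($values["city"], $values["zips"])
        && is_array($values["zips"])) {
        echo $values["city"], ": ", implode(", ", $values["zips"]), "\n";
    } else {
        echo "Unexpected packet contents\n";
    }
} else {
    echo "WDDX functions are not available in this PHP build\n";
}
```

Checking with is_array and isset matters most when the packet comes from another server, since the consumer cannot assume the sender used the variable names it expects.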
A WDDX Example
We will demonstrate how we can pass complex data structures between two PHP
scripts that reside on different servers. A simple schematic of how
these scripts work can be seen in the figure below.
The "consumer" script, running on its own server (a machine running Xitami, with
PHP as a regular CGI interpreter), can connect via the server or directly
(e.g. if called by using: % php -q wddx_consumer.php3) to another
server (Apache with modPHP) on which the "generator" script resides. The
"consumer" makes a request, and the "generator" sends back a WDDX packet, which
is then parsed and used.
The Data Generator (The WDDX Server)
This script shows how to construct a packet incrementally, and makes use of
database functions, as well as a histogram class ("class.hist").
For the sake of this
example, we will search for zinc-histidine distances, and then construct an
array containing the resulting data set. In your own application, you may be
obtaining, for example, daily maximum temperatures, or closing stock prices,
or any quantitative value you would like to analyze.
wddx_generator.php3
<?php
require("class.hist");
// perform a search on the MDB looking for Zn-His distances
$link = mysql_pconnect();
mysql_select_db("metallodb");
$query = "select metal_lig_dist from ligand where metal='zn' and lig_symbol='his'";
$result = mysql_query($query, $link);
$data = array(); // start empty so $data is defined even if no rows match
while ($row = mysql_fetch_row($result)) {
$data[] = $row[0];
}
?>
After gathering the information, we use the "Histogram" class (contained in
"class.hist") to allocate the data into 15 bins. Then we retrieve these bins
and the data statistics (both being associative arrays) into the variables
$bins and $stats.
<?php
// generate a histogram with 15 bins
$hist = new Histogram($data, 15);
// get back the bins and statistics
$bins = $hist->getBins();
$stats = $hist->getStats();
?>
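The file "class.hist" is not listed in this article. For readers who want to run the example, the following is a hypothetical stand-in that offers the same constructor and the getBins and getStats methods used above. Everything inside it is an assumption: equal-width bins, bin keys formatted as the lower bound of each bin, and a small set of statistics. The original class may work differently.

```php
<?php
// Hypothetical stand-in for "class.hist"; the original class is not shown here.
class Histogram
{
    private $bins = array();
    private $stats = array();

    public function __construct(array $data, $nbins)
    {
        $n = count($data);
        if ($n === 0 || $nbins < 1) {
            throw new InvalidArgumentException('Need data and at least one bin');
        }
        $min = min($data);
        $max = max($data);
        $this->stats = array(
            'n'    => $n,
            'min'  => $min,
            'max'  => $max,
            'mean' => array_sum($data) / $n,
        );

        // Equal-width bins; fall back to width 1 when all values are equal.
        $width = ($max > $min) ? ($max - $min) / $nbins : 1;
        $counts = array_fill(0, $nbins, 0);
        foreach ($data as $x) {
            // The maximum value is placed in the last bin.
            $i = min((int) floor(($x - $min) / $width), $nbins - 1);
            $counts[$i]++;
        }
        foreach ($counts as $i => $count) {
            // String keys, because PHP would truncate float keys to integers.
            $this->bins[sprintf('%.3f', $min + $i * $width)] = $count;
        }
    }

    public function getBins()
    {
        return $this->bins;
    }

    public function getStats()
    {
        return $this->stats;
    }
}

$hist = new Histogram(array(1, 2, 3, 4), 2);
print_r($hist->getBins());
print_r($hist->getStats());
```

Both return values are associative arrays, which is what allows them to be serialized as WDDX structures in the next step.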
Finally we create the WDDX packet. For this example we have chosen to create
the packet incrementally. We could also have used the function
wddx_serialize_vars to create and output the packet in one step.
<?php
//create a WDDX packet and serialize the histogram information.
$packet_id = wddx_packet_start("Results of histogram calculation of Zn-His distances");
wddx_add_vars($packet_id, "bins", "stats");
$packet_out = wddx_packet_end($packet_id);
echo $packet_out;
?>
That is all: we have run an SQL query, processed the data, and generated a
packet in fewer than 20 lines!
The Data Consumer (The WDDX Client)
This script invokes the WDDX packet generating script using an HTTP call
(via the file function), then deserializes the returned packet. The lines
returned by file are joined into a single string first, because the packet
is sent back as a continuous stream.
wddx_consumer.php3
<HTML>
<HEAD>
<TITLE>WDDX_CONSUMER.PHP3</TITLE>
</HEAD>
<BODY BGCOLOR="white">
<?php
/*
* Script that "uses" the WDDX packet generated by
* wddx_generator.php3
* Jesus M. Castagnetto, 1999-2000
*/
// get data as a WDDX packet
$url_gen = "http://generator.server.com/wddx/wddx_generator.php3";
$packet_in = implode("",file($url_gen));
// deserialize the values
$values = wddx_deserialize($packet_in);
?>
The variable $values will contain an associative array with keys
corresponding to the names of the variables serialized by the WDDX generating
script (in this example: "bins" and "stats"). We assign these variables (in
themselves arrays), and then process them. In this example, we just display
them in a nice table.
<?php
// extract the bins and stats arrays
$bins = $values["bins"];
$stats = $values["stats"];
// print a table of histogram bins and statistics.
echo "<TABLE><TR><TH>Histogram bins</TH><TH>Statistics</TH></TR>";
echo "<TR VALIGN='top'><TD>";
echo "<TABLE BORDER><TR><TH>Bin value</TH><TH>Count</TH></TR>";
while (list($k,$v) = each ($bins)) {
echo "<TR><TD ALIGN='center'>".sprintf("%.3f",$k)."</TD><TD>".$v."</TD></TR>\n";
}
echo "</TABLE>\n";
echo "</TD><TD>";
echo "<TABLE>";
while (list($k,$v) = each ($stats)) {
echo "<TR><TD ALIGN='right'>".$k." = </TD><TD>".sprintf("%.3f",$v)."</TD></TR>\n";
}
echo "</TABLE>\n";
echo "</TD></TR></TABLE>";
?>
The output of the script above will then be:
This shows how easy it is to create scripts that can serve WDDX packets to
remote applications. These remote applications can be another PHP script
(as in the example), or
perhaps a Java servlet or application, or a Python or Perl script
that understands WDDX packets.
This type of
application to application communication is the backbone of, for example, EDI
systems (Electronic Data Interchange), which are widely used in business to
business transactions, although the EDI specs do not make use of XML yet.
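Because a WDDX packet is plain XML, a consumer does not strictly need dedicated WDDX functions to read it. As a rough illustration of what a client in another environment would have to do, here is a sketch that reads a packet with PHP's SimpleXML extension instead of wddx_deserialize. The helper wddx_node_to_php is hypothetical and, by assumption, only handles the string, number, boolean, array and struct elements.

```php
<?php
// Sketch: convert WDDX elements to PHP values with SimpleXML.
// wddx_node_to_php() is a hypothetical helper covering only a few element types.
function wddx_node_to_php(SimpleXMLElement $node)
{
    switch ($node->getName()) {
        case 'string':
            return (string) $node;
        case 'number':
            return (float) (string) $node;
        case 'boolean':
            return ((string) $node['value']) === 'true';
        case 'array':
            $list = array();
            foreach ($node->children() as $child) {
                $list[] = wddx_node_to_php($child);
            }
            return $list;
        case 'struct':
            $map = array();
            foreach ($node->children() as $var) {
                // Each <var name='...'> wraps exactly one value element.
                foreach ($var->children() as $value) {
                    $map[(string) $var['name']] = wddx_node_to_php($value);
                }
            }
            return $map;
    }
    throw new UnexpectedValueException('Unsupported element: ' . $node->getName());
}

$packet = "<wddxPacket version='0.9'><header/><data><struct>"
        . "<var name='name'><string>John Q. Public</string></var>"
        . "<var name='products'><array length='2'>"
        . "<string>56Kbps modem X-9</string>"
        . "<string>Thunder 11 graphic card</string>"
        . "</array></var>"
        . "</struct></data></wddxPacket>";

$xml = simplexml_load_string($packet);
if ($xml === false) {
    die("Could not parse the packet\n");
}
$values = null;
foreach ($xml->data->children() as $top) {
    $values = wddx_node_to_php($top); // <data> holds one top-level element
}
print_r($values);
```

A client written in Java, Python or Perl would follow the same pattern: parse the XML, walk the elements under the data element, and map each one to a native type.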
Summary
In this article we have discussed the use of PHP to create and handle
XML documents, in the form of WDDX packets.
PHP handles WDDX packets well and without much coding, because it includes
support in the core language.
If WDDX is the only XML application you will use, then that is all the XML
processing you need.
But because in real life you will need to deal with a
variety of XML document types, PHP also provides (by using the Expat library)
functions that let you define your own parsing handlers, giving you an astonishing
degree of flexibility. The examples here should be but the starting point in
your quest for processing XML data from different sources, and I hope that,
even in their simplicity, they will serve as
inspiration for your own creations.
In the following articles we will be discussing the creation and use of XML
parsers, and the use of XML for something more than data encapsulation:
XML-RPC.
Suggested Reading