DOM XML: An Alternative to Expat
Matt Dunford
Overview: An alternative to expat.
There are many xml tutorials for php on the web, but few show how to
parse xml using DOM. I would like to take this opportunity to show
php programmers that there is an alternative to the widespread SAX
implementation, expat.
DOM (Document Object Model) and SAX (Simple API for XML) have
different philosophies on how to parse xml. The SAX engine is
extremely event-driven. When it comes across a tag, it calls an
appropriate function to handle it. This makes SAX very fast and
efficient. However, writing code for it can feel like being trapped
inside an endless loop: you find yourself using many global variables
and conditional statements.
By contrast, the DOM method is somewhat memory intensive. It
loads an entire xml document into memory as a hierarchy. The upside
is that all of the data is available to the programmer, organized
much like a family tree. This approach is more intuitive and
easier to use, and it produces more readable code.
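To make the contrast concrete, here is a sketch that pulls the title out of a trimmed-down version of the book document used below, first with PHP's expat-based xml_parser functions and then with a DOM tree. It assumes PHP 5 or later, where the DOM extension's DOMDocument class takes the place of the PHP 4 domxml functions this article covers; titleViaSax() and titleViaDom() are illustrative names.

```php
<?php
// Sketch: the same lookup (the <title> text) done SAX-style with PHP's
// expat-based xml_parser functions and DOM-style with DOMDocument (PHP 5+).
$xml = '<book type="paperback"><title>Red Nails</title><price>$12.99</price></book>';

// SAX: handlers fire once per event, so state must live outside them.
function titleViaSax(string $xml): string
{
    $inTitle = false;
    $title = '';
    $parser = xml_parser_create();
    // Keep element names as written (expat upper-cases them by default).
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$inTitle) {
            if ($name === 'title') {
                $inTitle = true;
            }
        },
        function ($p, $name) use (&$inTitle) {
            if ($name === 'title') {
                $inTitle = false;
            }
        }
    );
    // Character data may arrive in several chunks, so append each one.
    xml_set_character_data_handler($parser, function ($p, $data) use (&$inTitle, &$title) {
        if ($inTitle) {
            $title .= $data;
        }
    });
    xml_parse($parser, $xml, true);
    return $title;
}

// DOM: the whole document is in memory, so we simply ask the tree.
function titleViaDom(string $xml): string
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    return $doc->getElementsByTagName('title')->item(0)->textContent;
}

echo titleViaSax($xml), "\n"; // Red Nails
echo titleViaDom($xml), "\n"; // Red Nails
```

The SAX version needs a flag shared between callbacks just to know where it is, while the DOM version asks the tree for the node it wants.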
In order to use the DOM functions, you must configure php by specifying
the '--with-dom' argument. They are not a part of the standard
configuration. Here is a sample compilation.
    %> ./configure --with-dom --with-apache=../apache_1.3.12
    %> make
    %> make install

How DOM structures XML
Since DOM loads an entire xml string or file into memory as a tree,
we can manipulate the data as a whole. To show what xml looks like
as a tree, take this xml document as an example.
    <?xml version="1.0"?>
    <book type="paperback">
      <title>Red Nails</title>
      <price>$12.99</price>
      <author>
        <name first="Robert" middle="E" last="Howard"/>
        <birthdate>9/21/1977</birthdate>
      </author>
    </book>
The data would be structured like this (the whitespace-only text nodes
between tags, discussed later, are left out for clarity).
    DomNode book
      |
      |-->DomNode title
      |     |
      |     |-->DomNode text
      |
      |-->DomNode price
      |     |
      |     |-->DomNode text
      |
      |-->DomNode author
            |
            |-->DomNode name
            |
            |-->DomNode birthdate
                  |
                  |-->DomNode text
Any text enclosed within tags is really a node in itself. For instance,
"Red Nails" is a child node of title, and "$12.99" is a child node of
price.
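One way to see this in practice is to walk the tree and print each node. The sketch below assumes PHP 5 or later and uses the DOM extension's DOMDocument rather than the domxml API described here; describeTree() is a hypothetical helper. The xml is written without whitespace between tags so the output matches the tree above.

```php
<?php
// Sketch using PHP 5+'s DOMDocument (not the PHP 4 domxml functions).
// Prints every node of the book example, indented by depth, so the text
// under <title>, <price> and <birthdate> shows up as nodes of its own.
$xml = '<book type="paperback"><title>Red Nails</title><price>$12.99</price>'
     . '<author><name first="Robert" middle="E" last="Howard"/>'
     . '<birthdate>9/21/1977</birthdate></author></book>';

function describeTree(DOMNode $node, int $depth = 0): array
{
    // Text nodes are labeled with their content; elements with their tag name.
    $label = $node->nodeType === XML_TEXT_NODE
        ? 'text "' . $node->nodeValue . '"'
        : $node->nodeName;
    $lines = [str_repeat('  ', $depth) . $label];
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $child) {
            $lines = array_merge($lines, describeTree($child, $depth + 1));
        }
    }
    return $lines;
}

$doc = new DOMDocument();
$doc->loadXML($xml);
echo implode("\n", describeTree($doc->documentElement)), "\n";
```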
The Objects Used In DOM
At this point, you are probably wondering what a DomNode is. This is
a good place to start talking about the objects included in the
module. The module defines five objects: DomDocument, DomNode,
DomAttribute, DomDtd, and DomNamespace. We are going to focus
primarily on the DomDocument and DomNode objects because they are
the most useful.
The Node object
Here is an overview of what the DomNode object contains.
    class DomNode
      properties:
        name
        content
        type
      methods:
        lastchild()
        children()
        parent()
        new_child( $name, $content )
        getattr( $name )
        setattr( $name, $value )
        attributes()
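The properties need some elaboration: name is the node's tag name,
content holds the node's text, and type is a number identifying the
kind of node (an element, a text node, and so on).
The methods need to be explained, as well. children() returns the
node's child nodes, lastchild() returns the last of them, and parent()
returns the node that contains it. new_child() adds a child element
called $name with $content as its text. getattr() and setattr() read
and write a single attribute, while attributes() returns all of them.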
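For readers on PHP 5 or later, where domxml has been replaced by the DOM extension, the sketch below pairs each DomNode method above with a roughly equivalent DOMNode or DOMElement member. Treat the pairing as a rough guide rather than an official mapping; the sample author element is adapted from the book document earlier in the article.

```php
<?php
// Rough PHP 5+ DOM equivalents of the domxml DomNode methods listed above.
// The pairing is an approximation for orientation, not an official mapping.
$doc = new DOMDocument();
$doc->loadXML('<author><name first="Robert" last="Howard"/></author>');
$author = $doc->documentElement;

$name = $author->lastChild;                 // lastchild()
echo $author->childNodes->length, "\n";     // children(): a DOMNodeList, here 1 node
$parent = $name->parentNode;                // parent()
$first = $name->getAttribute('first');      // getattr('first')
$name->setAttribute('middle', 'E');         // setattr('middle', 'E')

// new_child('birthdate', '9/21/1977'): create the element, add its text, append it.
$birthdate = $doc->createElement('birthdate');
$birthdate->appendChild($doc->createTextNode('9/21/1977'));
$author->appendChild($birthdate);

foreach ($name->attributes as $attr) {      // attributes()
    echo $attr->name, '=', $attr->value, "\n";
}
echo $doc->saveXML($author), "\n";
```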
The DomDocument object
The DomDocument object is also important.
    class DomDocument
      properties:
        version
        encoding
        standalone
        type
      methods:
        root()
        children()
        add_root( $node )
        dtd()
        dumpmem()
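The properties are self-explanatory: version, encoding, and standalone
come from the document's xml declaration, and type identifies the kind
of node. The methods are simple, too: root() returns the root element,
children() returns the document's top-level nodes, add_root() adds a
root element, dtd() returns the document's DTD, and dumpmem() dumps
the whole tree back into an xml string.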
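On PHP 5 and later the same information lives on the DOM extension's DOMDocument class. The sketch below assumes that newer API and shows the members that roughly correspond to each property and method listed above; the pairing is approximate, not official.

```php
<?php
// Sketch: PHP 5+ DOMDocument members that roughly stand in for the domxml
// DomDocument listing above (an approximate pairing, not an official one).
$doc = new DOMDocument();
$doc->loadXML('<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    . '<book type="paperback"><title>Red Nails</title></book>');

echo $doc->xmlVersion, "\n";                     // version
echo $doc->xmlEncoding, "\n";                    // encoding
echo $doc->xmlStandalone ? 'yes' : 'no', "\n";   // standalone
echo $doc->documentElement->nodeName, "\n";      // root()
echo $doc->childNodes->length, "\n";             // children(): top-level nodes
echo $doc->doctype === null ? 'no DTD' : $doc->doctype->name, "\n"; // dtd()
echo $doc->saveXML();                            // dumpmem()
```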
The DomDocument Object Returned By xmltree()
Xmltree(), a function I haven't introduced yet, returns a different
kind of DomDocument object, which may trip you up. This object has no
methods, only properties in their place, and it has a true tree
structure:

    class DomDocument
      properties:
        version
        encoding
        standalone
        name
        content
        type
        attributes
        children
It is just as easy to use. For instance, instead of using a
method to get a node's children, just access its 'children' property.
'children' and 'attributes' are both arrays.
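If you want xmltree()-style nested data on a PHP version that only has the newer DOM extension (PHP 5 and later), a small recursive function can build it. xmlTreeArray() below is a hypothetical helper, and its array layout only loosely mirrors the object xmltree() returned.

```php
<?php
// Hypothetical helper (not part of PHP): converts a DOM tree into nested
// arrays shaped loosely like xmltree()'s result, with 'children' and
// 'attributes' stored as plain arrays. Assumes the PHP 5+ DOM extension.
function xmlTreeArray(DOMNode $node): array
{
    $entry = [
        'name'       => $node->nodeName, // "#text" for text nodes
        'type'       => $node->nodeType, // e.g. XML_ELEMENT_NODE, XML_TEXT_NODE
        'content'    => $node->nodeType === XML_TEXT_NODE ? $node->nodeValue : null,
        'attributes' => [],
        'children'   => [],
    ];
    if ($node->attributes !== null) { // only elements carry attributes
        foreach ($node->attributes as $attr) {
            $entry['attributes'][$attr->name] = $attr->value;
        }
    }
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $child) {
            $entry['children'][] = xmlTreeArray($child);
        }
    }
    return $entry;
}

$doc = new DOMDocument();
$doc->loadXML('<book type="paperback"><title>Red Nails</title></book>');
$tree = xmlTreeArray($doc->documentElement);

echo $tree['attributes']['type'], "\n";                     // paperback
echo $tree['children'][0]['name'], "\n";                    // title
echo $tree['children'][0]['children'][0]['content'], "\n";  // Red Nails
```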
The Other Objects
I will list the other objects and their properties and methods just for
reference. We won't be dealing with them in this article.
    class Attribute
      properties:
        name
        content
      methods:
        name()

    class Dtd
      properties:
        extid
        sysid
        name

    class Namespace

Using the Objects
The DOM module only has three functions,
xmldoc(), xmldocfile(), and
xmltree(). The rest of the time, we will be dealing with the objects.
All three return DomDocument objects: xmldoc() and xmltree() parse xml
held in a string, while xmldocfile() reads it from a file.
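The three domxml functions are not available in PHP 5 and later; there, the DOM extension's DOMDocument class loads xml from a string with loadXML() and from a file with load(). A minimal sketch, assuming that newer API; the temporary file merely stands in for a real document on disk.

```php
<?php
// Sketch with the PHP 5+ DOM extension: loadXML() takes a string (the job
// xmldoc() did) and load() takes a file name (the job of xmldocfile()).
$xml = '<employees company="zoomedia.com"/>';

$fromString = new DOMDocument();
$fromString->loadXML($xml);

// Write the same xml to a temporary file so load() has something to read.
$path = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($path, $xml);
$fromFile = new DOMDocument();
$fromFile->load($path);
unlink($path);

echo $fromString->documentElement->getAttribute('company'), "\n";
echo $fromFile->documentElement->getAttribute('company'), "\n";
```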
All three functions raise an error if the xml cannot be parsed
correctly. DOM will not validate xml for you, so you must find another
way of doing that, perhaps with a separate program such as xmllint.
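With the PHP 5+ DOM extension (assumed here in place of domxml), parse errors can be captured and inspected rather than printed as warnings. parseOrReport() is a hypothetical helper that returns the document, or null, along with libxml's error messages.

```php
<?php
// Sketch (PHP 5+ DOM extension): collect libxml parse errors instead of
// letting them surface as PHP warnings. loadXML() returns false when the
// input is not well-formed. parseOrReport() is a hypothetical helper name.
function parseOrReport(string $xml): array
{
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);
    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message) . ' (line ' . $error->line . ')';
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting
    return [$ok ? $doc : null, $messages];
}

// The <title> tag is never closed, so this input is not well-formed.
list($doc, $errors) = parseOrReport('<book><title>Red Nails</book>');
echo $doc === null ? "parse failed\n" : "parsed\n";
foreach ($errors as $message) {
    echo $message, "\n";
}
```

This only checks well-formedness. Later PHP releases also offer DOMDocument::validate() for DTDs and DOMDocument::schemaValidate() for XML Schema, though xmllint remains a handy command-line option.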
A Simple Example
Let's start with a simple example to tie everything together.
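A minimal sketch of such an example, assuming PHP 5 or later, where the DOM extension's DOMDocument class stands in for the domxml functions. The employee record is the first entry of the employees.xml file from the longer example, and the loop walks the employee node's children until it reaches position.

```php
<?php
// Sketch against the PHP 5+ DOM extension rather than domxml. Whitespace
// between tags is kept, so <employee> has five children: three text nodes,
// <name> and <position>.
$xml = "<employee>\n  <name>Matt</name>\n  <position type=\"contract\">Web Guy</position>\n</employee>";

$doc = new DOMDocument();
$doc->loadXML($xml);
$employee = $doc->documentElement;

// Step through the children until we reach the <position> element.
$node = $employee->firstChild;
while ($node !== null && $node->nodeName !== 'position') {
    $node = $node->nextSibling;
}

echo 'position: ', $node->textContent, "\n";
echo 'type: ', $node->getAttribute('type'), "\n";
```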
The example should print out the following:
    position: Web Guy
    type: contract
A loop over the employee node's children is essential for finding the
position node. The employee node really has five child nodes: three
text, one name, and one position. The text nodes contain the newlines
at the end of each line. This may seem strange at first, but DOM
treats any string, even one containing only whitespace, as text and
creates a node for it.
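You can confirm the count yourself. The sketch assumes the PHP 5+ DOM extension, whose DOMDocument has a preserveWhiteSpace flag (not available in domxml) that discards most whitespace-only text nodes while parsing; childCount() is a hypothetical helper.

```php
<?php
// Sketch (PHP 5+ DOM extension): count <employee>'s children with the
// default whitespace handling and with preserveWhiteSpace switched off.
$xml = "<employee>\n  <name>Matt</name>\n  <position type=\"contract\">Web Guy</position>\n</employee>";

function childCount(string $xml, bool $keepWhitespace): int
{
    $doc = new DOMDocument();
    $doc->preserveWhiteSpace = $keepWhitespace; // must be set before loading
    $doc->loadXML($xml);
    return $doc->documentElement->childNodes->length;
}

echo childCount($xml, true), "\n";  // 5: three text nodes plus two elements
echo childCount($xml, false), "\n"; // 2: only <name> and <position>
```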
If you want to ensure that the employee node has only two child
nodes, you will have to write the xml entry like this:
    <employee><name>Matt</name><position type="contract">Web Guy</position></employee>

A Longer Example
Here is a longer example of how to extract information from an xml
document. Suppose we have a file called employees.xml containing
employee entries:
    <?xml version="1.0"?>
    <employees company="zoomedia.com">
      <employee>
        <name>Matt</name>
        <position type="contract">Web Guy</position>
      </employee>
      <employee>
        <name>George</name>
        <position type="full time">Mad Hacker</position>
      </employee>
      <employee>
        <name>Wookie</name>
        <position type="part time">Hairy SysAdmin</position>
      </employee>
    </employees>
Here's how you would extract this info in your php script.
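A sketch of one way to do it, assuming PHP 5 or later and the DOM extension's DOMDocument instead of domxml. To keep it self-contained, the contents of employees.xml are inlined; with the real file you would call $doc->load('employees.xml') instead of loadXML(). describeEmployees() is a hypothetical helper. Looking elements up by tag name sidesteps the whitespace text nodes entirely.

```php
<?php
// Sketch using the PHP 5+ DOM extension (domxml's replacement): one line
// per <employee>, built from its <name> and <position> elements.
$xml = <<<'XML'
<employees company="zoomedia.com">
  <employee>
    <name>Matt</name>
    <position type="contract">Web Guy</position>
  </employee>
  <employee>
    <name>George</name>
    <position type="full time">Mad Hacker</position>
  </employee>
  <employee>
    <name>Wookie</name>
    <position type="part time">Hairy SysAdmin</position>
  </employee>
</employees>
XML;

function describeEmployees(DOMDocument $doc): array
{
    $lines = [];
    foreach ($doc->getElementsByTagName('employee') as $employee) {
        $name = $employee->getElementsByTagName('name')->item(0)->textContent;
        $position = $employee->getElementsByTagName('position')->item(0);
        $lines[] = sprintf('%s the %s, %s employee',
            $name, $position->textContent, $position->getAttribute('type'));
    }
    return $lines;
}

$doc = new DOMDocument();
$doc->loadXML($xml);
foreach (describeEmployees($doc) as $line) {
    echo $line, "<br>\n"; // <br> so each entry gets its own line in a browser
}
```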
You should see the following in your browser.
    Matt the Web Guy, contract employee
    George the Mad Hacker, full time employee
    Wookie the Hairy SysAdmin, part time employee

Another example (adding data)
Since the xml is loaded into memory as a tree, we can easily
manipulate the data. We can add branches or nodes when necessary.
Say we want to add an employee to our xml file.
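A sketch of that, assuming PHP 5 or later and the DOM extension's DOMDocument rather than domxml. addEmployee() is a hypothetical helper, the starting document is a shortened employees.xml, and the new employee's details are invented for illustration.

```php
<?php
// Sketch (PHP 5+ DOM extension): append a new <employee> to the document
// and print the result. addEmployee() is a hypothetical helper, and the
// employee passed to it below is invented for illustration.
function addEmployee(DOMDocument $doc, string $name, string $position, string $type)
{
    $employee = $doc->createElement('employee');

    $nameEl = $doc->createElement('name');
    $nameEl->appendChild($doc->createTextNode($name));
    $employee->appendChild($nameEl);

    $positionEl = $doc->createElement('position');
    $positionEl->appendChild($doc->createTextNode($position));
    $positionEl->setAttribute('type', $type);
    $employee->appendChild($positionEl);

    $doc->documentElement->appendChild($employee);
}

$doc = new DOMDocument();
$doc->loadXML('<employees company="zoomedia.com"><employee><name>Matt</name>'
    . '<position type="contract">Web Guy</position></employee></employees>');

addEmployee($doc, 'Ada', 'Night Owl', 'part time');
echo $doc->saveXML();
// $doc->save('employees.xml') would write the change back to the file.
```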
The script prints the xml to the browser, so you will most likely
have to use 'View Source' to see the data.
Conclusion
That's pretty much all there is to DOM xml. It's a simple approach to
parsing and manipulating xml in your scripts. I hope this article
sheds some light on this dusty corner of php.
-- Matt