Recently I was given the job of learning XML at work. OK, so technically it wasn't
plain XML, it was RDF, but I found that PHP's XML parsing functions worked just the same.
At work I parsed the DMOZ data (http://www.dmoz.org), but for simplicity I will stick with
the basics of XML here and leave you free to parse DMOZ in your spare time ;o)

To begin with, make sure that you have a PHP binary compiled with the '--with-xml'
option. Once that is done you are ready to start parsing XML. Next, grab Slashdot's
XML file from their site (www.slashdot.org/slashdot.xml). Slashdot's file is fairly
simple and extremely easy to parse.
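
If you are not sure how your PHP binary was built, a quick check along these lines can tell you whether the XML functions are there. This is only a sketch; the helper name has_xml_support() is made up for this example and is not part of PHP.

```php
<?php
// Hypothetical helper: reports whether the XML parser functions exist
// in this PHP build (they are missing if XML support was not compiled in).
function has_xml_support()
{
    return function_exists('xml_parser_create');
}

if (has_xml_support()) {
    echo "XML support is available.\n";
} else {
    echo "This PHP binary was built without XML support.\n";
}
```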

Remember that working with XML is a lot like working with a table in a database.
You have a result index in the XML parser and a pseudo table in the XML document.
Once you get over the differences you will be parsing in no time.

PHP's XML functions allow you to specify three functions that will handle the data in
the XML file. One handles opening tags, one handles the data between tags, and the
third handles closing tags. Based on the name of the tag each function is passed, you
can then manipulate the data however you please. To begin with, you need to look at
your XML document and find out what tags are in the file. In our Slashdot file we have
STORY, TITLE, URL, TIME, AUTHOR, DEPARTMENT, TOPIC, COMMENTS, SECTION, and IMAGE.
In some cases you will also have attributes; for example, HREF is an attribute of A in
HTML. PHP has an extremely cool way of handling attributes automagically: it passes
them to the opening-tag function as an array indexed by attribute name. Next we
need to define those tags in our script.
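
To make the three-handler idea concrete, here is a minimal sketch, not the script from this article. The function name parse_stories(), the variable names, and the tiny sample document (including its <feed> root tag) are all made up for illustration; only the tag names come from the list above. It relies on PHP's default case folding, which hands tag names to the handlers in upper case.

```php
<?php
// Hypothetical helper: collects the fields of each STORY into an array.
function parse_stories($xml)
{
    // Tags we want to keep (from the list above); anything else is ignored.
    $wanted = array('TITLE', 'URL', 'TIME', 'AUTHOR', 'DEPARTMENT',
                    'TOPIC', 'COMMENTS', 'SECTION', 'IMAGE');
    $stories = array();  // one associative array per STORY
    $current = '';       // name of the tag we are currently inside

    // Opening tags: $attrs holds the tag's attributes as name => value.
    $start = function ($parser, $name, $attrs) use (&$stories, &$current) {
        if ($name === 'STORY') {
            $stories[] = array();
        }
        $current = $name;
    };

    // Data between tags: may be called more than once per tag, so append.
    $data = function ($parser, $text) use (&$stories, &$current, $wanted) {
        $i = count($stories) - 1;
        if ($i >= 0 && in_array($current, $wanted)) {
            if (!isset($stories[$i][$current])) {
                $stories[$i][$current] = '';
            }
            $stories[$i][$current] .= $text;
        }
    };

    // Closing tags: we are no longer inside any tag of interest.
    $end = function ($parser, $name) use (&$current) {
        $current = '';
    };

    $parser = xml_parser_create();
    xml_set_element_handler($parser, $start, $end);
    xml_set_character_data_handler($parser, $data);
    if (!xml_parse($parser, $xml, true)) {
        throw new Exception(xml_error_string(xml_get_error_code($parser)));
    }
    return $stories;
}

// Made-up sample input, much smaller than the real Slashdot file.
$sample = '<feed><story><title>Example headline</title>'
        . '<url>http://example.com/1</url></story></feed>';
$stories = parse_stories($sample);
print_r($stories);
```

Closures are used here only so the sketch does not need global variables; three ordinary named functions passed by name to xml_set_element_handler() and xml_set_character_data_handler() work the same way.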

I only want to parse out the above data because I just want to make one of those
cool Slashboxes. Next on our list is to write the functions that will extract this
data. The functions I created to do so are on the following page.