DOM and xpath giving me troubles....


bpat1434
01-04-2007, 12:49 AM
I'm trying to get a short script working. Basically, take an RSS 2.0 feed (specifically, one from a Trac bug tracking system), use DOM to make an object of it, then use xpath to extract the one item whose ticket ID matches a specific value.

Here's what I've come up with, and I'm no genius at xpath, so the query is dead wrong.

$xml = new DOMDocument('1.0', 'utf8');
if(!@$xml->loadHTML($temp))
    exit('Unable to load XML properly.');

$query = '///item[title::contains(self, "'.$ticketID.'")]';
$xpath = new DOMXpath($xml);
$items = $xpath->query($query);

foreach($items as $item)
{
    $entries[] = array(
        'link' => $item->previousSibling->previousSibling->nodeValue,
        'guid' => $item->previousSibling->nodeValue,
        'title' => $item->nodeValue,
        'description' => $item->followingSibling->nodeValue,
        'category' => $item->followingSibling->followingSibling->nodeValue,
        'comments' => $item->followingSibling->followingSibling->followingSibling->nodeValue
    );
}

Now, I can get the file contents fine, and create the document. One thing you may notice is that I'm using loadHTML instead of loadXML. If I use loadXML, I get some invalid characters in the XML. loadHTML won't break that.
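
To see what loadXML is actually objecting to, rather than silencing it with @, one option is libxml's internal error buffer. This is only a minimal sketch: the $temp string below is an invented, deliberately broken sample (a bare "&"), not the real Trac feed, and the real feed's problem may well be something else.

```php
<?php
// Hypothetical sample: a bare "&" is not well-formed XML.
$temp = '<rss version="2.0"><channel><item><title>#1: a & b</title></item></channel></rss>';

// Collect parser errors in a buffer instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$xml = new DOMDocument();
$loaded = $xml->loadXML($temp);

$messages = array();
foreach (libxml_get_errors() as $error) {
    // Each entry is a LibXMLError carrying line, column and message.
    $messages[] = sprintf('line %d, column %d: %s', $error->line, $error->column, trim($error->message));
}
libxml_clear_errors();

if (!$loaded) {
    echo "loadXML failed:\n", implode("\n", $messages), "\n";
}
```

The reported line and column should point at the offending characters, which makes it easier to decide whether to clean the input or keep falling back to loadHTML.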

Thanks for any help you can offer me. You can use the demo tickets XML RSS Feed (http://trac.edgewall.org/report/1?format=rss&USER=anonymous) to test it out on.

I know I need to look inside of the <title> child of every <item> until I find one that matches the basic format of: #TicketID: Ticket Title / Summary

Each <title> node will follow that format. I figure xpath is the way to go here. Any help would be appreciated.

Weedpacket
01-04-2007, 06:17 AM
I notice that the ticket ID is also used in the GUID - I don't know if that would be more reliable (I'm considering the possibility that something that looks like a ticket ID might pop up in the title for some other reason).

As for the XPath query: you want item elements (anywhere in the document) that have a title element with a text node containing the supplied string. Since it's the item elements you want, the rest goes in a predicate.
//item[contains(title/text(), '$ticketID')]
I think that's about it. You'll want to adjust your DOM traversal so that it looks at the children of each item (getElementsByTagName) because I notice some variation about which elements have which other elements as siblings (some have author elements and some don't, for example).
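
A small self-contained sketch of that query, for trying it without fetching the feed. The two-item feed, the example.org URLs and the ticket numbers are all made up for illustration; only the "#TicketID: Summary" title format comes from the thread. It uses loadXML because the sample is well-formed.

```php
<?php
// Hypothetical two-item feed, modelled on the "#TicketID: Summary" title format.
$rss = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>Demo tickets</title>
    <item>
      <title>#101: First ticket</title>
      <guid>http://example.org/ticket/101</guid>
    </item>
    <item>
      <author>someone</author>
      <title>#202: Second ticket</title>
      <guid>http://example.org/ticket/202</guid>
    </item>
  </channel>
</rss>
XML;

$xml = new DOMDocument();
$xml->loadXML($rss);

$ticketID = 202;
$xpath = new DOMXPath($xml);
// Select item elements whose title text contains the ticket ID.
$items = $xpath->query("//item[contains(title/text(), '$ticketID')]");

$found = array();
foreach ($items as $item) {
    // Look the title up by name rather than by sibling position,
    // so the extra author element in the second item does not matter.
    $found[] = $item->getElementsByTagName('title')->item(0)->nodeValue;
}
print_r($found);
```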

bpat1434
01-04-2007, 10:25 AM
Quote (Weedpacket): You'll want to adjust your DOM traversal so that it looks at the children of each item (getElementsByTagName) because I notice some variation about which elements have which other elements as siblings (some have author elements and some don't, for example).

Trying to get what you're saying here. I can keep the foreach loop (because otherwise I'd use a for() loop that runs from 0 to $items->length - 1) and just use $item->nodeValue to echo out the information. I get that.

What I don't get is how I can get the element's tag name. I'd like to easily create a small array that has the tag name as the index, and the nodeValue as its value.

Never mind, I apparently had a bad ticketID ;)

But still, I'm having minor issues. Once I get the node-list back . . .
1.) How would I go about traversing it?
2.) How can I get the tag names so I can either get the info I want, or create an indexed array?

Still haven't figured those out. Obviously, the array would be easiest for me; however, since this information (once I get it) is going into a database, just getting the proper information will work as well. At this point, I'm lost....

Weedpacket
01-04-2007, 09:29 PM
Once you've got the item node, you could get the DOMNodeList of its childNodes. Then iterate over that, looking at localName and nodeValue properties.
Summat like:

$query = "//item[contains(title/text(), '$ticketID')]";
$xpath = new DOMXpath($xml);
$items = $xpath->query($query);

foreach($items as $item)
{
    $children = $item->childNodes;
    foreach($children as $child)
    {
        echo $child->localName,"\t=\t",$child->nodeValue,"\n";
    }
}
(Incidentally, I tried this with ticket 4163 from that example, which happens to cite ticket 69 in the title; so searching for ticket 69 would have turned up both).
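
For the array asked about earlier (tag name as the key, nodeValue as the value), the same childNodes loop can fill an array instead of echoing. A rough sketch, assuming an invented one-item feed with example.org URLs; the nodeType check is there because childNodes also includes the whitespace text nodes between elements.

```php
<?php
// Hypothetical single-item feed; element names follow RSS 2.0.
$rss = '<rss version="2.0"><channel><item>
  <title>#12: Sample ticket</title>
  <link>http://example.org/ticket/12</link>
  <guid>http://example.org/ticket/12</guid>
  <description>Sample text</description>
</item></channel></rss>';

$xml = new DOMDocument();
$xml->loadXML($rss);
$xpath = new DOMXPath($xml);

$entries = array();
foreach ($xpath->query("//item[contains(title/text(), '12')]") as $item) {
    $entry = array();
    foreach ($item->childNodes as $child) {
        // Skip the whitespace text nodes between elements.
        if ($child->nodeType !== XML_ELEMENT_NODE) {
            continue;
        }
        // Tag name becomes the key, text content the value.
        $entry[$child->localName] = $child->nodeValue;
    }
    $entries[] = $entry;
}
print_r($entries);
```

One caveat with this shape: if an item repeats an element (several category elements, say), later values overwrite earlier ones, so those would need collecting into a sub-array.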

bpat1434
01-05-2007, 12:52 AM
That's it. Thanks weedpacket. Always helpful.

I also changed it from title to guid since (per the spec at least) guid has to be unique. Thanks for the help.
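
A sketch of why the guid match is tighter than the title match. Everything in the sample is hypothetical: the feed, the ticket numbers, and in particular the guid layout (a URL ending in /ticket/<id>), which is an assumption and should be checked against what the real feed emits.

```php
<?php
// Hypothetical feed: the second title also mentions ticket 7.
$rss = '<rss version="2.0"><channel>
  <item><title>#7: Original report</title><guid>http://example.org/ticket/7</guid></item>
  <item><title>#70: Follow-up to #7</title><guid>http://example.org/ticket/70</guid></item>
</channel></rss>';

$xml = new DOMDocument();
$xml->loadXML($rss);
$xpath = new DOMXPath($xml);

$ticketID = 7;
// Assumed guid layout; adjust to match the real feed.
$guid = 'http://example.org/ticket/' . $ticketID;

// Substring match on the title picks up both items.
$byTitle = $xpath->query("//item[contains(title/text(), '$ticketID')]");
// Exact comparison on the guid picks up only the intended one.
$byGuid = $xpath->query("//item[guid = '$guid']");

echo $byTitle->length, " match(es) by title, ", $byGuid->length, " by guid\n";
```

Comparing the whole guid with = rather than contains() matters here, since a contains() test on the guid would still let ticket 70 match a search for 7.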