Link (and other file) tree maker?


kilo dB
07-22-2008, 12:45 PM
I'm looking for a script which, when given a page, will extract all links that point to the same domain and present them as a tree or as a sorted list (a way to choose either would be good). I expect this sort of thing already exists and I just don't know what to search for (one Google search I tried returned 22.5 million results)! I need help before I'm tempted to write it myself, which I'm sure would be a wasted effort!
One other desirable feature is locating "open" (sub)directories, that is, ones that return a directory listing instead of an index.xxx file.

The need is to analyze a potential customer's site for hierarchy and complexity (number of pages, etc.) as input to an unsolicited proposal.

This must be the type of code behind search engines, so maybe there is some sort of do-it-yourself PHP search engine script out there?

bpat1434
07-23-2008, 08:39 AM
You mean a site-index or sitemap generator?

PHP Class for Google Sitemap generation (http://www.idealog.us/2006/09/php_class_for_c.html)
Google Sitemap generator (http://enarion.net/google/phpsitemapng/)

Both should get you on your way. The basic approach is to open the page, scan the <body> for <a> tags, and extract their href values. Then follow each extracted link, look for new links on that page, and repeat, skipping any URL you have already visited so the crawl terminates.
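
As a rough illustration of that loop, here is a minimal sketch in Python using only the standard library. It is not the code from either generator linked above. The names LinkParser, extract_links, fetch, and crawl are made up for this example, and it assumes "same domain" means "same host as the start URL". It ignores robots.txt, redirects, and non-HTML files, so treat it as a starting point only.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag it sees (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return the absolute, fragment-free URLs linked from one page."""
    parser = LinkParser()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href)).url for href in parser.hrefs]


def fetch(url):
    """Download a page as text (no retries, robots.txt handling, or rate limiting)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=200):
    """Breadth-first crawl limited to the host of start_url.

    Returns a sorted list of every same-host URL that was discovered.
    """
    host = urlparse(start_url).netloc  # assumption: same domain == same host
    seen = {start_url}
    queue = deque([start_url])
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        fetched += 1
        try:
            html = fetch_page(url)
        except (OSError, ValueError):
            continue  # unreachable page or unusable URL: skip it
        for link in extract_links(html, url):
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return sorted(seen)
```

Calling crawl("http://example.com/") would return the sorted list of discovered URLs. Because the list is sorted, URLs under the same directory end up next to each other, which is a reasonable first step toward a tree view. The fetch_page parameter is only there so the crawl logic can be tried against canned pages without touching the network.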

You could also look at phparch.com's forums, as they just had a contest (in Feb) in which entries had to crawl a page and extract "valid" links, so there are some code examples there.