PHPBuilder - Parse html (title :: meta)





by: Olaf Lederer | October 22, 2005

Version: 1

Type: Sample Code (HOWTO)

Category: HTML

License: BSD License

Description: With this script it's possible to fetch the first part of a remote file and parse its HTML elements in a local script. The title element, the meta description and the meta keywords are extracted with case-insensitive regular expressions; other meta elements can be supported by adding extra rules and regex patterns. The script reads the remote file only until the closing tag of the HTML head element has been received instead of downloading the whole page, which improves performance. The script will not follow URL redirections. It is useful in forms for adding new links to a link list. A demo is available here: http://www.finalwebsites.com/classes/examples/get_meta_data.php
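The description notes that further meta elements can be parsed by adding extra rules and regex patterns. Instead of writing one pattern per element, a single function can take the meta name as a parameter. The sketch below assumes a hypothetical helper called get_meta_content() (not part of the original snippet) that works on an HTML string such as the $content collected by the script, accepts the name and content attributes in either order, and accepts single or double quotes.

```php
<?php
/**
 * Hypothetical helper, not part of the original snippet: returns the content
 * attribute of the first <meta> tag whose name attribute equals $name
 * (case-insensitive), or null if there is none. Attributes may appear in
 * any order and use single or double quotes. Attribute values containing
 * ">" are not supported, because each tag is located with [^>]*.
 */
function get_meta_content(string $html, string $name): ?string
{
    // Find every <meta ...> tag in the (partial) HTML.
    preg_match_all('/<meta\b[^>]*>/i', $html, $tags);

    foreach ($tags[0] as $tag) {
        // Collect attribute="value" / attribute='value' pairs from this tag.
        preg_match_all('/([a-z][a-z0-9_:-]*)\s*=\s*(["\'])(.*?)\2/is', $tag, $attrs, PREG_SET_ORDER);

        $map = [];
        foreach ($attrs as $a) {
            $map[strtolower($a[1])] = $a[3];
        }

        // Return the content only if this tag's name matches the requested one.
        if (isset($map['name'], $map['content']) && strcasecmp($map['name'], $name) === 0) {
            return $map['content'];
        }
    }
    return null;
}

// Example: another meta element only needs a new call, not a new pattern.
$html = '<head><meta content="Olaf" name="author"><meta name="robots" content=\'index, follow\' /></head>';
echo get_meta_content($html, 'author'), "\n";   // Olaf
echo get_meta_content($html, 'robots'), "\n";   // index, follow
```

Splitting the work into "find each tag" and "read its attributes" keeps every pattern small, at the cost of not handling unusual markup such as a ">" inside an attribute value.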



<?php
$page_title = "n/a";
$meta_descr = "n/a";
$meta_keywd = "n/a";

// Opening a URL with fopen() requires allow_url_fopen to be enabled.
if ($handle = @fopen("http://www.finalwebsites.com", "r")) {
    $content = "";
    // Read in 1 KB chunks and stop once the closing </head> tag has arrived.
    // Searching $content (not only the last chunk) also catches a tag that
    // was split across two reads.
    while (!feof($handle)) {
        $content .= fread($handle, 1024);
        if (stripos($content, "</head>") !== false) break;
    }
    fclose($handle);

    // eregi() was removed in PHP 7.0; preg_match() with the i modifier is
    // the case-insensitive replacement. Matching the whole header at once
    // (s modifier for the title) also finds elements spread over several lines.
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $content, $title)) {
        $page_title = trim($title[1]);
    }
    // [^"]* stops at the closing quote, so a trailing " />" is not captured.
    if (preg_match('/<meta\s+name="description"\s+content="([^"]*)"/i', $content, $descr)) {
        $meta_descr = $descr[1];
    }
    if (preg_match('/<meta\s+name="keywords"\s+content="([^"]*)"/i', $content, $keywd)) {
        $meta_keywd = $keywd[1];
    }
}
?>
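PHP also ships a built-in get_meta_tags() function that opens a local file or URL, parses the name/content pairs of its meta tags and, like the script above, stops parsing at </head>. It does not return the <title> element, so the title pattern is still needed. A minimal sketch, where read_meta() is a hypothetical wrapper name that keeps the script's "n/a" defaults:

```php
<?php
// Alternative sketch for the two meta values: PHP's built-in get_meta_tags()
// parses <meta name="..." content="..."> tags and stops at </head>.
// Array keys are the name attributes, converted to lower case.
function read_meta(string $source): array
{
    $result = ['description' => 'n/a', 'keywords' => 'n/a'];

    $tags = @get_meta_tags($source);   // not an array if $source cannot be opened
    if (is_array($tags)) {
        $result['description'] = $tags['description'] ?? 'n/a';
        $result['keywords']    = $tags['keywords'] ?? 'n/a';
    }
    return $result;
}

// Usage (remote URLs require allow_url_fopen):
// $meta = read_meta("http://www.finalwebsites.com");
```

The trade-off is less control: the built-in parser decides how tags are tokenized, while the regex version can be adapted to other elements in the head.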
