PHPBuilder - Meta Tag Engine




Meta Tag Engine

by: Brian Douros | November 25, 2000

Meta Tag Search Engine with Regular Expressions

Have you ever used the get_meta_tags() function included with PHP? Do you even know it exists? If you're interested, get_meta_tags() returns all meta tag information from a given file as an array, where the keys of the array are the names of the meta tags, respectively. This function is useful if you want to create your own search engine, like Yahoo or AltaVista, where a user adds his or her URL to your search engine. Upon submission you can return the title, keywords, and description for them by querying their page. At that point you could store the data in a database and then allow searching of those keywords, descriptions, etc. For example, you could use the following code to return the description and keywords from http://www.phpbuilder.com.


<?php
$url = "http://www.phpbuilder.com";
$metatag = get_meta_tags($url, 1);
// where "1" will result in PHP trying to open the file along the standard include path.

echo "<p>Description:</p>";
echo "<p>$metatag[description]</p>";
echo "<p>Keywords:</p>";
echo "<p>$metatag[keywords]</p>";

If you have used this function before, you may have been frustrated by get_meta_tags() not returning the correct meta information for one reason or another. Mostly I have found that if the author of the page included any "tab", "return", or "newline" characters within the meta tag, the contents of the meta tag are not properly returned. Now try the same code above with "http://www.zend.com" as the $url value. You will notice that the keywords and description in the output are incomplete.
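One way to work around tags that span several lines is to flatten the whitespace before looking at the markup. The snippet below is only a minimal sketch of that idea: the helper name flatten_whitespace() and the sample tag are made up for illustration and are not part of PHP or of any real page.

```php
<?php
// Hypothetical helper (the name is an assumption, not a PHP built-in):
// replaces each tab, return, and newline character with a space.
function flatten_whitespace($html)
{
    return str_replace(array("\r", "\n", "\t"), ' ', $html);
}

// Made-up sample: a meta tag whose attributes and content span several lines.
$sample = "<meta name=\"description\"\n\tcontent=\"PHP articles,\n\ttips and code\">";

// After flattening, the whole tag sits on a single line.
echo flatten_whitespace($sample), "\n";
```

Once the tag is on one line, a single regular expression can match it from the opening "<meta" through the end of the content value.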

As you can tell, there are some minor inadequacies in the get_meta_tags() function. Most people would not care about these minor pitfalls, but for the frustrated ones there is a solution, or at least a function snippet.

This function queries a URL and returns the contents of the "title", "keywords", and "description" as an array, where the keys of the array are those names, respectively. Please note that this function is not foolproof, but it does the job!


function metaengine($url)
 {
  // Pattern for meta title
  $p_title[0] = '(<title>)([a-zA-Z_0-9@!%-;&`,\'\+\$\.\n\t\r ]+)';
  $p_title[1] = '(<meta)([[:space:]]+)(name="title")([[:space:]]+)(content=")([a-zA-Z_0-9@!%-;&`,\'\+\$\.\n\t\r ]+)';
  $p_title[2] = '(<meta)([[:space:]]+)(name=title)([[:space:]]+)(content=)([a-zA-Z_0-9@!%-;&`,\'\+\$\.\n\t\r ]+)';

  // Pattern for meta description
  $p_description[0] = '(<meta)([[:space:]]+)(name="description")([[:space:]]+)(content=")([a-zA-Z_0-9@!%-;&`,\'\+\$\.\n\t\r ]+)';
  $p_description[1] = '(<meta)([[:space:]]+)(name=description)([[:space:]]+)(content=)([a-zA-Z_0-9@!%-;&`,\'\+\$\.\n\t\r ]+)';
  
  // Pattern for meta keywords
  $p_keywords[0] = '(<meta)([[:space:]]+)(name="keywords")([[:space:]]+)(content=")([a-zA-Z_0-9@!%-;&`,\'\+\$\.\n\t\r ]+)';
  $p_keywords[1] = '(<meta)([[:space:]]+)(name=keywords)([[:space:]]+)(content=)([a-zA-Z_0-9@!%-;&`,\'\+\$\.\n\t\r ]+)';
  $p_keywords[2] = '(</head>)(.+)';
  
  // Fetch file into an array
  if(!($file = @file($url)))
    {
      $keywords = 'Not Available';
      $description = 'Not Available';
      $title = 'Not Available';
    }
  else
    {
      // Turn array into a string using a space as the delimiter.
      $target = @implode( " ", $file);

      // Remove tab, return, and newline characters.
      $pat = "\n";
      $repl = " ";
      $target = ereg_replace($pat, $repl, $target);
      $pat = "\t";
      $repl = " ";
      $target = ereg_replace($pat, $repl, $target);
      $pat = "\r";
      $repl = " ";
      $target = ereg_replace($pat, $repl, $target);

      // Evaluate string with regular expression and find match for title.
      if(eregi($p_title[0], $target, $match))
        { 
          $title = $match[2];
        }
      elseif(eregi($p_title[1], $target, $match))
        {
          $title = $match[6];
        }
      elseif(eregi($p_title[2], $target, $match))
        {
          $title = $match[6];
        }
      else
        {
          $title = 'Not Available';
        }

      // Evaluate string with regular expression and find match for description.
      if(eregi($p_description[0], $target, $match))
        { 
          $description = $match[6];
        }
      elseif(eregi($p_description[1], $target, $match))
        {
          $description = $match[6];
        }
      else
        {
          $description = 'Not Available';
        }

      // Evaluate string with regular expression and find match for keywords.
      if(eregi($p_keywords[0], $target, $match))
        { 
          $keywords = $match[6];
        }
      elseif(eregi($p_keywords[1], $target, $match))
        {
          $keywords = $match[6];
        }
      // If no meta tag content is present for keywords use document text as keywords
      // starting after the </head> tag.
      elseif(eregi($p_keywords[2], $target, $match))
        {
          //Remove HTML and PHP tags
          $match[2] = strip_tags($match[2]);
          //Strip white spaces before and after string
          $match[2] = trim($match[2]);
          //Limit size of string to 1000 characters, skipping the first 100 characters
          $match[2] = substr($match[2], 100, 1000);
          $keywords = $match[2];
        }
      else
        {
          $keywords = 'Not Available';
        }
    }

  $metatag['title'] = $title;
  $metatag['description'] = $description;
  $metatag['keywords'] = $keywords;
  
  return $metatag;
}
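The ereg_replace() and eregi() functions used above belong to the POSIX regex extension, which was removed in PHP 7.0. For newer PHP versions, the same approach might be sketched with preg_match() as shown below. This is a simplified illustration under stated assumptions, not a drop-in replacement: the function name metaengine_from_html() is hypothetical, it takes an HTML string rather than a URL so it can be tried without a network connection, it only handles quoted content values with the name attribute placed before content, and it omits the fallback to body text for keywords.

```php
<?php
// Hypothetical helper, not part of the original function: extracts the
// title, description, and keywords from an HTML string with PCRE.
function metaengine_from_html($html)
{
    // Replace tab, return, and newline characters with spaces so that
    // tags spanning several lines can still be matched.
    $flat = str_replace(array("\r", "\n", "\t"), ' ', $html);

    $metatag = array(
        'title'       => 'Not Available',
        'description' => 'Not Available',
        'keywords'    => 'Not Available',
    );

    // Text between <title> and </title>, matched case-insensitively.
    if (preg_match('~<title[^>]*>(.*?)</title>~i', $flat, $match)) {
        $metatag['title'] = trim($match[1]);
    }

    // <meta name=... content="..."> where the name may be quoted or not
    // and the content value is wrapped in single or double quotes.
    foreach (array('description', 'keywords') as $name) {
        $pattern = '~<meta\s+name=["\']?' . $name . '["\']?\s+content=(["\'])(.*?)\1~i';
        if (preg_match($pattern, $flat, $match)) {
            $metatag[$name] = trim($match[2]);
        }
    }

    return $metatag;
}

// Made-up sample page; in practice the string could come from file_get_contents($url).
$page = "<html><head><title>Sample Page</title>\n"
      . "<meta name=\"keywords\"\n content=\"php, meta, tags\">\n"
      . "</head><body></body></html>";
print_r(metaengine_from_html($page));
```

Taking a string instead of a URL keeps the fetching step separate from the parsing step, which makes the patterns easier to try out against small hand-written samples.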

Now let's modify the original code to use the metaengine() function with http://www.zend.com as the $url.


<?php
$url = "http://www.zend.com";
$metatag = metaengine($url);

echo "<p>Title:</p>";
echo "<p>$metatag[title]</p>";
echo "<p>Keywords:</p>";
echo "<p>$metatag[keywords]</p>";
echo "<p>Description:</p>";
echo "<p>$metatag[description]</p>";

You will see that the title, keywords, and description values are all correct and complete. I hope this function helps at least a few frustrated get_meta_tags() users, and of course the ones that have never used it.
