php-windows | 2001042
Date: 04/26/01
- In reply to: DHEA: "[PHP-WIN] [Help:] Problem with regex patterns when getting Title, Description and Keywords from HTML files..."
Maybe you should try changing
if (eregi('<meta name="description" content="(.*)">', $doc,
to:
if (eregi("<meta name='description' content='(.*)'>",
$doc,...)||eregi("<meta name=\"description\" content=\"(.*)\">", $doc,...))
so it is written like the first one:
if (eregi("<title>(.*)</title>", $doc, $titlematch))
but I don't know, just maybe :)
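
Another thing that might be worth a try: (.*) is greedy, so it runs to the last "> in the document rather than the first, which would likely explain why a rarer ending such as " > happened to work. Below is a minimal sketch of a non-greedy version, under a few assumptions: it uses preg_match (PCRE) instead of eregi, because later PHP versions removed the ereg functions; it expects name to come before content in the tag; and getMetaContent with its parameters is a hypothetical helper, not something from the original post.

```php
<?php
// Hypothetical helper (not from the original post): returns the content
// attribute of <meta name="$name" content="...">, or "" when not found.
function getMetaContent(string $doc, string $name): string
{
    $quoted = preg_quote($name, '/');
    // (["']) remembers which quote opens the value, (.*?) is non-greedy,
    // and \1 stops the match at the first matching closing quote.
    // Flags: i = case-insensitive, s = let . match newlines too.
    $pattern = '/<meta\s+name=["\']' . $quoted . '["\']\s+content=(["\'])(.*?)\1/is';
    if (preg_match($pattern, $doc, $m)) {
        // Collapse runs of whitespace, as the original functions do.
        return trim(preg_replace('/\s+/', ' ', $m[2]));
    }
    return '';
}
```

Something like getMetaContent($doc, 'description') and getMetaContent($doc, 'keywords') could then stand in for the two separate functions, with the "Sem Descrição" and "Sem Keywords" defaults applied by the caller when the result is empty.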
----- Original Message -----
From: "DHEA" <mpfa.madeira <email protected>>
To: <php-windows <email protected>>
Sent: Friday, April 27, 2001 12:49 AM
Subject: [PHP-WIN] [Help:] Problem with regex patterns when getting Title,
Description and Keywords from HTML files...
> Hello,
>
> I am trying to make a PHP script to index my site and insert into a
> MySQL DB the .htm files path, its Title (from the HTML tags
> <Title></Title>), its Description (from the meta tag <meta
> name="description" content="..."> ) and its Keywords (from the meta
> tag <meta name="keywords" content="..."> ).
>
> Well, I adapted this function to get the Title and it works great!!:
>
> /*
> * Given a raw html document (as string), return its title.
> * This function may need to be modified if your web pages use
> * automatically generated titles.
> */
>
> function getTitle(&$doc)
> {
> if (eregi("<title>(.*)</title>", $doc, $titlematch))
> $title = trim(eregi_replace("[[:space:]]+", " " ,
> $titlematch[1]));
> else
> $title = "";
> if ($title == "")
> $title = "Sem Título";
> return $title;
> }
>
>
> I then tried to do something similar to get the Description:
>
>
> function getDescription(&$doc)
> {
> if (eregi('<meta name="description" content="(.*)">', $doc,
> $descr))
> $descricao = trim(eregi_replace("[[:space:]]+", " " ,
> $descr[1]));
> else
> $descricao = "";
> if ($descricao == "")
> $descricao = "Sem Descrição";
> return $descricao;
> }
>
> This doesn't work as intended... It returns the whole page starting
> after content=" and doesn't end at the end of the string (">).
>
> The funny thing is that if I add a space at the end of the string like
> this (" >) in both the PHP code and in the HTML file (<meta
> name="description" content="test with a space" >), the function returns
> only the string of the description as intended...
>
>
> The same thing happens with the Keywords:
>
> function getKeywords(&$doc)
> {
> if (eregi('<meta name="keywords" content="(.*)">', $doc,
> $mykeys))
> $keywords = trim(eregi_replace("[[:space:]]+", " " ,
> $mykeys[1]));
> else
> $keywords = "";
> if ($keywords == "")
> $keywords = "Sem Keywords";
> return $keywords;
> }
>
> But this time I needed two (2) spaces to make the function work!!!
> (<meta name="description" content="test with 2 spaces" >). If I used
> one or no space it returned the whole page... with 2 spaces the
> function works...
>
> I concluded that the regex pattern (.*) doesn't stop at the
> "> and needs a space between them (" >). But why did it need
> 2 spaces the second time!?
>
> I don't want to have to change all the HTM files on my site and add
> a space to the Description Meta Tag and 2 spaces to the Keywords
> Meta... Is there a way to tell the (.*) to end the search at the ">
> ?
>
> Thanks for your attention
>
> Marco Ascensao
>
>
>
> --
> PHP Windows Mailing List (http://www.php.net/)
> To unsubscribe, e-mail: php-windows-unsubscribe <email protected>
> For additional commands, e-mail: php-windows-help <email protected>
> To contact the list administrators, e-mail: php-list-admin <email protected>
>
>

