php3-list | 2000051
Date: 05/08/00
- Next message: Sasa Danicic: "[PHP3] Which web server for PHP"
- Previous message: Manuel Lemos: "Re: [PHP3] Encapsulation"
- Next in thread: Michael Dearman: "Re: [PHP3] HTML Parsing - No XML thingy, here..."
- Reply: Michael Dearman: "Re: [PHP3] HTML Parsing - No XML thingy, here..."
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
I'm struggling with this a bit...
It is ALMOST elegant.
Perhaps if I knew my regular expressions better...
Anyway, I call this "odd.php3" and there's a BROKEN test case attached.
It WILL FIND, oddly, "FORM METHOD=POST" in the TEXT of a document if it starts
right after a tag... dangnabit ! But good enough for now !
-------------------------------------------------------------------------
<HTML><HEAD><TITLE>ODD</TITLE></HEAD> <BODY>
<br>
<?
$tags = " !-- !DOCTYPE A ADDRESS APPLET AREA B BASE BASEFONT BGSOUND BIG BLINK H1 H2 H3 H4 H5 H6";
$tags .= " BLOCKQUOTE BODY BR BUTTON CAPTION CENTER CITE CODE COL COLGROUP DD DEL DIV DL DT EM EMBED FIELDSET";
$tags .= " FONT FORM FRAME FRAMESET HEAD HR HTML I IFRAME IMG INPUT INS KBD LABEL LAYER LEGEND LI LINK MAP";
$tags .= " MARQUEE META NOBR NOFRAMES NOSCRIPT OBJECT OL OPTGROUP OPTION P PRE Q S SAMP SCRIPT SELECT SMALL SPAN STRIKE";
$tags .= " STRONG STYLE SUB SUP TABLE TBODY TD TH TEXTAREA TFOOT THEAD TITLE TR TT U UL WBR ";
$formtags = " FIELDSET FORM INPUT SELECT TEXTAREA ";
// Imperfectly decide if it's a tag or TEXT
// ----------------------------------------
function imperfectValidTagTest($whatnow) {
global $tags;
// strip a leading slash so </TAG> is checked the same way as <TAG>
if (substr($whatnow, 0, 1) == "/" ) {
$whatnow = substr($whatnow, 1);
}
// put spaces around FINDME so we don't get false positives
// when we use the strstr() function, we want WHOLE word matches
// not like "I" in "CAPTION" matching the <I> tag...
$whatnow = " " . $whatnow . " ";
// strstr() returns false when " NAME " does not occur in $tags
$bob = strstr($tags, $whatnow);
return ($bob != false);
}
// ========================================
// Take $what and UPPERCASE every token that does not start with a
// double quote (tag name, ATTRIBUTE NAMES, unquoted values)
// --------------------------------------------
function normalized($what) {
$attribray = split('[ =]', $what);
// echo "<b>" . $what . "</b><br>";
$bad = false;
$OhThis = ""; // stays empty when $what is a recognised tag
echo "<TR>";
for ($x=0; $x<(count ($attribray)); $x++) {
if (substr($attribray[$x],0,1) != '"') {
$tmp = strtoupper($attribray[$x]);
$attribray[$x] = $tmp;
if ($x == 0) {
if (imperfectValidTagTest($attribray[$x]) == false) {
$bad = true;
break;
} else {
echo "<td>$attribray[$x]</td><td>";
}
}
}
if ($x > 0) {
echo " [" . $attribray[$x] . "]";
}
}
if ($bad == true) {
$OhThis = "META NAME=\"TEXT\" " . "[" . $what . "]";
echo "<td colspan=2>$OhThis</td>";
}
echo "</TR>";
// print ("$OhThis<br>");
// echo "<hr>";
return $OhThis;
}
// ============================================
// MAIN
// ---------
$file = fopen("html.html", "r");
if (!$file) {
echo "<p>Unable to open file.\n";
exit;
}
$line = fread($file, 65535);
fclose($file);
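// split on the angle brackets only; a backslash inside [ ] is a literal
// character, so "[\<\>]" would also split on backslashes
$parsedarray = split("[<>]", $line);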
echo "<TABLE>";
for ($x=0; $x<(count ($parsedarray)); $x++) {
$parsedarray[$x] = Chop($parsedarray[$x]);
if ($parsedarray[$x] != "") {
$bob = normalized($parsedarray[$x]);
}
}
echo "</TABLE>";
// ========== end MAIN =============================
?>
</BODY>
</HTML>
------------------------------------------------------------------
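One more rough edge in normalized(): splitting on '[ =]' cuts a quoted value such as value="two words" into two pieces, and the second piece gets uppercased. A rough sketch of one way around that, assuming a PHP build that has the PCRE preg_* functions. upperAttribNames() is a made-up name for illustration, not part of odd.php3, and it only handles the simple name=value cases:

```php
<?php
// Sketch only: upperAttribNames() is a hypothetical stand-in for the
// splitting step in normalized(); it is not the original code.
function upperAttribNames($what) {
    // A name, optionally followed by = and a "quoted", 'quoted' or bare value.
    $pattern = '/([^\s=]+)(\s*=\s*("[^"]*"|\'[^\']*\'|[^\s]+))?/';
    preg_match_all($pattern, $what, $m, PREG_SET_ORDER);
    $out = array();
    for ($i = 0; $i < count($m); $i++) {
        $piece = strtoupper($m[$i][1]);      // tag name or attribute name
        if (isset($m[$i][3])) {
            $piece .= "=" . $m[$i][3];       // value kept exactly as written
        }
        $out[] = $piece;
    }
    return implode(" ", $out);
}

echo upperAttribNames('input type="text" name="bob" value="two words"') . "\n";
```

The value is matched as one unit, quotes included, so spaces inside it never reach the uppercasing step.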
And the broken test case...
-------------------------------------------
<html><head><title>boo</title></head>
<body>
<FORM METHOD="POST" action="http://localhost/">
<h1>CUSTOMER PROFILE and let's make this quite a bit bigger shall we ???</h1>
<h1>FORM METHOD=POST</h1>
<input type="text" name="bob" value="testing">
<input type="text" name="bob1" value="testing">
<input type="text" name="bob2" value="testing">
<input type="text" name="bob3" value="testing">
<input type="text" name="bob4" value="testing">
<input type="text" name="bob5" value="testing"></FORM>
<p>What happens with <i>regular</i> text ???</p>
</body>
</html>
-------------------------------------------
Any regex gurus know how to get past some of the dangnabit ?
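For what it's worth, the dangnabit seems to come from split() throwing the brackets away: after the split, a piece that sat inside <...> and a piece of plain text look exactly alike. A rough sketch of one possible way past it, assuming a PHP build that has the PCRE preg_* functions. tokenizeHtml() is a made-up name for illustration, and it will still trip over a ">" inside a comment or an attribute value:

```php
<?php
// Sketch only: tokenizeHtml() is a hypothetical helper, not part of odd.php3.
// It remembers whether each piece came from inside <...> or from the text
// between tags, which is the information split("[<>]") loses.
function tokenizeHtml($html) {
    $tokens = array();
    // Either "<", anything up to the next ">", then ">" (a tag; group 1 is the inside)
    // or a run of characters containing no "<" (text).
    preg_match_all('/<([^>]*)>|[^<]+/', $html, $matches, PREG_SET_ORDER);
    for ($i = 0; $i < count($matches); $i++) {
        $whole = $matches[$i][0];
        if (substr($whole, 0, 1) == "<") {
            $tokens[] = array("type" => "tag", "value" => trim($matches[$i][1]));
        } else {
            $text = trim($whole);
            if ($text != "") {
                $tokens[] = array("type" => "text", "value" => $text);
            }
        }
    }
    return $tokens;
}

$tokens = tokenizeHtml('<FORM METHOD="POST"><h1>FORM METHOD=POST</h1></FORM>');
for ($i = 0; $i < count($tokens); $i++) {
    echo $tokens[$i]["type"] . ": " . $tokens[$i]["value"] . "\n";
}
```

With the type recorded, only the "tag" pieces would need to go through imperfectValidTagTest(), and the "text" pieces could go straight to the META NAME="TEXT" branch.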
THX
-AEF
-- "If you think the Universe is big, you should see the source code..." -Frank & Ernest