Introduction:

Part 1 of this article covered document-based searches that rank results by the number of search words found in each document. This part extends that approach: results are ranked by the number of search words found plus the number of occurrences of each search word in the document.
For a search on “php tutorials and examples”, the following table shows each matching article and the number of occurrences of each search word in it. Common words such as ‘is’, ‘was’, and ‘and’ are removed from the search string by the program, so this example has three search words: ‘php’, ‘tutorials’, and ‘examples’.
No.  Article Number  php  tutorials  examples  Total Occurrences  Rank
1.   Article #189    15   11         16        42                 3
2.   Article #203    25   12         8         45                 1
3.   Article #257    18   16         5         39                 4
4.   Article #145    6    8          17        31                 5
5.   Article #526    5    17         21        43                 2
6.   Article #86     14   4          10        28                 6
Article #203 has the highest total occurrence count, so it is ranked 1. The other results are ranked the same way, in descending order of total occurrences.
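The ranking in the table can be reproduced with a few lines of PHP. The following is only a sketch: the counts are copied from the table, and rankByTotal() is a hypothetical helper that is not part of the article's source code.

```php
<?php
// Occurrence counts per search word, copied from the table above.
$articles = [
    'Article #189' => ['php' => 15, 'tutorials' => 11, 'examples' => 16],
    'Article #203' => ['php' => 25, 'tutorials' => 12, 'examples' => 8],
    'Article #257' => ['php' => 18, 'tutorials' => 16, 'examples' => 5],
    'Article #145' => ['php' => 6,  'tutorials' => 8,  'examples' => 17],
    'Article #526' => ['php' => 5,  'tutorials' => 17, 'examples' => 21],
    'Article #86'  => ['php' => 14, 'tutorials' => 4,  'examples' => 10],
];

// rankByTotal() is a hypothetical helper, not part of the article's source.
function rankByTotal(array $articles): array
{
    // Total occurrences per article (keys are preserved).
    $totals = array_map('array_sum', $articles);
    // Highest total first; arsort() keeps the article names as keys.
    arsort($totals);

    $ranks = [];
    $rank = 1;
    foreach ($totals as $name => $total) {
        $ranks[$name] = ['total' => $total, 'rank' => $rank++];
    }
    return $ranks;
}

$ranks = rankByTotal($articles);
foreach ($ranks as $name => $info) {
    echo $info['rank'] . '. ' . $name . ' (' . $info['total'] . " occurrences)\n";
}
```

Ties are not handled specially in this sketch; two articles with the same total would simply receive consecutive ranks.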
Building The Database:
The database consists of three tables: the Content Table, the Keyword Table, and the Link Table. The Content Table holds each article’s title and abstract. The Keyword Table holds the keywords, and its keyword field is indexed. The Link Table holds the keyword id, the content id, and the number of occurrences.
The SQL statements for creating these three tables are shown below.
Content Table:
CREATE TABLE content ( 
contid mediumint NOT NULL auto_increment, 
title text NOT NULL, 
abstract longtext NOT NULL, 
PRIMARY KEY (contid) 
) ENGINE=MyISAM;
Keyword Table:
CREATE TABLE keytable ( 
keyid mediumint NOT NULL auto_increment, 
keyword varchar(100) NOT NULL,
PRIMARY KEY (keyid), 
KEY keyword (keyword) 
) ENGINE=MyISAM;
Link Table:
CREATE TABLE link ( 
keyid mediumint NOT NULL, 
contid mediumint NOT NULL,  
occurances mediumint NOT NULL 
) ENGINE=MyISAM;
Preparing Database:
The upload engine parses the abstract word by word and processes the whole text. It removes common words such as ‘is’, ‘was’, ‘and’, and ‘that’. In Part 1, duplicate words were discarded; here, every repeated word is counted as an occurrence. The $wordMap associative array holds each word and its number of occurrences.
Next, the keyword table is searched for every word in the $wordMap array. If a match is found, the existing key id is used; otherwise the new keyword is inserted into the keyword table and the newly generated key id is used. In both cases, a row holding the key id, the content id, and the number of occurrences is added to the link table.
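The lookup-or-insert step can be illustrated without a database. The sketch below uses plain arrays as stand-ins for the keyword and link tables; indexDocument() and the sample data are hypothetical and only meant to show the flow described above.

```php
<?php
// In-memory stand-ins for the keyword and link tables (assumption: the real
// code runs INSERT statements instead of appending to arrays).
$keytable = [];   // keyword => keyid
$link     = [];   // rows of keyid, contid, occurances

// indexDocument() is a hypothetical helper used only for illustration.
function indexDocument(int $contentId, array $wordMap, array &$keytable, array &$link): void
{
    foreach ($wordMap as $word => $occurrences) {
        if (!isset($keytable[$word])) {
            // New keyword: assign the next id, as auto_increment would.
            $keytable[$word] = count($keytable) + 1;
        }
        // One link row per (keyword, document) pair, carrying the count.
        $link[] = [
            'keyid'      => $keytable[$word],
            'contid'     => $contentId,
            'occurances' => $occurrences,
        ];
    }
}

indexDocument(1, ['php' => 3, 'tutorials' => 1], $keytable, $link);
indexDocument(2, ['php' => 2, 'examples' => 4], $keytable, $link);

print_r($keytable);
print_r($link);
```

The keyword ‘php’ is stored once in the keyword stand-in but appears in two link rows, one per document, each with its own occurrence count.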
FormWordList() Function:
This is the core part of the program. The function is called after the ExtractWords() function. It goes through the extracted words and drops common words such as ‘a’, ‘is’, ‘was’, and ‘and’; all other words are treated as valid. It returns an associative array, $wordMap, which stores each valid word and its number of occurrences in the document.

<?php
function FormWordList( $wordList ) {
    global $COMMON_WORDS;
    global $MAX_WORD_LENGTH;

    $wordMap = array();

    foreach ( $wordList as $word ) {
        $len = strlen( $word );
        // Skip single characters and overly long words.
        if ( ($len > 1) && ($len < $MAX_WORD_LENGTH) ) {
            // Skip common words such as 'is' and 'and'.
            if ( empty( $COMMON_WORDS[$word] ) ) {
                if ( !isset( $wordMap[$word] ) ) {
                    $wordMap[$word] = 1;   // first occurrence
                } else {
                    $wordMap[$word]++;     // repeated occurrence
                }
            }
        }
    }
    return $wordMap;
}
?>
Every word in $wordList is checked to see whether it is a common word. If it is, the loop continues with the next word. Otherwise, the function checks whether the word already exists in the $wordMap associative array. If it does not, the word is added to $wordMap with an occurrence count of 1; if it does, its occurrence count is incremented by 1.
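The same counting can also be written with PHP's built-in array functions. This is a sketch of an alternative, not the author's code: countWords() is a hypothetical name, the common-word list and length limit are passed in as parameters rather than read from globals, and it assumes that valid words are longer than one character and shorter than the limit.

```php
<?php
// countWords() is a hypothetical alternative to FormWordList(); the
// common-word list and length limit are parameters instead of globals.
function countWords(array $wordList, array $commonWords, int $maxWordLength): array
{
    $kept = array_filter(
        $wordList,
        function ($word) use ($commonWords, $maxWordLength) {
            $len = strlen($word);
            // Assumed bounds: longer than one character, shorter than the limit.
            return $len > 1
                && $len < $maxWordLength
                && !isset($commonWords[$word]);
        }
    );
    // array_count_values() builds the word => occurrences map in one call.
    return array_count_values($kept);
}

$commonWords = ['is' => true, 'a' => true, 'and' => true];
$wordMap = countWords(
    ['php', 'is', 'a', 'language', 'and', 'php', 'tutorials', 'php'],
    $commonWords,
    30
);
print_r($wordMap);
```

Here ‘php’ ends up with a count of 3, while ‘is’, ‘a’, and ‘and’ are dropped before counting.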
ProcessForm() Function:
The code is similar to the Part 1 code, except that the occurrence count is now stored in the link table along with the key id and content id. Here is the code.

<?php
foreach ( $wordList as $word => $occurances ) {
    $keyId = "";
    if ( empty( $allWords[$word] ) ) {
        // New keyword: add it to the keyword table.
        mysql_query( sprintf( "INSERT INTO keytable ( keyword ) VALUES ( '%s' )",
            mysql_escape_string( $word ) ) );

        $keyId = mysql_insert_id();
        $allWords[$word] = $keyId;
    }
    else {
        // Known keyword: reuse its id.
        $keyId = $allWords[$word];
    }

    // insert the link (the column is spelled "occurances" in the link table)
    mysql_query( sprintf( "INSERT INTO link (keyid, contid, occurances)
                           VALUES ( %d, %d, %d )",
        $keyId, $contentId, $occurances ) );
}
?>
Search Engine:
As discussed in the Introduction, the search takes the number of occurrences in each document into account. Here is the code.

<?php
while($lRow=mysql_fetch_array($lResult)){
    $thisContentId=$lRow["contid"];
    if(!isset($contArray[$thisContentId])){
        // first search word found in this document
        $contArray[$thisContentId]["oc"]=$lRow["occurances"];
        $contArray[$thisContentId]["id"]=$lRow["contid"];
        $contArray[$thisContentId]["wrank"]=1;
    }else{
        // another search word found in the same document
        $contArray[$thisContentId]["oc"]+=$lRow["occurances"];
        $contArray[$thisContentId]["wrank"]++;
    }
}
?>
For every record returned from the link table, the content id and number of occurrences are stored in the associative array $contArray. If the content id already exists in $contArray when a record is processed, the record's occurrence value is added to the stored total and the count of matched search words (wrank) is incremented.
If $contArray is set at this point, at least one result was found in the database. Otherwise, the program skips to the part that displays “NO RESULTS FOUND”.
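To see what the loop produces, the sketch below runs the same accumulation over a hard-coded list of rows instead of a MySQL result. The rows and the aggregateRows() helper are hypothetical; it assumes that the query returns one row per matching keyword and document pair.

```php
<?php
// Sample rows as they might come back from the link table for the search
// words (assumption: one row per matching keyword/document pair).
$rows = [
    ['contid' => 203, 'occurances' => 25],
    ['contid' => 189, 'occurances' => 15],
    ['contid' => 203, 'occurances' => 12],
    ['contid' => 203, 'occurances' => 8],
    ['contid' => 189, 'occurances' => 11],
];

// aggregateRows() is a hypothetical helper mirroring the while loop above.
function aggregateRows(array $rows): array
{
    $contArray = [];
    foreach ($rows as $row) {
        $id = $row['contid'];
        if (!isset($contArray[$id])) {
            // First matching search word for this document.
            $contArray[$id] = ['oc' => 0, 'id' => $id, 'wrank' => 0];
        }
        $contArray[$id]['oc'] += $row['occurances'];   // total occurrences
        $contArray[$id]['wrank']++;                    // search words matched
    }
    return $contArray;
}

$contArray = aggregateRows($rows);
print_r($contArray);
```

Document 203 matched three search words with 45 occurrences in total, while document 189 matched two search words with 26 occurrences.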

<?php
if(isset($contArray)){
    //declare an array to store the results
    $FoundRef=array();
    //Sort the array in descending order; entries are compared by "oc" first
    arsort($contArray);
    //Store the results in the $FoundRef array
    //(the code for this is given in the next listing)
}
?>
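Sorting an array of arrays with arsort() depends on how PHP compares whole arrays. A more explicit way to express the ordering, shown here as a sketch with a hypothetical helper and made-up values, is to sort with uasort() and compare the "oc" field directly.

```php
<?php
// Made-up entries in the same shape as $contArray.
$contArray = [
    189 => ['oc' => 42, 'id' => 189, 'wrank' => 3],
    203 => ['oc' => 45, 'id' => 203, 'wrank' => 3],
    145 => ['oc' => 50, 'id' => 145, 'wrank' => 2],
];

// sortByOccurrences() is a hypothetical helper, not part of the article's source.
function sortByOccurrences(array $contArray): array
{
    // uasort() keeps the content ids as keys; the comparison puts the
    // entry with the larger "oc" value first.
    uasort($contArray, function (array $a, array $b): int {
        return $b['oc'] <=> $a['oc'];
    });
    return $contArray;
}

$sorted = sortByOccurrences($contArray);
echo implode(', ', array_keys($sorted)), "\n";
```

The spaceship operator used in the comparison requires PHP 7 or later.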
In the next step, the title and the first 200 characters of the abstract are fetched from the content table into the $FoundRef array.

<?php
foreach($contArray as $cont){
    $rank = $cont["wrank"];

    // keep only documents that contain every search word
    if ( $rank == $noofSearchWords ) {
        $contentId = $cont["id"];
        $occurances = $cont["oc"];
        $aQuery = "select contid,title,left(abstract,200) as summary from content where contid = " . (int) $contentId;
        $aResult = mysql_query($aQuery);

        if(mysql_num_rows($aResult) > 0){
            $aRow = mysql_fetch_array($aResult);
            $FoundRef[] = array (
                "contid" => $aRow["contid"],
                "title" => $aRow["title"],
                "summary" => $aRow["summary"],
                "occurance" => $occurances );
        }
    } //end of if
} //end of for each
?>
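The filter in the loop above keeps only documents whose wrank equals the number of search words, that is, documents that contain every search word. The sketch below isolates that step with made-up data; matchAllWords() is a hypothetical helper.

```php
<?php
// Made-up entries in the same shape as $contArray.
$contArray = [
    189 => ['oc' => 42, 'id' => 189, 'wrank' => 3],
    145 => ['oc' => 50, 'id' => 145, 'wrank' => 2],
    203 => ['oc' => 45, 'id' => 203, 'wrank' => 3],
];

// matchAllWords() is a hypothetical helper, not part of the article's source.
function matchAllWords(array $contArray, int $noofSearchWords): array
{
    // array_filter() preserves the content ids as keys.
    return array_filter(
        $contArray,
        function (array $cont) use ($noofSearchWords) {
            return $cont['wrank'] === $noofSearchWords;
        }
    );
}

$complete = matchAllWords($contArray, 3);
echo implode(', ', array_keys($complete)), "\n";
```

Document 145 has the highest occurrence count in this sample but is dropped, because it contains only two of the three search words.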
Finally we have to display the results in the browser. Here is the code.

<?php
if(isset($FoundRef))
{
    echo "<table width=\"100%\"><tr><th class=\"title\">Search Result</th></tr></table>";
    echo "<br />";
    echo "<h5>";
    echo sizeof($FoundRef);
    echo (sizeof($FoundRef) == 1 ? " reference" : " references");
    echo " found";
    if(!empty($junkWords))
    {
        echo ". Common words like";
        foreach($junkWords as $jWords)
        {
            echo "&nbsp;"."'".$jWords."'";
        }
        echo " are removed from the search string";
    }
    echo "</h5>";
    foreach($FoundRef as $a => $value)
    {
        echo "<table>";
        echo "<tr><td valign=\"top\">";
        // echo $FoundRef[$a]["contid"];
?>
        <a href="showref.php?refid=<?php echo $FoundRef[$a]["contid"]; ?>"><em><b>
        <?php echo $FoundRef[$a]["title"]; ?></b></em></a><div align="right">
        Occurrence(s):
        <?php echo $FoundRef[$a]["occurance"]; ?></div>
        <br /><small>
        <?php echo $FoundRef[$a]["summary"]; ?>...</small>
        <br /><br />
<?php
        echo "</td></tr>";
        echo "</table>";
    }
}
//end of isset FoundRef
?>
Timer to Calculate the Time Taken to Search the Documents:
You can include a timer that measures how long the search operation takes. Here is the code.

<?php
//START TIMER
$start=getmicrotime();

//PERFORM SEARCH OPERATION

//END TIMER
$end=getmicrotime();

//TOTAL TIME TAKEN TO DO SEARCH OPERATION
$time_taken=(float)($end-$start);
$time_taken=number_format($time_taken,2,'.','');
?>
The following function returns the current time in seconds, with microsecond precision.

<?php
function getmicrotime()
{
    list($usec,$sec)=explode(" ",microtime());
    return ((float)$usec+(float)$sec);
}
?>
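Since PHP 5, microtime(true) returns the same float value directly, so the helper function is optional. The sketch below wraps the timing in a hypothetical timeCall() function; the usleep() call is only a stand-in for the real search.

```php
<?php
// timeCall() is a hypothetical helper, not part of the article's source.
// It runs a callable and returns its result plus the elapsed time.
function timeCall(callable $operation): array
{
    $start = microtime(true);             // current time in seconds, as a float
    $result = $operation();
    $elapsed = microtime(true) - $start;  // elapsed seconds
    // Same formatting as in the article: two decimals, no thousands separator.
    return [$result, number_format($elapsed, 2, '.', '')];
}

list($result, $timeTaken) = timeCall(function () {
    usleep(50000); // stand-in for the search operation
    return 'done';
});

echo "Search took {$timeTaken} seconds\n";
```

The reported value depends on the machine and its load, so it should be treated as a rough measurement rather than a benchmark.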
Conclusion:
This concludes the document-based search that ranks results by the number of search words found plus the number of occurrences of each search word in each document.
I implemented this technique after several rounds of optimization to reduce the search time, and tested it on more than 60,000 distinct documents. Initially the search time was around 23.35 seconds; with successive optimizations it dropped to 10.89 seconds, then 3.56 seconds, and finally 0.71 seconds. Note that the search time varies with the hardware setup. I welcome comments on how to optimize the performance further.

Source Code:

Upload.php
 
Search.php