PHPBuilder - Dynamic Document Search Engine - Part 2 Page 2




Dynamic Document Search Engine - Part 2 - Page 2

by: M.Murali Dharan
|
February 25, 2004

Preparing Database:
The upload engine parses the abstract and the whole text word by word. It removes common words such as ‘is’, ‘was’, ‘and’, and ‘that’. In Part 1, duplicate words were removed; here, every repeat of a word is counted as an occurrence. The $wordMap associative array holds each word and its number of occurrences.
Next, the keyword table is searched for every word in the $wordMap array. If a match is found, the matching keyword's key id is stored in the link table together with the occurrence count and the content id. Otherwise, the new keyword is inserted into the keyword table, and the link table is updated with the occurrence count, the content id, and the newly generated key id.
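The lookup-or-insert step described above can be sketched as follows. This is a minimal illustration rather than the article's upload engine: plain PHP arrays stand in for the keyword and link tables, and the names linkWords, $keywords, and $links are assumptions made for the example.

```php
<?php
// Hypothetical sketch: arrays stand in for the keyword and link tables.
// $keywords maps word => key id; $links holds one row per word of a document.
function linkWords(array $wordMap, int $contentId, array &$keywords, array &$links): void
{
    foreach ($wordMap as $word => $occurrences) {
        if (!isset($keywords[$word])) {
            // New keyword: "insert" it and generate the next key id.
            $keywords[$word] = count($keywords) + 1;
        }
        // In both cases the link row records key id, content id, and count.
        $links[] = array(
            'key_id'      => $keywords[$word],
            'content_id'  => $contentId,
            'occurrences' => $occurrences,
        );
    }
}

$keywords = array('search' => 1);   // "search" is already a known keyword
$links    = array();
linkWords(array('search' => 3, 'engine' => 2), 7, $keywords, $links);
```

After the call, ‘engine’ has been added to $keywords with a new key id, and $links contains one row for each of the two words, both pointing at content id 7. With a real database, the array lookups would become a SELECT on the keyword table followed by an INSERT into the keyword or link table.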
FormWordList() Function:
This is the core part of the program. The function is called after the ExtractWords() function. It parses the filtered words and removes common words such as ‘a’, ‘is’, ‘was’, and ‘and’; all other words are taken as valid. It returns an associative array, $wordMap, which stores each word and its number of occurrences in the document.

<?php
function FormWordList( $wordList ) {
    global $COMMON_WORDS;
    global $MAX_WORD_LENGTH;

    // word => number of occurrences
    $wordMap = array();

    foreach ( $wordList as $word ) {
        $len = strlen( $word );
        // skip words that are too short or too long
        if ( ($len > 1) && ($len < $MAX_WORD_LENGTH) ) {
            // empty() avoids an undefined-index notice for unseen words
            if ( empty( $COMMON_WORDS[$word] ) ) {
                if ( empty( $wordMap[$word] ) ) {
                    $wordMap[$word] = 1;
                } else {
                    $wordMap[$word]++;
                }
            }
        }
    }
    return $wordMap;
}
?>
Every word in $wordList that passes the length check is tested to see if it is a common word. If it is, the loop continues with the next word; otherwise the word is looked up in the $wordMap associative array. If it is not there yet, it is added to $wordMap with an occurrence count of 1. Otherwise, its occurrence count is incremented by 1.
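The same counting can also be sketched with PHP's built-in array functions. This is an alternative formulation, not the article's code: the function name countWords and its parameters are assumptions, $commonWords is assumed to be keyed by word in the same way as $COMMON_WORDS, and the length bounds mirror the listing above.

```php
<?php
// Hypothetical alternative to FormWordList() using built-in array functions.
// $commonWords is assumed to be keyed by word, like $COMMON_WORDS.
function countWords(array $wordList, array $commonWords, int $maxWordLength): array
{
    // Keep only words of acceptable length that are not common words.
    $kept = array_filter($wordList, function ($word) use ($commonWords, $maxWordLength) {
        $len = strlen($word);
        return $len > 1 && $len < $maxWordLength && empty($commonWords[$word]);
    });
    // array_count_values() builds the word => occurrences map in one call.
    return array_count_values($kept);
}

$commonWords = array('is' => 1, 'and' => 1, 'the' => 1);
$wordMap = countWords(
    array('search', 'is', 'the', 'search', 'engine', 'a'),
    $commonWords,
    30
);
// $wordMap is array('search' => 2, 'engine' => 1)
```

Passing the common-word list and the length limit as arguments, instead of reading globals, makes the function easier to test in isolation.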
