PHPBuilder - Slapping together a search engine




Slapping together a search engine

by: Clay Johnson | July 30, 2000

So you've got a dynamic site, filled with all sorts of user input, whether it's a 'phorum' or something like my own site at knowpost.com. The site htdig.org will take care of indexing and searching your html pages, but if you are like me, you have very few html pages, and most of your "content" resides in BLOBs in your database. You can't do anything useful with a LIKE '%searchword%' query; the results just don't come back relevant.
There has to be a better way, and indeed there is, with a few easy steps. Here's how to slap one together:
Part one: BNR--Blob Noise Reduction
The first problem with your content is that it is filled with clunky "noisewords" like "a", "the", "where", and "look": things that are there to help us humans communicate, but really don't have anything to do with relevance. We gotta get rid of those. I've included a big list of noisewords (noisewords.txt) for you to use, modify or mutilate. Essentially, what we're trying to do here is get all those noisewords out of your data and build a table with two columns: the word, and its indicator (the qid of the content associated with it). We want something that will eventually look like this:

+------+-------------+
| qid  | word        |
+------+-------------+
|    6 | links       |
|    5 | Fire        |
|    5 | topics      |
|    5 | related     |
|    5 | Shakespeare |
|    4 | people      |
|    4 | Knowpost    |
|    3 | cuba        |
|    3 | cigar       |
+------+-------------+
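
To make the idea concrete, here is a minimal sketch of the noise-reduction step in PHP. It is not the article's own code: the function names extract_keywords() and build_rows() are hypothetical, the short noiseword list stands in for noisewords.txt (whose exact format is not shown here), and lowercasing every word is an assumption made to simplify matching.

```php
<?php
// Hypothetical helper: split a blob of text into words and drop noisewords.
function extract_keywords(string $text, array $noisewords): array
{
    // Lowercase the noise list and flip it so lookups are by key.
    $noise = array_flip(array_map('strtolower', $noisewords));

    // Split on anything that is not a letter, digit, or apostrophe.
    $tokens = preg_split("/[^a-z0-9']+/", strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    $keywords = [];
    foreach ($tokens as $token) {
        // Keep each surviving word once, in order of first appearance.
        if (!isset($noise[$token]) && !in_array($token, $keywords, true)) {
            $keywords[] = $token;
        }
    }
    return $keywords;
}

// Hypothetical helper: pair every keyword with the id of its content.
function build_rows(int $qid, string $text, array $noisewords): array
{
    $rows = [];
    foreach (extract_keywords($text, $noisewords) as $word) {
        $rows[] = ['qid' => $qid, 'word' => $word];
    }
    return $rows;
}

// Assumed sample list; a real run would load the full noiseword file instead.
$noisewords = ['a', 'the', 'where', 'look', 'to', 'for', 'in'];

foreach (build_rows(3, 'Where to look for a cigar in Cuba', $noisewords) as $row) {
    printf("%d | %s\n", $row['qid'], $row['word']);
}
```

With the sample sentence, only "cigar" and "cuba" survive for qid 3, which is the kind of (qid, word) pair the table above is made of.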
Let's create our table now:

CREATE TABLE search_table (
    word VARCHAR(50),
    qid  INT
);
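
As a rough sketch of how rows might land in this table, the PHP below creates search_table and fills it through a prepared statement. It assumes PDO with the SQLite driver and an in-memory database purely so the example is self-contained; your own site would connect to whatever database holds your BLOBs. The sample rows are copied from the example table above.

```php
<?php
// Assumption: an in-memory SQLite database stands in for the real server.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Same two columns as the CREATE TABLE statement in the article.
$db->exec('CREATE TABLE search_table (word VARCHAR(50), qid INT)');

// Sample (qid, word) pairs taken from the example table.
$rows = [[3, 'cuba'], [3, 'cigar'], [4, 'people'], [4, 'Knowpost']];

// A prepared statement keeps the words safely escaped.
$insert = $db->prepare('INSERT INTO search_table (word, qid) VALUES (?, ?)');
foreach ($rows as [$qid, $word]) {
    $insert->execute([$word, $qid]);
}

$count = (int) $db->query('SELECT COUNT(*) FROM search_table')->fetchColumn();
echo "rows stored: $count\n";
```

Preparing the statement once and executing it per row is a design choice for the sketch; it avoids building SQL strings by hand from user-supplied words.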

