Comments for: clay19990421
Message # 1014499:
Date: 12/03/02 13:27
By: /zureash
Subject: RE: database searching

Just a quick hint: you should be using preg_replace, as it is generally faster than ereg_replace. Also, you don't need to repeat the replace code over and over. You can pass an array of patterns and let one call handle all of them. Note that preg patterns need delimiters, and the parentheses have to be escaped. For example:

    $pattern = array( '/\n/', '/\(/', '/\)/' );
    $replace = '';
    $filtered = preg_replace( $pattern, $replace, $blobtext );

Also, if you have a clear pattern that you are looking for (such as alphanumeric characters), an easier way is to use one regular expression that catches everything else. \W matches any character that is not a letter, digit, or underscore, for example:

    $filtered = preg_replace( '/\W/', '', $blobtext );

As a side note, TEXT columns in MySQL are NOT case-sensitive by default, while BLOB columns are case-sensitive. This difference explains why the code snippet in the article is NOT case-sensitive.

Finally, this is a pretty heavy app (resource-wise) to be running on your server. I am not convinced that this is the best approach to indexing and searching a db-powered site. However, if you are going to use it, make sure you create an index on your keyword table in your database. An index will greatly speed up the matching in the database. (See the MySQL documentation if you don't know how to do this.)

Also, you should NOT run the indexing portion of this script (the part that creates the keyword table) too frequently. This is exactly the kind of utility that cron jobs are made for. Your best bet is to set a cron job to run this utility once a day (preferably during the night, or during periods of lowest traffic on your site).

I used to work for a major search engine company, so I can tell you that this is NOT how indexes like Google's are created. Major search engines use compiled indexes (and no, the correct word IS NOT indices!). Search terms are ranked according to a complex formula, and frequent queries are cached (which is not a bad idea for any site).
I know Adam said this in the article, but I just wanted to restate it for those of you who appear to be confused in the postings.
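
For anyone who wants to try the preg_replace hints above, here is a minimal sketch. The sample string in $blobtext is made up for illustration (the article's real data will differ), and the names $wordsOnly and $spaced are hypothetical. The last call is an assumed variation, not something from the article: it replaces each run of non-word characters with a space so that neighbouring words do not get glued together.

```php
<?php
// Hypothetical sample text standing in for the article's $blobtext.
$blobtext = "Hello (world)\nfoo_bar 42!";

// One call, several delimited patterns. A single replacement string
// is applied to every pattern in the array.
$pattern = array('/\n/', '/\(/', '/\)/');
$filtered = preg_replace($pattern, '', $blobtext);

// \W matches anything that is not a letter, digit, or underscore,
// so this strips spaces and punctuation in one pass.
$wordsOnly = preg_replace('/\W/', '', $blobtext);

// Assumed variation: turn each run of non-word characters into a
// single space, then trim, so the words stay separated.
$spaced = trim(preg_replace('/\W+/', ' ', $blobtext));

echo $filtered, "\n", $wordsOnly, "\n", $spaced, "\n";
```

With this sample input, $filtered loses only the newline and the parentheses, $wordsOnly loses every space and punctuation mark as well, and $spaced keeps the words apart, which is probably closer to what a keyword table needs.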


