Using DBIx::FullTextSearch
by trs80 (Priest) on Sep 21, 2002 at 00:08 UTC
I had to upgrade the search capability of a web site this week and it needed to be done quickly. I started by looking at what the current code did vs. what the site really needed it to do. The code (inherited) was horrible. Here is the "search":
This nice code would match 'across' if you looked for 'Ross', and I doubt that is what they wanted. I felt there had to be something on CPAN that already did this, and so began my search.
It took some digging, but I finally stumbled across the DBIx::FullTextSearch module. It was late at night, though, and the documentation didn't seem to make much sense, so I called it a day.
That sleep did me good. I was finally able to make some sense of the documentation and plan how to convert what I had.
The documentation still doesn't give a clear (at least to my mind) path to implementing a solution. That is understandable, since it isn't exactly an "everyday" item. This is my attempt at explaining the steps I took to shoehorn it into an existing application without too much pain. The documentation talks about a 'frontend' and a 'backend': the frontend is the information to be indexed, and the backend is how the index itself is stored. In this example we are going to use a database for both the backend and the frontend. See the docs for more options.
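To make the problem concrete, here is a minimal sketch of substring matching versus whole-word matching. It is not the inherited code, just an illustration; the sample titles and the sub names substring_hits and word_hits are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical sample data; not taken from the original site.
my @titles = ('Walking across the bridge', 'Ross and Rachel', 'Crossroads');

# Substring match: finds the term anywhere, even inside another word.
sub substring_hits {
    my ($term, @rows) = @_;
    return grep { /\Q$term\E/i } @rows;
}

# Whole-word match: \b anchors the term at word boundaries.
sub word_hits {
    my ($term, @rows) = @_;
    return grep { /\b\Q$term\E\b/i } @rows;
}

print "substring: $_\n" for substring_hits('Ross', @titles);
print "word:      $_\n" for word_hits('Ross', @titles);
```

The substring version reports all three titles, while the word version reports only 'Ross and Rachel'. A word-based index gives you the second behaviour without scanning every row.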
Phase One - Tests
One thing I have learned the hard way is that you should test your code outside of the web environment before you add an additional layer of complexity onto it, aka "mod_perl/<your framework/template here>". With that in mind I made two scripts, full_text_search_create.pl and full_text_search.pl.
The first and most important script is the full_text_search_create.pl script since it actually creates the indexes that are used by the second one.
In my case I was lucky when it came to the tables: they were already properly set up with primary keys, which removed a step from the process. If you are working on a table that doesn't have a primary key, I suggest you add one before you attempt to use this module.
Here is our first script
In my table structures each table has the same id column name, so I was able to avoid adding that to the hash, but it wouldn't be so bad to include it anyway just to allow for future growth.
The above code will create a new index or erase an existing one and recreate it. This makes it simple to run anytime there is a major update or in a nightly cron job.
The new index tables are created inside the database you connected to in your DBI connect. I named the indexes DBIx_<table_name> so I could spot them easily; names like that are also unlikely to conflict with existing table names.
For every index you want to create there are three tables:
<front_end_table_name>
<front_end_table_name>_data
<front_end_table_name>_words
These are all created automatically by the create statement and no action is required on your end.
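For readers who want something to start from, here is a rough sketch of what a create script along these lines might look like. It is not the original full_text_search_create.pl. The %tables hash, the column names, the 'id' key column and the sub names are all assumptions for illustration, and the create/open/drop/index_document calls reflect my reading of the DBIx::FullTextSearch docs, so check them against the version you have installed.

```perl
use strict;
use warnings;

# Hypothetical mapping of table name => text column to index.
my %tables = (
    articles => 'body',
    products => 'description',
);

# Index names follow the DBIx_<table_name> convention described above.
sub index_name { return 'DBIx_' . $_[0] }

sub rebuild_indexes {
    my ($dbh, %cols) = @_;
    require DBIx::FullTextSearch;
    for my $table (sort keys %cols) {
        my $name = index_name($table);

        # Drop an existing index, if any, so each run starts clean.
        my $old = eval { DBIx::FullTextSearch->open($dbh, $name) };
        $old->drop if $old;

        my $fts = DBIx::FullTextSearch->create(
            $dbh, $name,
            frontend       => 'table',
            backend        => 'column',
            table_name     => $table,
            column_name    => $cols{$table},
            column_id_name => 'id',    # assumed primary key column
        ) or die "could not create index $name\n";

        # Feed every existing row to the index by primary key.
        my $ids = $dbh->selectcol_arrayref("SELECT id FROM $table");
        $fts->index_document($_) for @$ids;
    }
}

# Only touch the database when connection details are supplied.
if (@ARGV == 3) {
    require DBI;
    my ($dsn, $user, $pass) = @ARGV;
    my $dbh = DBI->connect($dsn, $user, $pass, { RaiseError => 1 });
    rebuild_indexes($dbh, %tables);
    $dbh->disconnect;
}
```

Because the script drops and recreates each index, it can be run by hand after a big update or from a nightly cron job, for example as perl full_text_search_create.pl dbi:mysql:mydb user password (the DSN and credentials here are placeholders).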
Once the indexes are created we need to be able to test our create from the command line to see if it is returning the correct results.
The second program expects a minimum of 3 command line arguments, something along the lines of:
You could also do:
and it will search for all three. It gives fairly ugly/simple results, but it is just an index/search test interface that isn't intended for end users.
Beyond this, things become more specific to your own site and how you want to render the results, but hopefully this revealed some of the magic in this module.
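A bare-bones search tester could be sketched roughly as follows. This is not the original full_text_search.pl. The argument layout (<dsn> <table> <word> ...), the DB_USER and DB_PASS environment variables and the sub names are hypothetical, and I am assuming from the docs that contains() returns the ids of documents matching any of the words given.

```perl
use strict;
use warnings;

# Hypothetical argument layout: <dsn> <table> <word> [<word> ...]
sub parse_args {
    my @args = @_;
    return if @args < 3;
    my ($dsn, $table, @words) = @args;
    return { dsn => $dsn, index => "DBIx_$table", words => \@words };
}

sub run_search {
    my ($opt) = @_;
    require DBI;
    require DBIx::FullTextSearch;
    my $dbh = DBI->connect($opt->{dsn}, $ENV{DB_USER}, $ENV{DB_PASS},
                           { RaiseError => 1 });
    my $fts = eval { DBIx::FullTextSearch->open($dbh, $opt->{index}) }
        or die "cannot open index $opt->{index}\n";

    # Print the id of every row the index reports for these words.
    my @ids = $fts->contains(@{ $opt->{words} });
    print "$_\n" for @ids;
    $dbh->disconnect;
}

if (my $opt = parse_args(@ARGV)) {
    run_search($opt);
}
elsif (@ARGV) {
    die "usage: $0 <dsn> <table> <word> [<word> ...]\n";
}
```

The output is just a list of primary keys, one per line, which is enough to check by hand that the index returns the rows you expect before wiring it into the site.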
Update(1): Made some minor verbiage adjustments and added code tags on commandline for running the program