http://www.perlmonks.org?node_id=1064637

elef has asked for the wisdom of the Perl Monks concerning the following question:

I'm building a text search app using DBD::SQLite and FTS4. I'm completely new to databases, but I got the basic things working well. Now there are a couple of things I need advice on.

(1) I would like to compress the databases. A lot of people will be using dbs that contain 10 million records or more (upwards of 10GB or even 50GB uncompressed). This is all UTF-8 text data, some of it very repetitive so it compresses well. How do I go about this? The DBD::SQLite cpan page mentions compression briefly, and there is a bit on this at http://www.sqlite.org/fts3.html#section_6_1 but I can find no actual working sample code that I could use. The sqlite.org page seems to assume that I will write the compress and uncompress functions myself, which is way above my pay grade. Is there a ready-made solution somewhere that I am missing?

(2) Currently, my app can handle one database file at a time. It does a search and returns the results ordered by length. I would like to add support for multiple database files, allowing the user to set a ranking and displaying hits in that order (first, all hits from 'tier 1' dbs ordered by length, then all hits from 'tier 2' dbs ordered by length etc.). Is this feasible? Would I need to ATTACH each of the dbs to the same connection? How can I run the same query on all the dbs? Create a foreach my $db (@databases) loop and iterate through them, executing the query in each one in turn?

(3) I would like to add match highlighting but the DBD::SQLite page says "The current FTS implementation in SQLite is far from complete with respect to utf8 handling : in particular, variable-length characters are not treated correctly by the builtin functions offsets() and snippet()." Has that been fixed since this was written or should I forget about offsets() and try to write my own code that tries to analyze each string returned from the database and find the search terms in it so I can highlight them?

I can post the my current (working) code if anyone's interested.