Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW
 
PerlMonks  

DBD::SQLite questions

by elef (Friar)
on Nov 27, 2013 at 15:27 UTC ( [id://1064637]=perlquestion: print w/replies, xml ) Need Help??

elef has asked for the wisdom of the Perl Monks concerning the following question:

I'm building a text search app using DBD::SQLite and FTS4. I'm completely new to databases, but I got the basic things working well. Now there are a couple of things I need advice on.

(1) I would like to compress the databases. A lot of people will be using dbs that contain 10 million records or more (upwards of 10GB or even 50GB uncompressed). This is all UTF-8 text data, some of it very repetitive so it compresses well. How do I go about this? The DBD::SQLite cpan page mentions compression briefly, and there is a bit on this at http://www.sqlite.org/fts3.html#section_6_1 but I can find no actual working sample code that I could use. The sqlite.org page seems to assume that I will write the compress and uncompress functions myself, which is way above my pay grade. Is there a ready-made solution somewhere that I am missing?

(2) Currently, my app can handle one database file at a time. It does a search and returns the results ordered by length. I would like to add support for multiple database files, allowing the user to set a ranking and displaying hits in that order (first, all hits from 'tier 1' dbs ordered by length, then all hits from 'tier 2' dbs ordered by length etc.). Is this feasible? Would I need to ATTACH each of the dbs to the same connection? How can I run the same query on all the dbs? Create a foreach my $db (@databases) loop and iterate through them, executing the query in each one in turn?

(3) I would like to add match highlighting but the DBD::SQLite page says "The current FTS implementation in SQLite is far from complete with respect to utf8 handling : in particular, variable-length characters are not treated correctly by the builtin functions offsets() and snippet()." Has that been fixed since this was written or should I forget about offsets() and try to write my own code that tries to analyze each string returned from the database and find the search terms in it so I can highlight them?

I can post the my current (working) code if anyone's interested.

Replies are listed 'Best First'.
Re: DBD::SQLite questions
by ww (Archbishop) on Nov 27, 2013 at 17:31 UTC
    Too many questions with too many unrelated answers for a single thread.

      One thing a SQLite is a flat files based DB so you can compress the files using either gzip or other compression you like.

        Well, SQLite has internal hooks for compress() and uncompress() so I would like to use those. I just don't know how.
        From the documentation, it seems I would need to call
        $dbh->sqlite_create_function( 'compress', 1, sub { # magic incantation +s for compression } ); $dbh->sqlite_create_function( 'uncompress', 1, sub { # magic incantati +ons for uncompression } );


        Then when I call INSERT or SELECT I need to compress() and uncompress() the text strings.
        Maybe I could use IO::Compress::Zip for the magic incantations...?
        That's what I can gather, but I can find no specifics on implementation.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1064637]
Approved by marto
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others scrutinizing the Monastery: (3)
As of 2025-01-17 08:54 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Which URL do you most often use to access this site?












    Results (55 votes). Check out past polls.