Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl-Sensitive Sunglasses
 
PerlMonks  

How do I read from a compressed SQLite FTS4 database with DBD::SQLite?

by elef (Friar)
on Nov 29, 2013 at 14:43 UTC ( #1064966=perlquestion: print w/ replies, xml ) Need Help??
elef has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,
I would like to compress SQLite FTS databases (utf-8 text data) to reduce file sizes and perhaps even improve performance. Based on documentation here and here and a lot of guesswork, I came up with some partially working code. It creates the db, and it seems to compress it as the file is smaller than the uncompressed db with the same entries. When I run a SELECT query on the compressed db, I get a correct report about the number of hits but the text content itself doesn't show up. What am I doing wrong?
use strict; use warnings; use utf8; use DBI; use IO::Compress::Gzip qw(gzip $GzipError) ; use IO::Uncompress::Gunzip qw(gunzip $GunzipError) ; sub compressor; sub uncompressor; print "\nTesting 'compressor' and 'uncompressor' subs: " . uncompresso +r(compressor("test passed")) . "\n"; my $dbh = DBI->connect( "dbi:SQLite:dbname=test.db", "", "", { # RaiseError => 1, # we'll hand +le the errors ourselves sqlite_unicode => 1, } ) or die "Can't connect to database: $DBI::err +str";; # database connection var $dbh->sqlite_create_function( 'compressme', 1, sub { compressor } ); $dbh->sqlite_create_function( 'uncompressme', 1, sub { uncompressor } +); $dbh->do("DROP TABLE IF EXISTS EnHu") or print "drop table failed"; # $dbh->do("CREATE VIRTUAL TABLE EnHu USING fts4 (English TEXT, Hungar +ian TEXT)") or print "failed to create fts4 table"; # uncompre +ssed $dbh->do("CREATE VIRTUAL TABLE EnHu USING fts4 (English TEXT, Hungaria +n TEXT, compress=compressme, uncompress=uncompressme)") or print "fai +led to create fts4 table"; # compressed while (<DATA>) { chomp; /^([^\t]*)\t([^\t]*)/; my $lang1 = $1; my $lang2 = $2; # print "\n\ninserting:\n$lang1\n$lang2"; my $insert = $dbh->prepare('INSERT INTO EnHu (English, Hungarian) +VALUES(?, ?)'); # my $insert = $dbh->prepare('INSERT INTO EnHu (English, Hungarian +) VALUES(compressme(?), compressme(?))'); #no need to call the fun +ctions, SQLite seems to do it on its own $insert->execute("$lang1", "$lang2"); } my $countentries = $dbh->prepare( "SELECT Count(*) FROM EnHu" ); # OK $countentries->execute(); my $size = $countentries->fetch(); my $searchterm = "some"; print "\n\nImport done, the db contains @$size records.\nEnter search +term to run query\n"; chomp($searchterm = <STDIN>); my $q_1 = $dbh->prepare( "SELECT * FROM EnHu WHERE English MATCH ?" ) +or print "prepare failed"; # works on uncompressed db, kinda works + on compressed db # my $q_1 = $dbh->prepare( "SELECT uncompressme(English) FROM EnHu WHE +RE English MATCH ?" ) or print "prepare failed"; # doesn't work # my $q_1 = $dbh->prepare( "SELECT uncompressme(English),uncompressme( +Hungarian) FROM EnHu WHERE English MATCH ?" ) or print "prepare faile +d"; # my $q_1 = $dbh->prepare( "SELECT uncompressme(English) FROM EnHu WHE +RE English MATCH ?" ) or print "prepare failed"; # my $q_1 = $dbh->prepare( "SELECT * FROM EnHu WHERE uncompressme(Engl +ish) MATCH ?" ); # doesn't work # my $q_1 = $dbh->prepare( "SELECT * FROM EnHu WHERE English MATCH com +pressme(?)" ); # nope $q_1->execute($searchterm) or print "execute failed"; # my $q_2 = $dbh->prepare( "SELECT * FROM EnHu LIMIT 4" ); # $q_2->execute() or print "execute failed"; # my $all = $q_2->fetchall_arrayref() or print "fetch failed"; my $all = $q_1->fetchall_arrayref() or print "fetch failed"; my $i = @$all; foreach my $row (@$all) { print "\n--------------------\n"; foreach my $item (@$row) { print $item; print "\t"; }; } print "\n\n"; print "\n$i matches found."; <STDIN>; sub compressor { my $in = shift; my $out; gzip \$in => \$out; return ($out); } sub uncompressor { my $in = shift; my $out; gunzip \$in => \$out; return ($out); } __DATA__ this is some text némi szöveg some more text here na még egy kicsi just for testing you know nalássuk fourth row of text még némi szöveg and a little more text for good measure dobjuk hozzá még ezt is

Comment on How do I read from a compressed SQLite FTS4 database with DBD::SQLite?
Select or Download Code
Re: How do I read from a compressed SQLite FTS4 database with DBD::SQLite?
by Corion (Pope) on Nov 29, 2013 at 14:47 UTC

    When you pass your de/compression callbacks to SQLite, did you mean to pass references?

    ... $dbh->sqlite_create_function( 'compressme', 1, sub { compressor } ); $dbh->sqlite_create_function( 'uncompressme', 1, sub { uncompressor } +); ...

    As written, the functions will never pass any arguments through to compressor resp. uncompressor.

    Did you mean the following?

    $dbh->sqlite_create_function( 'compressme', 1, \&compressor ); $dbh->sqlite_create_function( 'uncompressme', 1, \&uncompressor );
      Thank you, that fixes it. It runs correctly with ASCII text now.
      I did get another problem though: a bunch of SQL logic errors on INSERT, which seem to be caused by the fact that IO::Compress::Gzip can't handle non-ASCII text, at least not in the way I've been using it.
      So, is there some flag I can set to make it handle UTF-8 text? Or should I use some other compression method? All the compression-related perl modules I found seemed to be designed to work with files; IO::Compress::Gzip was the only one that I could find that offers a simple way to compress strings.
      Also, my test db created with compression is much larger and slower than the same test db without compression, which is not exactly what I was hoping for...

        I would assume that you should explicitly encode your text to UTF-8 on compressing and explicitly decode from UTF-8 on decompressing. IO::Compress::Gzip likely only works on octets and expects octets. I wonder why it doesn't scream bloody murder...

        Also, maybe you need to explicitly set sqlite_unicode if you are reading/storing UTF-8 data in SQLite.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1064966]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others exploiting the Monastery: (3)
As of 2014-07-26 17:48 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:









    Results (178 votes), past polls