comment on

I was surprised.

I have nothing to say for loading time. I never imagined execute and fetch become so fast. It is slower than hash lookup, but it becomes really fast.

And smaller memory usage. These are output of "ps -axorss,vsz -p $$" before loading, after loaded. Size is in KB.

>perl 07-1.pl  #load to hash
before:
  RSS   VSZ
 2736  5036
after:
  RSS   VSZ
61332 63404

>perl 07-2.pl  #sqlite in-memory
before:
  RSS   VSZ
 4612  7308
after:
  RSS   VSZ
 7796  9992

File size of database is 37MB, if I dump this table to text, it becomes 8.9MB. I wonder how they load it in-memory?

Below is some test results of lookup. It loads data and lookup ARG times from 283600 records/hash.

>perl 07.pl 100
            (warning: too few iterations for a reliable count)
               s/iter 03_sqlite_disk      01_substr  02_sqlite_mem
03_sqlite_disk   24.0             --           -89%           -97%
01_substr        2.72           780%             --           -77%
02_sqlite_mem   0.627          3720%           334%             --

I cut SQLite on disk here after.

>perl 07.pl 1000
              s/iter     01_substr 02_sqlite_mem
01_substr       2.71            --          -75%
02_sqlite_mem  0.687          295%            --

>perl 07.pl 10000
              s/iter     01_substr 02_sqlite_mem
01_substr       2.74            --          -61%
02_sqlite_mem   1.07          157%            --

>perl 07.pl 50000
              s/iter     01_substr 02_sqlite_mem
01_substr       2.80            --           -3%
02_sqlite_mem   2.72            3%            --

>perl 07.pl 100000
              s/iter 02_sqlite_mem     01_substr
02_sqlite_mem   4.59            --          -36%
01_substr       2.92           57%            --

And test code.

#!/usr/bin/perl
#SQLite mem test
use strict; use warnings;
use Time::HiRes;
use Benchmark qw/cmpthese/;
use DBI;

my $href;
my @t_synsets;
my $DB_DISK='./wnjpn.db';
my $DB_TO_MEM='./wn-synlink.db';

#load @ARGV[0] of synset for lookup test
load_test_synsets();

my %tests = (
    '01_substr' => \&test1,
    '02_sqlite_mem' => \&test2,
#   '03_sqlite_disk' => \&test3,
);
cmpthese(
    -20, #for 20 cpu secs
    \%tests
);
sub test1{ #substr
    $href={};
    open(my $fh, "<", "04.txt") or die $!;
    while(<$fh>){
        chomp;
        push @{ $href->{ substr($_,0,10)} }, [ substr($_,10,10), subst
+r($_,20)];
    }
    close $fh;

    #iterate
    my $cnt=0;
    for( @t_synsets ){
        if (exists $href->{$_} ){
            $cnt += @{ $href->{$_} };
        }
    }
    #print "test1:arrays cnt=$cnt\n";
}
sub test2{ #sqlite mem
    my $mem_dbh = DBI->connect('dbi:SQLite:dbname=:memory:');
    $mem_dbh->sqlite_backup_from_file($DB_TO_MEM);
    my $sth=$mem_dbh->prepare("select synset2 from synlink where synse
+t1=?");

    #iterate
    my $cnt=0;
    for( @t_synsets ){
        $sth->execute($_);
        my $tmp=$sth->fetchall_arrayref;
        $cnt+=@{$tmp};
    }
    $sth->finish;
    #print "test2:arrays cnt=$cnt\n";

    $mem_dbh->disconnect;
}
sub test3{ #sqlite disk
    my $dbh = DBI->connect("dbi:SQLite:dbname=$DB_DISK") or die DBI->e
+rrstr;
    my $sth=$dbh->prepare("select synset2 from synlink where synset1=?
+");

    #iterate
    my $cnt=0;
    for( @t_synsets ){
        $sth->execute($_);
        my $tmp=$sth->fetchall_arrayref;
        $cnt+=@{$tmp};
    }
    $sth->finish;
    #print "test3:arrays cnt=$cnt\n";

    $dbh->disconnect;
}
#$ARGV[0] synsets to lookup
sub load_test_synsets{
    my $dbh=DBI->connect("dbi:SQLite:./wnjpn.db","","", {
        AutoCommit=>0,
        #sqlite_use_immediate_transaction=>1,
        #sqlite_unicode=>1,
        RaiseError=>1,
    }) or die DBI->errstr;

    if ( defined($ARGV[0]) && $ARGV[0] =~ /^\d+$/){
        @t_synsets = map{$_->[0]}
            @{$dbh->selectall_arrayref("select synset FROM synset limi
+t $ARGV[0]")};
    } else {
        @t_synsets = map{$_->[0]}
            @{$dbh->selectall_arrayref("select synset FROM synset")};
    }
    $dbh->disconnect;
}
[download]

I would like to lookup around 2000 to 15000 records, so SQLite in-memory suit me fine. Thanks for information.

In reply to Re^2: Loading 283600 records (WordNet) by remiah
in thread Loading 283600 records (WordNet) by remiah

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


laziness, impatience, and hubris
	PerlMonks