http://www.perlmonks.org?node_id=995344


in reply to Re: Loading 283600 records (WordNet)
in thread Loading 283600 records (WordNet)

I was surprised.

I have nothing to say for loading time. I never imagined execute and fetch become so fast. It is slower than hash lookup, but it becomes really fast.

And smaller memory usage. These are output of "ps -axorss,vsz -p $$" before loading, after loaded. Size is in KB.

>perl 07-1.pl  #load to hash
before:
  RSS   VSZ
 2736  5036
after:
  RSS   VSZ
61332 63404

>perl 07-2.pl  #sqlite in-memory
before:
  RSS   VSZ
 4612  7308
after:
  RSS   VSZ
 7796  9992
File size of database is 37MB, if I dump this table to text, it becomes 8.9MB. I wonder how they load it in-memory?

Below is some test results of lookup. It loads data and lookup ARG times from 283600 records/hash.
>perl 07.pl 100
            (warning: too few iterations for a reliable count)
               s/iter 03_sqlite_disk      01_substr  02_sqlite_mem
03_sqlite_disk   24.0             --           -89%           -97%
01_substr        2.72           780%             --           -77%
02_sqlite_mem   0.627          3720%           334%             --

I cut SQLite on disk here after.
>perl 07.pl 1000
              s/iter     01_substr 02_sqlite_mem
01_substr       2.71            --          -75%
02_sqlite_mem  0.687          295%            --

>perl 07.pl 10000
              s/iter     01_substr 02_sqlite_mem
01_substr       2.74            --          -61%
02_sqlite_mem   1.07          157%            --

>perl 07.pl 50000
              s/iter     01_substr 02_sqlite_mem
01_substr       2.80            --           -3%
02_sqlite_mem   2.72            3%            --

>perl 07.pl 100000
              s/iter 02_sqlite_mem     01_substr
02_sqlite_mem   4.59            --          -36%
01_substr       2.92           57%            --

And test code.
#!/usr/bin/perl #SQLite mem test use strict; use warnings; use Time::HiRes; use Benchmark qw/cmpthese/; use DBI; my $href; my @t_synsets; my $DB_DISK='./wnjpn.db'; my $DB_TO_MEM='./wn-synlink.db'; #load @ARGV[0] of synset for lookup test load_test_synsets(); my %tests = ( '01_substr' => \&test1, '02_sqlite_mem' => \&test2, # '03_sqlite_disk' => \&test3, ); cmpthese( -20, #for 20 cpu secs \%tests ); sub test1{ #substr $href={}; open(my $fh, "<", "04.txt") or die $!; while(<$fh>){ chomp; push @{ $href->{ substr($_,0,10)} }, [ substr($_,10,10), subst +r($_,20)]; } close $fh; #iterate my $cnt=0; for( @t_synsets ){ if (exists $href->{$_} ){ $cnt += @{ $href->{$_} }; } } #print "test1:arrays cnt=$cnt\n"; } sub test2{ #sqlite mem my $mem_dbh = DBI->connect('dbi:SQLite:dbname=:memory:'); $mem_dbh->sqlite_backup_from_file($DB_TO_MEM); my $sth=$mem_dbh->prepare("select synset2 from synlink where synse +t1=?"); #iterate my $cnt=0; for( @t_synsets ){ $sth->execute($_); my $tmp=$sth->fetchall_arrayref; $cnt+=@{$tmp}; } $sth->finish; #print "test2:arrays cnt=$cnt\n"; $mem_dbh->disconnect; } sub test3{ #sqlite disk my $dbh = DBI->connect("dbi:SQLite:dbname=$DB_DISK") or die DBI->e +rrstr; my $sth=$dbh->prepare("select synset2 from synlink where synset1=? +"); #iterate my $cnt=0; for( @t_synsets ){ $sth->execute($_); my $tmp=$sth->fetchall_arrayref; $cnt+=@{$tmp}; } $sth->finish; #print "test3:arrays cnt=$cnt\n"; $dbh->disconnect; } #$ARGV[0] synsets to lookup sub load_test_synsets{ my $dbh=DBI->connect("dbi:SQLite:./wnjpn.db","","", { AutoCommit=>0, #sqlite_use_immediate_transaction=>1, #sqlite_unicode=>1, RaiseError=>1, }) or die DBI->errstr; if ( defined($ARGV[0]) && $ARGV[0] =~ /^\d+$/){ @t_synsets = map{$_->[0]} @{$dbh->selectall_arrayref("select synset FROM synset limi +t $ARGV[0]")}; } else { @t_synsets = map{$_->[0]} @{$dbh->selectall_arrayref("select synset FROM synset")}; } $dbh->disconnect; }
I would like to lookup around 2000 to 15000 records, so SQLite in-memory suit me fine. Thanks for information.

Replies are listed 'Best First'.
Re^3: Loading 283600 records (WordNet)
by BrowserUk (Patriarch) on Sep 24, 2012 at 10:45 UTC

    Looks like you have your solution. ;)


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    RIP Neil Armstrong