Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical

Re^2: Loading 283600 records (WordNet)

by remiah (Hermit)
on Sep 24, 2012 at 10:25 UTC ( #995344=note: print w/replies, xml ) Need Help??

in reply to Re: Loading 283600 records (WordNet)
in thread Loading 283600 records (WordNet)

I was surprised.

I have nothing to say for loading time. I never imagined execute and fetch become so fast. It is slower than hash lookup, but it becomes really fast.

And smaller memory usage. These are output of "ps -axorss,vsz -p $$" before loading, after loaded. Size is in KB.

>perl  #load to hash
 2736  5036
61332 63404

>perl  #sqlite in-memory
 4612  7308
 7796  9992
File size of database is 37MB, if I dump this table to text, it becomes 8.9MB. I wonder how they load it in-memory?

Below is some test results of lookup. It loads data and lookup ARG times from 283600 records/hash.
>perl 100
            (warning: too few iterations for a reliable count)
               s/iter 03_sqlite_disk      01_substr  02_sqlite_mem
03_sqlite_disk   24.0             --           -89%           -97%
01_substr        2.72           780%             --           -77%
02_sqlite_mem   0.627          3720%           334%             --

I cut SQLite on disk here after.
>perl 1000
              s/iter     01_substr 02_sqlite_mem
01_substr       2.71            --          -75%
02_sqlite_mem  0.687          295%            --

>perl 10000
              s/iter     01_substr 02_sqlite_mem
01_substr       2.74            --          -61%
02_sqlite_mem   1.07          157%            --

>perl 50000
              s/iter     01_substr 02_sqlite_mem
01_substr       2.80            --           -3%
02_sqlite_mem   2.72            3%            --

>perl 100000
              s/iter 02_sqlite_mem     01_substr
02_sqlite_mem   4.59            --          -36%
01_substr       2.92           57%            --

And test code.
#!/usr/bin/perl #SQLite mem test use strict; use warnings; use Time::HiRes; use Benchmark qw/cmpthese/; use DBI; my $href; my @t_synsets; my $DB_DISK='./wnjpn.db'; my $DB_TO_MEM='./wn-synlink.db'; #load @ARGV[0] of synset for lookup test load_test_synsets(); my %tests = ( '01_substr' => \&test1, '02_sqlite_mem' => \&test2, # '03_sqlite_disk' => \&test3, ); cmpthese( -20, #for 20 cpu secs \%tests ); sub test1{ #substr $href={}; open(my $fh, "<", "04.txt") or die $!; while(<$fh>){ chomp; push @{ $href->{ substr($_,0,10)} }, [ substr($_,10,10), subst +r($_,20)]; } close $fh; #iterate my $cnt=0; for( @t_synsets ){ if (exists $href->{$_} ){ $cnt += @{ $href->{$_} }; } } #print "test1:arrays cnt=$cnt\n"; } sub test2{ #sqlite mem my $mem_dbh = DBI->connect('dbi:SQLite:dbname=:memory:'); $mem_dbh->sqlite_backup_from_file($DB_TO_MEM); my $sth=$mem_dbh->prepare("select synset2 from synlink where synse +t1=?"); #iterate my $cnt=0; for( @t_synsets ){ $sth->execute($_); my $tmp=$sth->fetchall_arrayref; $cnt+=@{$tmp}; } $sth->finish; #print "test2:arrays cnt=$cnt\n"; $mem_dbh->disconnect; } sub test3{ #sqlite disk my $dbh = DBI->connect("dbi:SQLite:dbname=$DB_DISK") or die DBI->e +rrstr; my $sth=$dbh->prepare("select synset2 from synlink where synset1=? +"); #iterate my $cnt=0; for( @t_synsets ){ $sth->execute($_); my $tmp=$sth->fetchall_arrayref; $cnt+=@{$tmp}; } $sth->finish; #print "test3:arrays cnt=$cnt\n"; $dbh->disconnect; } #$ARGV[0] synsets to lookup sub load_test_synsets{ my $dbh=DBI->connect("dbi:SQLite:./wnjpn.db","","", { AutoCommit=>0, #sqlite_use_immediate_transaction=>1, #sqlite_unicode=>1, RaiseError=>1, }) or die DBI->errstr; if ( defined($ARGV[0]) && $ARGV[0] =~ /^\d+$/){ @t_synsets = map{$_->[0]} @{$dbh->selectall_arrayref("select synset FROM synset limi +t $ARGV[0]")}; } else { @t_synsets = map{$_->[0]} @{$dbh->selectall_arrayref("select synset FROM synset")}; } $dbh->disconnect; }
I would like to lookup around 2000 to 15000 records, so SQLite in-memory suit me fine. Thanks for information.

Replies are listed 'Best First'.
Re^3: Loading 283600 records (WordNet)
by BrowserUk (Pope) on Sep 24, 2012 at 10:45 UTC

    Looks like you have your solution. ;)

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    RIP Neil Armstrong

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://995344]
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others contemplating the Monastery: (5)
As of 2018-01-19 15:56 GMT
Find Nodes?
    Voting Booth?
    How did you see in the new year?

    Results (221 votes). Check out past polls.