Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

Re^2: Loading 283600 records (WordNet)

by remiah (Hermit)
on Sep 24, 2012 at 10:25 UTC ( #995344=note: print w/ replies, xml ) Need Help??


in reply to Re: Loading 283600 records (WordNet)
in thread Loading 283600 records (WordNet)

I was surprised.

I have nothing to say for loading time. I never imagined execute and fetch become so fast. It is slower than hash lookup, but it becomes really fast.

And smaller memory usage. These are output of "ps -axorss,vsz -p $$" before loading, after loaded. Size is in KB.

>perl 07-1.pl  #load to hash
before:
  RSS   VSZ
 2736  5036
after:
  RSS   VSZ
61332 63404

>perl 07-2.pl  #sqlite in-memory
before:
  RSS   VSZ
 4612  7308
after:
  RSS   VSZ
 7796  9992
File size of database is 37MB, if I dump this table to text, it becomes 8.9MB. I wonder how they load it in-memory?

Below is some test results of lookup. It loads data and lookup ARG times from 283600 records/hash.
>perl 07.pl 100
            (warning: too few iterations for a reliable count)
               s/iter 03_sqlite_disk      01_substr  02_sqlite_mem
03_sqlite_disk   24.0             --           -89%           -97%
01_substr        2.72           780%             --           -77%
02_sqlite_mem   0.627          3720%           334%             --

I cut SQLite on disk here after.
>perl 07.pl 1000
              s/iter     01_substr 02_sqlite_mem
01_substr       2.71            --          -75%
02_sqlite_mem  0.687          295%            --

>perl 07.pl 10000
              s/iter     01_substr 02_sqlite_mem
01_substr       2.74            --          -61%
02_sqlite_mem   1.07          157%            --

>perl 07.pl 50000
              s/iter     01_substr 02_sqlite_mem
01_substr       2.80            --           -3%
02_sqlite_mem   2.72            3%            --

>perl 07.pl 100000
              s/iter 02_sqlite_mem     01_substr
02_sqlite_mem   4.59            --          -36%
01_substr       2.92           57%            --

And test code.
#!/usr/bin/perl #SQLite mem test use strict; use warnings; use Time::HiRes; use Benchmark qw/cmpthese/; use DBI; my $href; my @t_synsets; my $DB_DISK='./wnjpn.db'; my $DB_TO_MEM='./wn-synlink.db'; #load @ARGV[0] of synset for lookup test load_test_synsets(); my %tests = ( '01_substr' => \&test1, '02_sqlite_mem' => \&test2, # '03_sqlite_disk' => \&test3, ); cmpthese( -20, #for 20 cpu secs \%tests ); sub test1{ #substr $href={}; open(my $fh, "<", "04.txt") or die $!; while(<$fh>){ chomp; push @{ $href->{ substr($_,0,10)} }, [ substr($_,10,10), subst +r($_,20)]; } close $fh; #iterate my $cnt=0; for( @t_synsets ){ if (exists $href->{$_} ){ $cnt += @{ $href->{$_} }; } } #print "test1:arrays cnt=$cnt\n"; } sub test2{ #sqlite mem my $mem_dbh = DBI->connect('dbi:SQLite:dbname=:memory:'); $mem_dbh->sqlite_backup_from_file($DB_TO_MEM); my $sth=$mem_dbh->prepare("select synset2 from synlink where synse +t1=?"); #iterate my $cnt=0; for( @t_synsets ){ $sth->execute($_); my $tmp=$sth->fetchall_arrayref; $cnt+=@{$tmp}; } $sth->finish; #print "test2:arrays cnt=$cnt\n"; $mem_dbh->disconnect; } sub test3{ #sqlite disk my $dbh = DBI->connect("dbi:SQLite:dbname=$DB_DISK") or die DBI->e +rrstr; my $sth=$dbh->prepare("select synset2 from synlink where synset1=? +"); #iterate my $cnt=0; for( @t_synsets ){ $sth->execute($_); my $tmp=$sth->fetchall_arrayref; $cnt+=@{$tmp}; } $sth->finish; #print "test3:arrays cnt=$cnt\n"; $dbh->disconnect; } #$ARGV[0] synsets to lookup sub load_test_synsets{ my $dbh=DBI->connect("dbi:SQLite:./wnjpn.db","","", { AutoCommit=>0, #sqlite_use_immediate_transaction=>1, #sqlite_unicode=>1, RaiseError=>1, }) or die DBI->errstr; if ( defined($ARGV[0]) && $ARGV[0] =~ /^\d+$/){ @t_synsets = map{$_->[0]} @{$dbh->selectall_arrayref("select synset FROM synset limi +t $ARGV[0]")}; } else { @t_synsets = map{$_->[0]} @{$dbh->selectall_arrayref("select synset FROM synset")}; } $dbh->disconnect; }
I would like to lookup around 2000 to 15000 records, so SQLite in-memory suit me fine. Thanks for information.


Comment on Re^2: Loading 283600 records (WordNet)
Download Code
Re^3: Loading 283600 records (WordNet)
by BrowserUk (Pope) on Sep 24, 2012 at 10:45 UTC

    Looks like you have your solution. ;)


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    RIP Neil Armstrong

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://995344]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others wandering the Monastery: (12)
As of 2014-08-29 22:35 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The best computer themed movie is:











    Results (289 votes), past polls