in reply to Re: Loading 283600 records (WordNet)
in thread Loading 283600 records (WordNet)
I was surprised.
I have nothing to say for loading time. I never imagined execute and fetch become so fast. It is slower than hash lookup, but it becomes really fast.
And smaller memory usage. These are output of "ps -axorss,vsz -p $$" before loading, after loaded. Size is in KB.
>perl 07-1.pl #load to hash before: RSS VSZ 2736 5036 after: RSS VSZ 61332 63404 >perl 07-2.pl #sqlite in-memory before: RSS VSZ 4612 7308 after: RSS VSZ 7796 9992File size of database is 37MB, if I dump this table to text, it becomes 8.9MB. I wonder how they load it in-memory?
Below is some test results of lookup. It loads data and lookup ARG times from 283600 records/hash.
>perl 07.pl 100 (warning: too few iterations for a reliable count) s/iter 03_sqlite_disk 01_substr 02_sqlite_mem 03_sqlite_disk 24.0 -- -89% -97% 01_substr 2.72 780% -- -77% 02_sqlite_mem 0.627 3720% 334% --
I cut SQLite on disk here after.
>perl 07.pl 1000 s/iter 01_substr 02_sqlite_mem 01_substr 2.71 -- -75% 02_sqlite_mem 0.687 295% -- >perl 07.pl 10000 s/iter 01_substr 02_sqlite_mem 01_substr 2.74 -- -61% 02_sqlite_mem 1.07 157% -- >perl 07.pl 50000 s/iter 01_substr 02_sqlite_mem 01_substr 2.80 -- -3% 02_sqlite_mem 2.72 3% -- >perl 07.pl 100000 s/iter 02_sqlite_mem 01_substr 02_sqlite_mem 4.59 -- -36% 01_substr 2.92 57% --And test code.
#!/usr/bin/perl #SQLite mem test use strict; use warnings; use Time::HiRes; use Benchmark qw/cmpthese/; use DBI; my $href; my @t_synsets; my $DB_DISK='./wnjpn.db'; my $DB_TO_MEM='./wn-synlink.db'; #load @ARGV[0] of synset for lookup test load_test_synsets(); my %tests = ( '01_substr' => \&test1, '02_sqlite_mem' => \&test2, # '03_sqlite_disk' => \&test3, ); cmpthese( -20, #for 20 cpu secs \%tests ); sub test1{ #substr $href={}; open(my $fh, "<", "04.txt") or die $!; while(<$fh>){ chomp; push @{ $href->{ substr($_,0,10)} }, [ substr($_,10,10), subst +r($_,20)]; } close $fh; #iterate my $cnt=0; for( @t_synsets ){ if (exists $href->{$_} ){ $cnt += @{ $href->{$_} }; } } #print "test1:arrays cnt=$cnt\n"; } sub test2{ #sqlite mem my $mem_dbh = DBI->connect('dbi:SQLite:dbname=:memory:'); $mem_dbh->sqlite_backup_from_file($DB_TO_MEM); my $sth=$mem_dbh->prepare("select synset2 from synlink where synse +t1=?"); #iterate my $cnt=0; for( @t_synsets ){ $sth->execute($_); my $tmp=$sth->fetchall_arrayref; $cnt+=@{$tmp}; } $sth->finish; #print "test2:arrays cnt=$cnt\n"; $mem_dbh->disconnect; } sub test3{ #sqlite disk my $dbh = DBI->connect("dbi:SQLite:dbname=$DB_DISK") or die DBI->e +rrstr; my $sth=$dbh->prepare("select synset2 from synlink where synset1=? +"); #iterate my $cnt=0; for( @t_synsets ){ $sth->execute($_); my $tmp=$sth->fetchall_arrayref; $cnt+=@{$tmp}; } $sth->finish; #print "test3:arrays cnt=$cnt\n"; $dbh->disconnect; } #$ARGV[0] synsets to lookup sub load_test_synsets{ my $dbh=DBI->connect("dbi:SQLite:./wnjpn.db","","", { AutoCommit=>0, #sqlite_use_immediate_transaction=>1, #sqlite_unicode=>1, RaiseError=>1, }) or die DBI->errstr; if ( defined($ARGV[0]) && $ARGV[0] =~ /^\d+$/){ @t_synsets = map{$_->[0]} @{$dbh->selectall_arrayref("select synset FROM synset limi +t $ARGV[0]")}; } else { @t_synsets = map{$_->[0]} @{$dbh->selectall_arrayref("select synset FROM synset")}; } $dbh->disconnect; }
|
---|
Replies are listed 'Best First'. | |
---|---|
Re^3: Loading 283600 records (WordNet)
by BrowserUk (Patriarch) on Sep 24, 2012 at 10:45 UTC |
In Section
Seekers of Perl Wisdom