I was surprised.
I have nothing to say for loading time. I never imagined execute and fetch become so fast. It is slower than hash lookup, but it becomes really fast.
And smaller memory usage. These are output of "ps -axorss,vsz -p $$" before loading, after loaded. Size is in KB.
>perl 07-1.pl #load to hash
before:
RSS VSZ
2736 5036
after:
RSS VSZ
61332 63404
>perl 07-2.pl #sqlite in-memory
before:
RSS VSZ
4612 7308
after:
RSS VSZ
7796 9992
File size of database is 37MB, if I dump this table to text, it becomes 8.9MB. I wonder how they load it in-memory?
Below is some test results of lookup. It loads data and lookup ARG times from 283600 records/hash.
>perl 07.pl 100
(warning: too few iterations for a reliable count)
s/iter 03_sqlite_disk 01_substr 02_sqlite_mem
03_sqlite_disk 24.0 -- -89% -97%
01_substr 2.72 780% -- -77%
02_sqlite_mem 0.627 3720% 334% --
I cut SQLite on disk here after.
>perl 07.pl 1000
s/iter 01_substr 02_sqlite_mem
01_substr 2.71 -- -75%
02_sqlite_mem 0.687 295% --
>perl 07.pl 10000
s/iter 01_substr 02_sqlite_mem
01_substr 2.74 -- -61%
02_sqlite_mem 1.07 157% --
>perl 07.pl 50000
s/iter 01_substr 02_sqlite_mem
01_substr 2.80 -- -3%
02_sqlite_mem 2.72 3% --
>perl 07.pl 100000
s/iter 02_sqlite_mem 01_substr
02_sqlite_mem 4.59 -- -36%
01_substr 2.92 57% --
And test code.
#!/usr/bin/perl
#SQLite mem test
use strict; use warnings;
use Time::HiRes;
use Benchmark qw/cmpthese/;
use DBI;
my $href;
my @t_synsets;
my $DB_DISK='./wnjpn.db';
my $DB_TO_MEM='./wn-synlink.db';
#load @ARGV[0] of synset for lookup test
load_test_synsets();
my %tests = (
'01_substr' => \&test1,
'02_sqlite_mem' => \&test2,
# '03_sqlite_disk' => \&test3,
);
cmpthese(
-20, #for 20 cpu secs
\%tests
);
sub test1{ #substr
$href={};
open(my $fh, "<", "04.txt") or die $!;
while(<$fh>){
chomp;
push @{ $href->{ substr($_,0,10)} }, [ substr($_,10,10), subst
+r($_,20)];
}
close $fh;
#iterate
my $cnt=0;
for( @t_synsets ){
if (exists $href->{$_} ){
$cnt += @{ $href->{$_} };
}
}
#print "test1:arrays cnt=$cnt\n";
}
sub test2{ #sqlite mem
my $mem_dbh = DBI->connect('dbi:SQLite:dbname=:memory:');
$mem_dbh->sqlite_backup_from_file($DB_TO_MEM);
my $sth=$mem_dbh->prepare("select synset2 from synlink where synse
+t1=?");
#iterate
my $cnt=0;
for( @t_synsets ){
$sth->execute($_);
my $tmp=$sth->fetchall_arrayref;
$cnt+=@{$tmp};
}
$sth->finish;
#print "test2:arrays cnt=$cnt\n";
$mem_dbh->disconnect;
}
sub test3{ #sqlite disk
my $dbh = DBI->connect("dbi:SQLite:dbname=$DB_DISK") or die DBI->e
+rrstr;
my $sth=$dbh->prepare("select synset2 from synlink where synset1=?
+");
#iterate
my $cnt=0;
for( @t_synsets ){
$sth->execute($_);
my $tmp=$sth->fetchall_arrayref;
$cnt+=@{$tmp};
}
$sth->finish;
#print "test3:arrays cnt=$cnt\n";
$dbh->disconnect;
}
#$ARGV[0] synsets to lookup
sub load_test_synsets{
my $dbh=DBI->connect("dbi:SQLite:./wnjpn.db","","", {
AutoCommit=>0,
#sqlite_use_immediate_transaction=>1,
#sqlite_unicode=>1,
RaiseError=>1,
}) or die DBI->errstr;
if ( defined($ARGV[0]) && $ARGV[0] =~ /^\d+$/){
@t_synsets = map{$_->[0]}
@{$dbh->selectall_arrayref("select synset FROM synset limi
+t $ARGV[0]")};
} else {
@t_synsets = map{$_->[0]}
@{$dbh->selectall_arrayref("select synset FROM synset")};
}
$dbh->disconnect;
}
I would like to lookup around 2000 to 15000 records, so SQLite in-memory suit me fine. Thanks for information.
-
Are you posting in the right place? Check out Where do I post X? to know for sure.
-
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big>
<blockquote> <br /> <dd>
<dl> <dt> <em> <font>
<h1> <h2> <h3> <h4>
<h5> <h6> <hr /> <i>
<li> <nbsp> <ol> <p>
<small> <strike> <strong>
<sub> <sup> <table>
<td> <th> <tr> <tt>
<u> <ul>
-
Snippets of code should be wrapped in
<code> tags not
<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
intervention).
-
Want more info? How to link
or How to display code and escape characters
are good places to start.