Re: Loading 283600 records (WordNet)

by dsheroh (Monsignor)
on Sep 23, 2012 at 09:48 UTC


in reply to Loading 283600 records (WordNet)

I'm not entirely clear on what you'll actually be doing with the data once it's loaded into memory, but there is another option: copy the dataset from the on-disk SQLite database to an in-memory SQLite database using DBD::SQLite's sqlite_backup_from_file method.
    use DBI;

    my $mem_dbh = DBI->connect('dbi:SQLite:dbname=:memory:');
    $mem_dbh->sqlite_backup_from_file($db_file_name);

    # process, process, process...

    # Don't forget to copy the in-memory data back to disk if
    # you've made any changes that should be persistent; if not,
    # skip this step.
    $mem_dbh->sqlite_backup_to_file($db_file_name);
It won't be as fast as using a hash for your in-memory manipulation, of course, but it will give you access to the full capabilities of SQLite if you find yourself needing to, e.g., query the data in ways that aren't easy/straightforward on a hash.
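
For instance, a prefix search with aggregation is a single SQL statement against the in-memory handle, where a hash would need a manual walk over every key. This is only a sketch; the senses table and its columns are invented here, since I don't know the actual schema:

    # Hypothetical schema: senses(word TEXT, synset_id INTEGER).
    # Count the senses of every word starting with 'run' -- one
    # statement, no loop over all keys.
    my $rows = $mem_dbh->selectall_arrayref(
        'SELECT word, COUNT(*) FROM senses WHERE word LIKE ? GROUP BY word',
        undef, 'run%',
    );
    printf "%s: %d\n", @$_ for @$rows;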

For comparison, I recently tried this technique out on a program that inserted a bunch of data into an SQLite database file, but was running inconveniently slowly. By copying the database into memory, inserting the data, and copying the database back to disk, the run time dropped from about seven and a half minutes to around one second. Quite a nice speedup, especially for such a trivial change to the source.
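
In case it's useful, here is roughly the shape of that change (a minimal sketch, not the actual program; the table, columns, file name, and sample rows are all invented):

    use strict;
    use warnings;
    use DBI;

    # Placeholder rows and schema, purely for illustration.
    my @rows = ( [ 'dog', 101 ], [ 'cat', 102 ] );

    my $mem_dbh = DBI->connect('dbi:SQLite:dbname=:memory:',
                               undef, undef, { RaiseError => 1 });
    $mem_dbh->sqlite_backup_from_file('words.db');   # assumed file name

    # All inserts happen against RAM, inside one transaction.
    my $sth = $mem_dbh->prepare(
        'INSERT INTO words (word, synset_id) VALUES (?, ?)');
    $mem_dbh->begin_work;
    $sth->execute(@$_) for @rows;
    $mem_dbh->commit;

    # One bulk write puts the result back on disk.
    $mem_dbh->sqlite_backup_to_file('words.db');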

Re^2: Loading 283600 records (WordNet)
by remiah (Hermit) on Sep 23, 2012 at 11:26 UTC

    Thanks for the reply, dsheroh.

    I had never dreamed of such a solution. I am going to try it.

    I sometimes forget to describe what I am currently doing...
    I am drawing a WordNet graph in HTML. Currently it is a tree of depth 8, with roughly 2,000-3,000 up to 15,000 synsets. I am nervous about the loading time because it runs as CGI.

    Before I introduce SVG or GraphViz, I wanted to find out whether a depth-8 tree of 15,000 synsets could be fetched and drawn smoothly. Sadly, my results show it is somewhat slow.

    Now I am considering Joe Celko's PathEnum (path enumeration) table.
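
    For reference, the path enumeration idea stores each node's chain of ancestor IDs as a string, so an entire subtree comes back from a single LIKE query instead of one query per level of the depth-8 tree. A minimal sketch, with an invented table and columns:

        # Each row carries its full ancestor path, e.g. '1/15/204/'.
        # Fetch the whole subtree under one root in a single query.
        my $subtree = $dbh->selectall_arrayref(
            'SELECT synset_id, path FROM synset_paths
              WHERE path LIKE ? ORDER BY path',
            undef, '1/15/%',
        );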

    Thanks for your kind reply.
    regards

Re^2: Loading 283600 records (WordNet)
by remiah (Hermit) on Sep 24, 2012 at 10:25 UTC

    I was surprised.

    I have no complaints about the loading time. I never imagined execute and fetch could become so fast. It is still slower than a hash lookup, but it is really fast.

    And memory usage is smaller. Below is the output of "ps -axorss,vsz -p $$" before and after loading. Sizes are in KB.

    >perl 07-1.pl  #load to hash
    before:
      RSS   VSZ
     2736  5036
    after:
      RSS   VSZ
    61332 63404
    
    >perl 07-2.pl  #sqlite in-memory
    before:
      RSS   VSZ
     4612  7308
    after:
      RSS   VSZ
     7796  9992
    
    The database file is 37MB; if I dump the table to text, it comes to 8.9MB. I wonder how they hold it in memory?

    Below are some test results for lookups. Each run loads the data and then performs ARG lookups against the 283,600 records/hash.
    >perl 07.pl 100
                (warning: too few iterations for a reliable count)
                   s/iter 03_sqlite_disk      01_substr  02_sqlite_mem
    03_sqlite_disk   24.0             --           -89%           -97%
    01_substr        2.72           780%             --           -77%
    02_sqlite_mem   0.627          3720%           334%             --
    

    I dropped SQLite-on-disk from the tests after this point.
    >perl 07.pl 1000
                  s/iter     01_substr 02_sqlite_mem
    01_substr       2.71            --          -75%
    02_sqlite_mem  0.687          295%            --
    
    >perl 07.pl 10000
                  s/iter     01_substr 02_sqlite_mem
    01_substr       2.74            --          -61%
    02_sqlite_mem   1.07          157%            --
    
    >perl 07.pl 50000
                  s/iter     01_substr 02_sqlite_mem
    01_substr       2.80            --           -3%
    02_sqlite_mem   2.72            3%            --
    
    >perl 07.pl 100000
                  s/iter 02_sqlite_mem     01_substr
    02_sqlite_mem   4.59            --          -36%
    01_substr       2.92           57%            --
    
    
    And here is the test code. I will be looking up around 2,000 to 15,000 records, so SQLite in-memory suits me fine. Thanks for the information.
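
    The test script itself is not reproduced above, so here is only a rough sketch of a comparable benchmark (the schema, file name, and sub names are assumptions): load the data once, then compare repeated hash lookups against a prepared SELECT on the in-memory copy.

        use strict;
        use warnings;
        use DBI;
        use Benchmark qw(cmpthese);

        my $n = shift // 1000;    # number of lookups per run (ARG above)

        my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:',
                               undef, undef, { RaiseError => 1 });
        $dbh->sqlite_backup_from_file('wordnet.db');   # assumed file name

        my %hash;    # assumed to be filled from the same 283600 records
        my $sth = $dbh->prepare(
            'SELECT gloss FROM synsets WHERE synset_id = ?');  # invented

        cmpthese(5, {
            '01_hash' => sub {
                my $v;
                $v = $hash{ int rand 283_600 } for 1 .. $n;
            },
            '02_sqlite_mem' => sub {
                for (1 .. $n) {
                    $sth->execute(int rand 283_600);
                    my ($gloss) = $sth->fetchrow_array;
                }
            },
        });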

      Looks like you have your solution. ;)


