Re^2: speeding up row by row lookup in a large db

by punkish (Priest)
on Mar 21, 2009 at 18:22 UTC ( [id://752274] )


in reply to Re: speeding up row by row lookup in a large db
in thread speeding up row by row lookup in a large db

How do I load the entire data set into memory? The denormalized data set is a few hundred GB (at least, going by the BLOB approach), and isn't Perl (or any process on a 32-bit computer) restricted to addressing 2 GB of RAM? I don't really understand that very well, so if you have a strategy I can use, I am all ears.

Many thanks,

Update: Adding this note to explain more in response to perrin's note. In my original post above I showed only the SELECTs and mentioned that each SELECT takes about 33 ms. To try to reduce that, I tried converting the result of each SELECT to a BLOB. Since each result is a ref to an array of arrays, I serialized it using Storable with the help of Data::Serializer and INSERTed it into the blobs table. This was stupendously slow. I tried both with transactions (experimenting with committing every 10, 100, and 1000 INSERTs) and without. Each BLOB comes to about 430 KB, which would make the db much larger than my laptop's drive if run to completion; fortunately for my laptop, the darn process ran overnight and had completed only about 30,000 INSERTs.
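A minimal sketch of the BLOB approach described above (table, column, and key names are placeholders, not the actual schema; Storable's nfreeze is used directly here rather than going through Data::Serializer, just to keep the sketch short):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI qw(:sql_types);
    use Storable qw(nfreeze);

    # Assumed schema: met(cell_id, col1, col2, col3) and blobs(cell_id, data BLOB).
    my $dbh = DBI->connect("dbi:SQLite:dbname=met.db", "", "",
        { RaiseError => 1, AutoCommit => 0 });

    my $sel = $dbh->prepare("SELECT col1, col2, col3 FROM met WHERE cell_id = ?");
    my $ins = $dbh->prepare("INSERT INTO blobs (cell_id, data) VALUES (?, ?)");

    my @cell_ids = (1 .. 1000);                 # placeholder for the real list of keys
    my $n = 0;
    for my $cell_id (@cell_ids) {
        $sel->execute($cell_id);
        my $aoa = $sel->fetchall_arrayref;      # ref to array of arrays, as in the post
        $ins->bind_param(1, $cell_id);
        $ins->bind_param(2, nfreeze($aoa), SQL_BLOB);   # serialize, store as a BLOB
        $ins->execute;
        $dbh->commit if ++$n % 1000 == 0;       # commit in batches, not per row
    }
    $dbh->commit;
    $dbh->disconnect;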

--

when small people start casting long shadows, it is time to go to bed

Re^3: speeding up row by row lookup in a large db
by perrin (Chancellor) on Mar 21, 2009 at 19:19 UTC

    Well, in your question you said the whole database was 430 MB, so you can see why I would suggest loading it into RAM. Perl should be able to access more than 2 GB of RAM on a 64-bit machine, and to some extent on a 32-bit one if you have the right Linux kernel.
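
    For what it's worth, a minimal sketch of the load-it-into-RAM idea, reading the whole table into a Perl hash keyed by cell id (table and column names are assumptions):

        use strict;
        use warnings;
        use DBI;

        # Slurp the whole table into a hash of array-of-arrays keyed by cell id,
        # then do every lookup in memory instead of running a SELECT per row.
        my $dbh = DBI->connect("dbi:SQLite:dbname=met.db", "", "", { RaiseError => 1 });

        my %cache;
        my $sth = $dbh->prepare("SELECT cell_id, col1, col2, col3 FROM met");
        $sth->execute;
        while (my @row = $sth->fetchrow_array) {
            my $cell_id = shift @row;
            push @{ $cache{$cell_id} }, [ @row ];
        }
        $dbh->disconnect;

        # Each lookup is now a hash access, same shape as fetchall_arrayref:
        my $cell_id = 42;                       # placeholder key
        my $rows    = $cache{$cell_id} || [];

    (DBD::SQLite can also copy a disk database into an in-memory one via its backup methods, but a plain hash is usually the simplest way to see whether keeping everything in RAM pays off.)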

    INSERTs will definitely run faster if you only commit every 1000. There may be some other SQLite tuning tricks, which you'd probably find on a mailing list or wiki devoted to SQLite. But if none of those work for you, I think MySQL is your best bet.
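
    A short sketch of the kind of SQLite tuning usually suggested for bulk INSERTs, alongside the batched commits (the PRAGMA values are illustrative, and they trade crash-safety for speed; the data list is a placeholder):

        use strict;
        use warnings;
        use DBI;

        my $dbh = DBI->connect("dbi:SQLite:dbname=blobs.db", "", "",
            { RaiseError => 1, AutoCommit => 0 });

        # Typical bulk-load settings: fewer fsyncs and a bigger page cache.
        $dbh->do("PRAGMA synchronous = OFF");
        $dbh->do("PRAGMA cache_size  = 100000");

        my $ins = $dbh->prepare("INSERT INTO blobs (cell_id, data) VALUES (?, ?)");

        my @records = ( [ 1, 'frozen data' ] ); # placeholder (cell_id, blob) pairs
        my $n = 0;
        for my $rec (@records) {
            $ins->execute(@$rec);
            $dbh->commit if ++$n % 1000 == 0;   # one transaction per 1000 rows
        }
        $dbh->commit;
        $dbh->disconnect;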
