|Perl: the Markov chain saw|
Re^3: Efficient way to handle huge number of records?by Marshall (Monsignor)
|on Dec 19, 2011 at 11:15 UTC||Need Help??|
In the past (last time I tried was, I think, a couple of years ago) SQLite always proved prohibitively slow: loading multimillion-row data was so ridiculously slow (even on fast hardware), that I never bothered with further use.
I showed the framework of some code at Re^3: Efficient way to handle huge number of records? which is an abbreviated version of some code that I'm currently working on. One table has a million records of 50 fields. So I ran a couple of tests.
First test was with all the speed-up stuff turned off:
Now I figure that qualifies as "ridiculously slow!". I actually had to run it twice because I got one of those "Windows Automatic Reboot was required" things! Bummer when that happens after one day of processing!
Using the optimizations and by FAR and away the biggest effect is to do a single transaction! Results in:
Or about 3 minutes instead of 2 days! A lot better! This is fast enough for my needs. Using the bulk import utility would probably be faster, but I haven't tested that because 3 minutes doesn't bother my application.
I have another app that builds a 500K record table and it builds it from 1,000 input files. Takes about 1/2 the time or about 90 seconds. Its not worth my programming effort to emit an intermediate file in the whatever the bulk import utility needs - I just put the data into the DB right away. A reasonable programming tradeoff. Mileage varies.
It should be noted that my machine is an older one, a hyper threaded one (before the multi-core days), the Prescott stepping - last one with PGA (pin grid array) and my disks are only 5K rpm (not 7K+). A more modern machine can run a single process at least 4x this fast or about 45 seconds instead of 3 minutes (I've bench marked my machine vs a friend's on similar tasks before).
The time scales linearly, so 11M records would take 10x as long. Is that "ridiculously slow?", I don't know. I guess that depends upon the application.
I do have a MySQL server running on my machine and in the past I've done some benchmarking vs SQLite. For a complicated query, MySQL is faster, but for my current projects, SQLite "wins" due to admin simplicity (none required!).