in reply to Re^2: Efficient way to handle huge number of records?
in thread Efficient way to handle huge number of records?
I showed the framework of some code at Re^3: Efficient way to handle huge number of records? which is an abbreviated version of some code that I'm currently working on. One table has a million records of 50 fields. So I ran a couple of tests.
First test was with all the speed-up stuff turned off:
Now I figure that qualifies as "ridiculously slow!". I actually had to run it twice because I got one of those "Windows Automatic Reboot was required" things! Bummer when that happens after one day of processing!Starting: Fri Dec 16 13:56:56 2011 Creating new Database!! - HD table records inserted: 1,117,526 Ending: Sun Dec 18 02:15:30 2011
Using the optimizations and by FAR and away the biggest effect is to do a single transaction! Results in:
Or about 3 minutes instead of 2 days! A lot better! This is fast enough for my needs. Using the bulk import utility would probably be faster, but I haven't tested that because 3 minutes doesn't bother my application.Starting: Sun Dec 18 15:26:53 2011 Creating new Database!! - HD table records inserted: 1,117,526 Ending: Sun Dec 18 15:29:44 2011
I have another app that builds a 500K record table and it builds it from 1,000 input files. Takes about 1/2 the time or about 90 seconds. Its not worth my programming effort to emit an intermediate file in the whatever the bulk import utility needs - I just put the data into the DB right away. A reasonable programming tradeoff. Mileage varies.
It should be noted that my machine is an older one, a hyper threaded one (before the multi-core days), the Prescott stepping - last one with PGA (pin grid array) and my disks are only 5K rpm (not 7K+). A more modern machine can run a single process at least 4x this fast or about 45 seconds instead of 3 minutes (I've bench marked my machine vs a friend's on similar tasks before).
The time scales linearly, so 11M records would take 10x as long. Is that "ridiculously slow?", I don't know. I guess that depends upon the application.
I do have a MySQL server running on my machine and in the past I've done some benchmarking vs SQLite. For a complicated query, MySQL is faster, but for my current projects, SQLite "wins" due to admin simplicity (none required!).