Good point. However, if disk I/O or the time to complete the task is the issue, you may want to consider a hybrid system that processes larger files differently from smaller ones. You would need to find the sweet spot that separates a large file from a small one. Then it's simply a matter of checking the file size (with stat, for example) to decide which method to use.
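To make the hybrid idea concrete, here is a rough sketch in Python rather than anything from the original program. The cutoff value, the function names, and the newline-counting task are all made up for illustration; the real threshold would have to be measured on your own data.

```python
import os

# Hypothetical cutoff: the real "sweet spot" has to be found by measurement.
SIZE_THRESHOLD_BYTES = 10 * 1024 * 1024


def count_newlines_in_memory(path):
    # Small file: read everything in one go, which keeps disk I/O to one pass.
    with open(path, "rb") as fh:
        return fh.read().count(b"\n")


def count_newlines_streaming(path, chunk_size=1024 * 1024):
    # Large file: read fixed-size chunks so memory use stays bounded.
    total = 0
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            total += chunk.count(b"\n")
    return total


def count_newlines(path, threshold=SIZE_THRESHOLD_BYTES):
    # stat the file and pick the method based on its size.
    if os.stat(path).st_size <= threshold:
        return count_newlines_in_memory(path)
    return count_newlines_streaming(path)
```

Both branches return the same answer; only the memory and I/O pattern differs, so the threshold can be tuned without changing results.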
I too have a program that collects statistics and stores them in a hash. However, I routinely write the data to the hard drive at a set interval to avoid running out of memory. Based on what I've learned from my experiment, your hashes must be quite large to suck the memory from a machine.
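The periodic write-out could look something like the sketch below. This is not my actual program, just an illustration in Python: the class name, the JSON-lines output format, and the 60-second default are assumptions, and a dict stands in for the hash.

```python
import json
import time


class FlushingCounter:
    """Collects counts in a dict and dumps them to disk at a set interval."""

    def __init__(self, path, interval_seconds=60.0):
        self.path = path
        self.interval_seconds = interval_seconds
        self.counts = {}
        self._last_flush = time.monotonic()

    def add(self, key, amount=1):
        self.counts[key] = self.counts.get(key, 0) + amount
        # Flush once the interval has elapsed since the last write.
        if time.monotonic() - self._last_flush >= self.interval_seconds:
            self.flush()

    def flush(self):
        if self.counts:
            # Append one JSON object per flush, then drop the in-memory data.
            with open(self.path, "a", encoding="utf-8") as fh:
                fh.write(json.dumps(self.counts) + "\n")
            self.counts = {}
        self._last_flush = time.monotonic()
```

Each flush writes a partial snapshot, so whatever reads the file later has to sum the counts per key across lines. Call flush() once more at exit so the last interval is not lost.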
This of course depends on your ultimate goal for your application.