Re: How to deal with Huge data
by glasswalk3r (Pilgrim)
on Jan 23, 2007 at 12:39 UTC
First of all, what is the nature of the text file? Does it have repeated keys inside the same file?
Hashes do consume a lot of memory... but you can always divide and conquer. ;-)
Supposing that you have two different files that you want to merge, you could first try to reduce the size of each one by looking for repeated keys and summing their numeric data.
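A minimal sketch of that reduction step, under assumptions the original question may not share: the file is tab-separated, the key is in the first column and every other column is numeric. The name reduce_file is only illustrative.

```perl
use strict;
use warnings;

# Collapse repeated keys in one file by summing their numeric columns.
# Assumed layout (hypothetical): tab-separated, key in the first column,
# numbers in the remaining columns.
sub reduce_file {
    my ($in_fh, $out_fh) = @_;
    my %sum;    # key => array ref holding one running total per column

    while (my $line = <$in_fh>) {
        chomp $line;
        next if $line eq '';
        my ($key, @values) = split /\t/, $line;
        $sum{$key} ||= [];
        $sum{$key}[$_] += $values[$_] for 0 .. $#values;
    }

    # Write one line per distinct key, in sorted key order.
    for my $key (sort keys %sum) {
        print {$out_fh} join("\t", $key, @{ $sum{$key} }), "\n";
    }
    return scalar keys %sum;    # number of distinct keys written
}
```

It still uses a hash, but only one entry per distinct key, so it helps only when the file really does contain many repeated keys.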
Of course, this depends on whether each file can be reduced to an acceptable size. If that is not possible, you could consider working with slices of the file or using a database: since it holds the data on disk, this should work. You can use any database, but DBD::DBM looks ideal for your needs.
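To illustrate the "keep the data on disk" idea, here is a rough sketch that uses a hash tied to an SDBM file (SDBM_File ships with Perl) rather than the SQL interface of DBD::DBM. This is a swapped-in, simpler technique, not the DBD::DBM API. The "key<TAB>number" line layout and the name sum_on_disk are assumptions made for the example.

```perl
use strict;
use warnings;
use Fcntl;        # exports O_RDWR, O_CREAT, O_RDONLY
use SDBM_File;    # core module: a simple disk-backed hash

# Sum one numeric value per key across several files, keeping the
# totals in a DBM file on disk instead of in memory.
# Assumed layout (hypothetical): "key<TAB>number" on each line.
sub sum_on_disk {
    my ($dbm_path, @files) = @_;

    tie my %total, 'SDBM_File', $dbm_path, O_RDWR | O_CREAT, 0666
        or die "Cannot tie $dbm_path: $!";

    for my $file (@files) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        while (my $line = <$fh>) {
            chomp $line;
            my ($key, $value) = split /\t/, $line;
            next unless defined $value;    # skip lines without a value
            $total{$key} = ($total{$key} // 0) + $value;
        }
        close $fh;
    }

    untie %total;    # flush and close the DBM file
    return;
}
```

Every lookup and store goes to disk, so this will be slower than an in-memory hash, and SDBM limits the size of each key/value pair. For wide records a real database is probably the better fit.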
What do you mean by "The cols are different in each file (but not always ...)"? Does this mean the columns are in different positions? It is easier to normalize that first (put every column and its value in a predefined position) and then start working. For instance, once the files have their columns in the correct positions, you can even forget about skipping the first line: the program can easily print the column names in the output file later.
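The normalization could look something like the sketch below. It assumes each file starts with a tab-separated header line naming its columns, and it fills a column that is missing from a file with 0. Both are guesses about the data, and make_normalizer is just an illustrative name.

```perl
use strict;
use warnings;

# Build a closure that rewrites data lines from one file so that the
# columns always come out in the order given by @wanted.
# Assumptions (hypothetical): tab-separated data, first line of each
# file is a header, missing columns are filled with 0.
sub make_normalizer {
    my ($header_line, @wanted) = @_;
    chomp $header_line;
    my @names = split /\t/, $header_line;

    my %pos;    # column name => position in this particular file
    @pos{@names} = 0 .. $#names;

    return sub {
        my ($line) = @_;
        chomp $line;
        my @field = split /\t/, $line, -1;    # -1 keeps trailing empty fields
        return join "\t", map {
            my $i = $pos{$_};
            defined $i && defined $field[$i] ? $field[$i] : 0;
        } @wanted;
    };
}
```

For each file you would read the header once, build the normalizer, and then pass every remaining line through it:

```perl
my $fix = make_normalizer("id\tb\ta\n", qw(id a b c));
print $fix->("k1\t2\t1\n"), "\n";    # columns come out as id, a, b, c
```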
You're using hash references inside hashes... the more complicated the structures you use, the more memory the program will require.
Some other tips:
Alceu Rodrigues de Freitas Junior
"You have enemies? Good. That means you've stood up for something, sometime in your life." - Sir Winston Churchill