PerlMonks
Re: Solve the large file size issue by sundialsvc4 (Abbot)
on Apr 14, 2015 at 22:05 UTC (id://1123455)
Superficially, it appears to me that what you are setting out to do here is simply “a merge.” If you know that your two files are sorted on the same key, you can process both of them in a single, very efficient sequential pass. Or, quite likely, you can find an existing CPAN module that already does this. (Sort::Merge and File::MergeSort both look interesting.) If you want to “code your own” solution, here’s how I presented it to my COBOL classes, all those years ago (utterly ignoring the textbook’s complicated examples). Use a state-machine approach: first, figure out what state you’re in, then do the right thing. There are the following states (in a “two files” scenario):
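A minimal Perl sketch of such a two-file state machine might look like the following. It is only an illustration under stated assumptions, not the original COBOL material and not the API of either CPAN module. The helper `key_of` is hypothetical and assumes each line is tab-separated with the key in the first field. The sketch also assumes that both files are sorted ascending in `cmp` order and that keys are unique within each file. Its states are: file B exhausted, file A exhausted, A's key lower, B's key lower, and keys equal.

```perl
use strict;
use warnings;

# Hypothetical key extractor: assumes each record's key is its first
# tab-separated field. Adjust to match the real file layout.
sub key_of {
    my ($line) = @_;
    return (split /\t/, $line, 2)[0];
}

# Read one record and strip the newline; undef means end of file.
sub next_line {
    my ($fh) = @_;
    my $line = <$fh>;
    chomp $line if defined $line;
    return $line;
}

# Two-file merge as a state machine. Both inputs must be sorted ascending
# (string order, as with `cmp`) on the same key, with unique keys per file.
# %cb supplies three callbacks: only_a, only_b and both.
sub merge_two {
    my ($fh_a, $fh_b, %cb) = @_;
    my $rec_a = next_line($fh_a);
    my $rec_b = next_line($fh_b);

    while (defined $rec_a || defined $rec_b) {
        if (!defined $rec_b) {                 # B exhausted: drain A
            $cb{only_a}->($rec_a);
            $rec_a = next_line($fh_a);
        }
        elsif (!defined $rec_a) {              # A exhausted: drain B
            $cb{only_b}->($rec_b);
            $rec_b = next_line($fh_b);
        }
        else {
            my $order = key_of($rec_a) cmp key_of($rec_b);
            if ($order < 0) {                  # A's key is lower: A-only record
                $cb{only_a}->($rec_a);
                $rec_a = next_line($fh_a);
            }
            elsif ($order > 0) {               # B's key is lower: B-only record
                $cb{only_b}->($rec_b);
                $rec_b = next_line($fh_b);
            }
            else {                             # keys match: pair them up
                $cb{both}->($rec_a, $rec_b);
                $rec_a = next_line($fh_a);
                $rec_b = next_line($fh_b);
            }
        }
    }
}
```

To use it on real data, open each file with `open my $fh, '<', $path or die "$path: $!"` and pass the two handles, along with `only_a`, `only_b` and `both` subs that write whatever output the task needs. Only one record per file is held at a time, so memory use does not grow with file size.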
(koff, koff ...) Interesting stuff for a late-night community college class, yes, and nice because it extends easily to any number of input files. But otherwise, this is “a thing already done.” This sort of data processing has (literally ...) been done, and done very well, since the days of Herman Hollerith. Grab an existing, off-the-shelf CPAN module and use it. The computer should positively scream through a “mere” 8 million lines. It only has to make one sequential pass through the file(s) to produce the right answer, and it only needs to hold the current record from each file in memory. You should have your answer in a matter of seconds.
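Extending the same idea to any number of input files can be sketched as follows. This is an assumption-laden illustration, not a description of what Sort::Merge or File::MergeSort do internally. The hypothetical `key_of` takes the first tab-separated field, every input must be sorted with `cmp`, and the smallest pending record is found by a plain linear scan over the current "head" line of each file.

```perl
use strict;
use warnings;

# Hypothetical key extractor: first tab-separated field of a record.
sub key_of {
    my ($line) = @_;
    return (split /\t/, $line, 2)[0];
}

# N-way merge. Every handle in @fhs must already be sorted ascending
# (string order) on the key. $emit is called as $emit->($record, $index),
# where $index says which input the record came from. Each file is read
# once, sequentially, holding only one pending record per file in memory.
sub merge_many {
    my ($emit, @fhs) = @_;

    # Prime the "head" record of every input.
    my @head = map { scalar readline($_) } @fhs;
    for my $h (@head) { chomp $h if defined $h }

    while (1) {
        # Find the input whose head has the lowest key (linear scan).
        # Strict `lt` keeps the earlier input on ties.
        my $min;
        for my $i (0 .. $#head) {
            next unless defined $head[$i];
            $min = $i
                if !defined $min || key_of($head[$i]) lt key_of($head[$min]);
        }
        last unless defined $min;    # every input is exhausted

        $emit->($head[$min], $min);

        # Advance only the input we just consumed from.
        my $next = readline($fhs[$min]);
        chomp $next if defined $next;
        $head[$min] = $next;
    }
}
```

The linear scan costs O(N) per record, which is negligible for a handful of files. For many inputs, a heap (priority queue) would be the usual replacement. Ties go to the earlier file in the argument list, so records with equal keys come out in input order.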
In Section: Seekers of Perl Wisdom