PerlMonks
Re^2: Counter - number of tags per interval
by sundialsvc4 (Abbot) on May 04, 2013 at 20:31 UTC [id://1032081]
This process would also benefit from being run, say, on a cluster. If your program accepted as parameters the starting and ending line numbers that it is to process, then multiple independent instances of the same program, running on different machines, could each summarize one chunk of the file in the manner just suggested.
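As a rough illustration of the per-chunk step, here is a minimal sketch. The file layouts are assumptions, not something taken from the original question: the ranges are taken to be sorted, non-overlapping `[start, end]` pairs already loaded into memory, and the data file is taken to hold one numeric tag position per line. The name `summarize_chunk` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: count the tags found on lines $first..$last
# (1-based, inclusive) of the data file that fall inside each range.
# Assumes $ranges is [[start, end], ...], sorted and non-overlapping,
# and that each data line holds one numeric tag position.
sub summarize_chunk {
    my ($ranges, $data_fh, $first, $last) = @_;
    my @count = (0) x @$ranges;          # one in-memory tally per range

    while (my $line = <$data_fh>) {
        next if $. < $first;             # not yet inside our chunk
        last if $. > $last;              # past the end of our chunk
        chomp $line;
        next unless $line =~ /\S/;       # ignore blank lines
        my $pos = $line;

        # Binary search for the range that contains $pos, if any.
        my ($lo, $hi) = (0, $#$ranges);
        while ($lo <= $hi) {
            my $mid = int(($lo + $hi) / 2);
            if    ($pos < $ranges->[$mid][0]) { $hi = $mid - 1 }
            elsif ($pos > $ranges->[$mid][1]) { $lo = $mid + 1 }
            else  { $count[$mid]++; last }
        }
    }

    # Zero or one record per range, in ranges-file order.
    return [ map  { [ @{ $ranges->[$_] }, $count[$_] ] }
             grep { $count[$_] } 0 .. $#$ranges ];
}
```

Each instance would be started with its own `$first` and `$last` and would write its `[start, end, count]` records to its own output file. Ranges that received no tags in that chunk are simply omitted.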
The last step will be a “classic merge,” because you know that each process read the ranges-file sequentially and that it output either zero or one record for each range, in that order. Thus, the merge process (whose output can also, if need be, be used as its input ...) only needs to read its inputs in step and add together the counts reported for each range.
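A merge along those lines might look roughly like the sketch below. It is only an illustration under stated assumptions: each partial result is an array ref of `[start, end, count]` records, the ranges file is assumed to be ordered by ascending start so that `start` can serve as the merge key, and in-memory arrays stand in for the files that a real run would read sequentially. The name `merge_partials` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical merge step: each input is an array ref of
# [start, end, count] records, ordered by ascending start (an
# assumption about the ranges file), at most one record per range.
sub merge_partials {
    my @inputs = @_;
    my @idx    = (0) x @inputs;          # read cursor per input
    my @merged;

    while (1) {
        # Find the smallest range start among the current head records.
        my $min;
        for my $i (0 .. $#inputs) {
            next if $idx[$i] > $#{ $inputs[$i] };    # input exhausted
            my $start = $inputs[$i][ $idx[$i] ][0];
            $min = $start if !defined $min || $start < $min;
        }
        last unless defined $min;        # every input is exhausted

        # Sum the heads that carry that range, then advance those cursors.
        my ($end, $total) = (undef, 0);
        for my $i (0 .. $#inputs) {
            next if $idx[$i] > $#{ $inputs[$i] };
            my $rec = $inputs[$i][ $idx[$i] ];
            next unless $rec->[0] == $min;
            $end    = $rec->[1];
            $total += $rec->[2];
            $idx[$i]++;
        }
        push @merged, [ $min, $end, $total ];
    }
    return \@merged;
}
```

Because the result has the same shape as each input, the output of one merge can be fed into another, which is what allows partial results to be combined in stages.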
Run as many instances as you can, provided that no instance ever exhausts the real (not virtual) memory that is available to it on its machine. Each instance reads each file sequentially. All of the tallying and range-comparison should occur entirely in memory, making maximum use of memory as an “instantaneous file.” Each process should exhibit a large working-set size but no page-outs.