Unless your data is sorted and the file's lines (or records) are fixed in length, your solution will never be faster than O(n). However, there is still plenty of room to improve the runtime even when the asymptotic amount of work doesn't change.

One aspect to consider is how often you expect to see a match in the 16GB input file. If matching records are sparse, you can gain a lot by rejecting non-matches and short-circuiting the loop's iteration as early as possible. Instead of splitting the line, massaging $tab_delimited_array[3], and then running it through Unix_Date and Date_ConvTZ before finally testing whether $date_converted equals $extracted_YMD, couldn't you massage $date_converted once, up front, into something that approximates the raw format of the date as it appears in the 16GB file? That would let you reject unneeded lines much faster.

Second, if it turns out that matches are frequent, you may be wasting time by printing on every hit. You could push $_ onto a cache array and then print (and empty) that array every 1000 matches, for example, with a final flush after the loop terminates. A chunk that size is small enough not to introduce memory problems, while still reducing the time spent in IO calls.

Dave

In reply to Re: Optimise the script
by davido
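For what it's worth, the two suggestions above (cheap pre-rejection before any date conversion, and batched printing) might be sketched like this. The date format, the column position, and the sub names are hypothetical stand-ins for whatever the original script actually uses:

```perl
use strict;
use warnings;

# scan_lines() takes the target date already massaged, once, into the
# file's own raw format (a stand-in for $date_converted), plus a list
# of lines, and returns the matches. A real script would read the 16GB
# file line by line rather than holding lines in memory.
sub scan_lines {
    my ($raw_date, @lines) = @_;
    my @matches;
    for my $line (@lines) {
        # Cheap rejection first: a substring test costs far less than
        # split() plus date-conversion calls, so do it before either.
        next if index($line, $raw_date) < 0;

        # Only survivors pay for split(); confirm the date really
        # sits in column 4 of the tab-delimited record.
        my @fields = split /\t/, $line;
        next unless defined $fields[3] && $fields[3] eq $raw_date;

        push @matches, $line;
    }
    return @matches;
}

# Batched output: collect matches and flush every 1000, with one
# final flush after the loop, instead of printing on every hit.
my @cache;

sub buffered_print {
    my ($line) = @_;
    push @cache, $line;
    if (@cache >= 1000) {
        print @cache;
        @cache = ();
    }
}

sub flush_cache {
    print @cache;
    @cache = ();
}
```

The point of the index() test is that most non-matching lines never reach split() at all, which is where the bulk of the per-line cost goes when matches are sparse.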