Re: Needed Performance improvement in reading and fetching from a file
by aufflick (Deacon) on Oct 11, 2008 at 10:09 UTC ([id://716579])
So what you are basically doing is:

1. Checking if column 2 has been seen already - if so, next line

This is a pretty common thing to do and can be super fast. As already pointed out, the easiest win is to use a hash to record which col 2 values have been seen. That gets your check nearer O(1) than O(N). 20k records isn't a lot - this is all you should have to do.

If you find yourself dealing with a LOT of records (say half a million), you can get really cheap use of multiple CPUs/cores (assuming you have them) by writing two scripts: the first strips out all the lines with duplicated col 2 values, so the second can skip that step. Pipe the output of one script into the input of the other and Unix will run the two processes in parallel for you - assuming you are using a Unix OS, that is. Something like:
In Section: Seekers of Perl Wisdom