P is for Practical | |
PerlMonks |
comment on |
( [id://3333]=superdoc: print w/replies, xml ) | Need Help?? |
You seek a performance improvement over your awk solution. For there to be an improvement, there must be room to improve. I think you mentioned in a follow-up comment that your list of names is in the order in which they will appear in the files you are parsing. This is useful, as you can save the small amount of time it might have taken to import the list of names into a hash. So lets do a little profiling:
Now run that on your input file; the largest one you've got, and see how long it takes. If it takes too long, you can stop right there because there is no Perl (or any other language) solution that will meet your time requirements unless you change the requirements by processing streams more frequently, or overnight when it doesn't matter, etc. If it is fast enough, then you could take the next step by implementing a solution in Perl that is similiarly linear in its computational complexity:
This operates under the assumption that there will be exactly one match for each name in your list, and that your names list is in the correct order. If those assumptions are incorrect, then read your names list into a hash to start with; this will incur only a slight penalty -- so slight it's probably not worth maintaining your names list in any particular order to begin with. If it's not in order, just do this:
This last solution is still a linear time solution, as was the previous one, but is more flexible on the order in which things happen. It still makes one assumption; you're only looking for each name one time. You can remove the delete and the last if lines if that assumption isn't correct. At any rate, if the initial profiling check determined that the sheer act of reading the files takes longer than you have, you'll have to come up with a different strategy that doesn't involve sitting around waiting for large files to load. Dave In reply to Re: Extacting lines where one column matches a name from a list of names
by davido
|
|