PerlMonks
Re: Best way to search file
by Laurent_R (Canon)
on Apr 15, 2015 at 17:31 UTC ( [id://1123535] )
If your process is so slow, it is quite likely because you are scanning the full content of file2 for each line of file1.

If that is the case, storing file2 in a hash before you start processing file1 will make the process incredibly faster, and the larger file2 is, the greater the speed gain. As sundialsvc4 mentioned, the only limit is memory: if file2 is so big that the hash takes up all the available RAM, then a hash is no longer a solution. (It depends on your system, but with today's typical RAM, my experience is that the limit lies somewhere between 5 and 15 million lines for file2.) In that case, I would really recommend sorting both files and reading them sequentially in parallel. In my experience with huge files, this is way faster than using a database. The only downside of that approach is that the algorithm for reading two files in parallel can be a bit tricky, with quite a few edge cases to take care of.
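A minimal sketch of the hash approach, assuming whole-line matching and the literal file names file1 and file2 (adapt the key extraction to whatever field you actually compare on):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pass 1: load every line of file2 into a hash (one key per line).
my %seen;
open my $fh2, '<', 'file2' or die "Cannot open file2: $!";
while ( my $line = <$fh2> ) {
    chomp $line;
    $seen{$line} = 1;
}
close $fh2;

# Pass 2: one O(1) hash lookup per line of file1, instead of
# rescanning all of file2 each time.
open my $fh1, '<', 'file1' or die "Cannot open file1: $!";
while ( my $line = <$fh1> ) {
    chomp $line;
    print "$line\n" if exists $seen{$line};
}
close $fh1;
```

This turns an O(n*m) double scan into O(n+m), at the cost of holding file2's keys in memory.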
Je suis Charlie.
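Update: for the sorted-merge read mentioned above, here is one possible sketch, assuming both inputs are already sorted and that we want the lines common to both (file names are placeholders). It shows the basic advance-the-smaller-side loop; handling duplicate keys on either side is one of the edge cases left out here:

```perl
#!/usr/bin/perl
use strict;
use warnings;

open my $fh1, '<', 'file1.sorted' or die "Cannot open file1.sorted: $!";
open my $fh2, '<', 'file2.sorted' or die "Cannot open file2.sorted: $!";

my $line1 = <$fh1>;
my $line2 = <$fh2>;
while ( defined $line1 and defined $line2 ) {
    chomp( my $k1 = $line1 );
    chomp( my $k2 = $line2 );
    if    ( $k1 lt $k2 ) { $line1 = <$fh1> }    # advance the smaller side
    elsif ( $k1 gt $k2 ) { $line2 = <$fh2> }
    else {                                      # match: emit, advance both
        print "$k1\n";
        $line1 = <$fh1>;
        $line2 = <$fh2>;
    }
}
close $fh1;
close $fh2;
```

Only one line from each file is held in memory at any time, which is why this scales to files far larger than RAM.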
In Section: Seekers of Perl Wisdom