I tried running your code for about half an hour and it did not produce any result, so efficiency is still an issue here.
Given the larger sample of data that you gave for the first file, it's clear that the code in my initial reply wasn't working as intended. Sorry about that. I'll give it one more try and let you know how that comes out, but in the meantime...
What would you guys say about trying DBD::SQLite or MySQL for this task? Would the overhead be more or less?
Actually, when you've got a clear layout of the logical structure of the inputs, the decision process and the desired outputs, it's reasonably likely that a relational-table/SQL-based solution will help a lot. It has the same prerequisites as writing a perl script in order to get the right answers: having the right way to describe the task. Once you have that, it'll probably be a lot easier to write a suitable SQL statement to express the conditions and accomplish the operations that need to be done. If nothing else, just being able to deal with relevant subsets of the data at a time, rather than having to slog through one huge, monolithic stream, would be a win.
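To make the "relational table plus SQL statement" idea concrete, here is a minimal sketch. It is not based on your actual file layout: the table names (file_a, file_b), the columns (rec_id, info) and the function name are all hypothetical, and I'm using Python's bundled sqlite3 module only because it runs without any setup. The SQL itself would look the same when issued from Perl through DBI.

```python
import sqlite3


def join_records(rows_a, rows_b):
    """Return (rec_id, info_a, info_b) for every rec_id found in both inputs.

    rows_a and rows_b are lists of (rec_id, info) tuples standing in for
    the parsed lines of the two input files (hypothetical layout).
    """
    con = sqlite3.connect(":memory:")  # throwaway in-memory database
    con.execute("CREATE TABLE file_a (rec_id TEXT, info TEXT)")
    con.execute("CREATE TABLE file_b (rec_id TEXT, info TEXT)")
    con.executemany("INSERT INTO file_a VALUES (?, ?)", rows_a)
    con.executemany("INSERT INTO file_b VALUES (?, ?)", rows_b)

    # The whole "compare two files" step becomes one declarative statement.
    matched = con.execute(
        """
        SELECT a.rec_id, a.info, b.info
        FROM file_a AS a
        JOIN file_b AS b ON a.rec_id = b.rec_id
        ORDER BY a.rec_id
        """
    ).fetchall()
    con.close()
    return matched


if __name__ == "__main__":
    first = [("x1", "alpha"), ("x2", "beta")]
    second = [("x2", "gamma"), ("x3", "delta")]
    for row in join_records(first, second):
        print("\t".join(row))
```

Adding a WHERE clause to that SELECT is how you would pull out one relevant subset at a time instead of walking the full stream.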
I don't know enough about SQLite to say how well it optimizes queries, but MySQL is easy to set up, easy to use, and quite effective for very large data sets (and it's very well documented). So long as you make sure to index the table columns that your queries filter or join on most often, you should see pretty zippy results. You get a lot of built-in efficiency for free.
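As a rough illustration of what indexing buys you, the sketch below asks the database for its query plan before and after creating an index. Again, the table (lookup), column (rec_id) and index name are made up for the example, and it uses SQLite through Python's sqlite3 module purely so that it runs without a server. MySQL has its own CREATE INDEX and EXPLAIN statements, but its plan output is formatted differently.

```python
import sqlite3


def plan_details(con, sql, params=()):
    """Return the human-readable detail strings of SQLite's query plan."""
    rows = con.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The detail text is the last column of each plan row.
    return [row[-1] for row in rows]


def compare_plans():
    """Show the plan for the same lookup without and with an index."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE lookup (rec_id TEXT, info TEXT)")
    con.executemany(
        "INSERT INTO lookup VALUES (?, ?)",
        [("x%d" % i, "info %d" % i) for i in range(1000)],
    )
    query = "SELECT info FROM lookup WHERE rec_id = ?"

    before = plan_details(con, query, ("x42",))  # no index yet: full scan
    con.execute("CREATE INDEX idx_lookup_rec_id ON lookup (rec_id)")
    after = plan_details(con, query, ("x42",))   # index is now available

    con.close()
    return before, after


if __name__ == "__main__":
    before, after = compare_plans()
    print("without index:", before)
    print("with index:   ", after)
```

The exact wording of the plan varies between SQLite versions, but the second plan should name idx_lookup_rec_id, meaning the lookup no longer has to read every row.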
In reply to Re^3: Comparing and getting information from two large files and appending it in a new file