Data Matching Challenge
by expo (Initiate) on Feb 01, 2007 at 20:15 UTC
expo has asked for the wisdom of the Perl Monks concerning the following question:
I am trying to compare two data sets and pull out the matches based on an id. I could use a little wisdom for this :)
The goal would be to merge these together such that redundant ids (first fields) are included and those not present in Dataset A are excluded. So the merging and filtering of the data would look something like this:
Now, I could easily iterate through two arrays side by side and do a pattern match BUT the problem is speed. I have enormous amounts of data that I need to mine through so it needs to be pretty fast.
I started making a hash table, but that is problematic: hash keys need to be unique, and I am interested in redundant matches. I then started building a matrix using anonymous arrays, but it got clumsy, and I know there must be a more elegant way to do this.
Any ideas or suggestions would be greatly appreciated!!
Expo
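One possible way around the unique-key problem described above is a hash of arrays: each id is stored once as a key, and every record carrying that id is pushed onto an array under it, so repeated ids are kept rather than overwritten. The sketch below is only an illustration under stated assumptions. The sample records are made up (the actual data sets are not shown), the id is assumed to be the first whitespace-separated field, the helper name `match_by_id` is hypothetical, and pairing each Dataset A record with each matching Dataset B record is just one guess at what the merged output should look like.

```perl
use strict;
use warnings;

# Hypothetical sample records; the real data sets are not shown in the post.
# Assumed layout: whitespace-separated fields with the id in the first field.
my @dataset_a = ('id1 apple', 'id2 pear', 'id1 plum');
my @dataset_b = ('id1 red', 'id3 green', 'id1 blue', 'id2 yellow');

# match_by_id is a hypothetical helper name, not something from the post.
sub match_by_id {
    my ($set_a, $set_b) = @_;

    # Index Dataset A: each id maps to an array ref holding all of its
    # records, so repeated ids are kept instead of overwriting each other.
    my %a_by_id;
    for my $rec (@$set_a) {
        my ($id) = split ' ', $rec;
        next unless defined $id;    # skip blank lines
        push @{ $a_by_id{$id} }, $rec;
    }

    # Single pass over Dataset B: keep only records whose id occurs in A,
    # and pair each one with every A record that shares the id.
    my @merged;
    for my $rec (@$set_b) {
        my ($id) = split ' ', $rec;
        next unless defined $id && exists $a_by_id{$id};
        push @merged, map { "$_ | $rec" } @{ $a_by_id{$id} };
    }
    return \@merged;
}

print "$_\n" for @{ match_by_id(\@dataset_a, \@dataset_b) };
```

Each data set is read once and every id lookup is a hash access, so the work grows with the size of the input plus the number of matches, rather than with the product of the two set sizes as in a side-by-side pattern match. If Dataset A is too large to index in memory, a different approach (for example sorting both files by id and merging) would be needed.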