in reply to Merge 2 files using key field
Jim,
LanX indicated that the Monastery was a learning site. It is also a social site and a teaching site, among other things. While I agree with LanX and Corion that you would have done yourself a favor by first searching to see if this same question had already been asked, I can imagine that you did search and either didn't understand the answers or found that they didn't work for one reason or another. Unfortunately, you didn't say so, which means we have to assume you didn't bother searching (hopefully lesson learned).
As for your actual problem at hand, you need to state your constraints and your objectives. Below is a hypothetical list:
- File 1 is the master file
- Any records in file 2 not in file 1 are discarded
- Any records in file 1 not in file 2 should have empty string as output for 2nd file
- Output must appear in the same order as file 1
- Keys are not unique. The first appearance of a key in file 1 should be paired with the first appearance in file 2, 2nd with 2nd, etc
- File 1 is too big to fit into memory
- Neither file is sorted in relationship to the key
- Output should omit the key (just the values from file 1 and file 2 separated by a comma)
Had you done what I did above (remember, these are made-up constraints/goals), it would have been easy to say "I looked at <some thread> but it won't work for me because of X".
If I take the simplest assumptions (file 1 fits into memory, order doesn't matter, keys are unique, every key in file 1 also appears in file 2), then the solution should be pretty obvious.
- Read file 1 into a hash (key = first field, value = second field)
- Read file 2 line by line
- If first field is not in hash, move on to next line
- Else, print out the value in the hash, a comma and field 2
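The four steps above can be sketched in Perl roughly as follows. This is only an illustration under assumptions of mine: each line is taken to look like "key,value" with a comma separator, and merge_simple is a made-up name. In-memory filehandles stand in for the two real files.

```perl
use strict;
use warnings;

# Assumption: each line looks like "key,value" (comma-separated, two fields).
sub merge_simple {
    my ($fh1, $fh2) = @_;

    # Read file 1 into a hash: key = first field, value = second field.
    my %value_for;
    while (my $line = <$fh1>) {
        chomp $line;
        my ($key, $value) = split /,/, $line, 2;
        next unless defined $value;            # skip blank or malformed lines
        $value_for{$key} = $value;
    }

    # Read file 2 line by line and pair up matching keys.
    my @out;
    while (my $line = <$fh2>) {
        chomp $line;
        my ($key, $value) = split /,/, $line, 2;
        next unless defined $value;
        next unless exists $value_for{$key};   # not in file 1: move on
        push @out, "$value_for{$key},$value";  # file 1 value, comma, field 2
    }
    return @out;
}

# Demo data standing in for the two files.
my $file1 = "a,apple\nb,banana\nc,cherry\n";
my $file2 = "c,red\na,green\nz,blue\n";
open my $fh1, '<', \$file1 or die "Cannot open file 1: $!";
open my $fh2, '<', \$file2 or die "Cannot open file 2: $!";
print "$_\n" for merge_simple($fh1, $fh2);
```

Note that the output follows the order of file 2, which is fine only because order was assumed not to matter.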
If some records in file 1 may have no match in file 2, extend the steps as follows:
- Read file 1 into a hash (key = first field, value = second field)
- Read file 2 line by line
- If first field is not in hash, move on to next line
- Else, print out the value in the hash, a comma and field 2
- Delete the hash entry just printed
- When 2nd file has been read completely, iterate over hash and print out line for each entry (not found in file 2)
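A possible Perl sketch of this longer variant, again assuming "key,value" lines with unique keys. The name merge_with_leftovers, the trailing empty field for unmatched entries, and the sorted order of the leftovers are my own choices for the example, not requirements from the original question.

```perl
use strict;
use warnings;

# Assumption: each line looks like "key,value" and keys are unique per file.
sub merge_with_leftovers {
    my ($fh1, $fh2) = @_;

    # Read file 1 into a hash: key = first field, value = second field.
    my %value_for;
    while (my $line = <$fh1>) {
        chomp $line;
        my ($key, $value) = split /,/, $line, 2;
        next unless defined $value;            # skip blank or malformed lines
        $value_for{$key} = $value;
    }

    my @out;
    while (my $line = <$fh2>) {
        chomp $line;
        my ($key, $value) = split /,/, $line, 2;
        next unless defined $value;
        next unless exists $value_for{$key};   # not in file 1: move on
        push @out, "$value_for{$key},$value";
        delete $value_for{$key};               # printed, so no longer a leftover
    }

    # Whatever remains in the hash was never seen in file 2.
    # Sorting is only for repeatable output; hash order is not defined.
    push @out, "$value_for{$_}," for sort keys %value_for;
    return @out;
}

# Demo data standing in for the two files.
my $file1 = "a,apple\nb,banana\nc,cherry\n";
my $file2 = "c,red\nz,blue\na,green\n";
open my $fh1, '<', \$file1 or die "Cannot open file 1: $!";
open my $fh2, '<', \$file2 or die "Cannot open file 2: $!";
print "$_\n" for merge_with_leftovers($fh1, $fh2);
```

Because each hash entry is deleted once printed, a key that shows up a second time in file 2 is silently skipped, which is consistent with the unique-keys assumption.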
I hope this helps you understand how you can help yourself.
Cheers - L~R