Re: Merge 2 files using key field

in reply to Merge 2 files using key field

Jim,
LanX indicated that the Monastery was a learning site. It is also a social site and a teaching site, among other things. While I agree with LanX and Corion that you would have done yourself a favor by first searching to see whether this question had already been asked, I can imagine that you did search and either didn't understand the answers you found or found that they didn't work for one reason or another. Unfortunately, you didn't say so, which means we have to assume you didn't bother searching (hopefully, lesson learned).

As for your actual problem at hand, you need to state your constraints and your objectives. Below is a hypothetical list:

File 1 is the master file
Any records in file 2 not in file 1 are discarded
Any records in file 1 not in file 2 should have empty string as output for 2nd file
Output must appear in the same order as file 1
Keys are not unique. The first appearance of a key in file 1 should be paired with its first appearance in file 2, the 2nd with the 2nd, etc.
File 1 is too big to fit into memory
Neither file is sorted by the key
Output should omit the key (just the values from file 1 and file 2 separated by a comma)

Had you done what I did above (remember, these are made-up constraints/goals), it would have been easy to say "I looked at <some thread> but it won't work for me because of X".

If I take the simplest assumptions (file 1 can fit into memory, order doesn't matter, keys are unique, no lines in file 1 not in file 2) then the solution should be pretty obvious.

Read file 1 into a hash (key = first field, value = second field)
Read file 2 line by line
If first field is not in hash, move on to next line
Else, print out the value in the hash, a comma and field 2
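
The four steps above could be sketched in Perl roughly as follows. This is only an illustration: the sub name merge_simple and the "key,value" line format are my assumptions, since the original question's data isn't shown here, and in-memory filehandles stand in for the two real files.

```perl
use strict;
use warnings;

# Assumption: every line looks like "key,value". The real file format
# is not shown in this thread, so adjust the split to suit.
sub merge_simple {
    my ($fh1, $fh2) = @_;

    # Read file 1 into a hash (key = first field, value = second field).
    my %value_for;
    while (my $line = <$fh1>) {
        chomp $line;
        my ($key, $value) = split /,/, $line, 2;
        next unless defined $value;    # skip lines without two fields
        $value_for{$key} = $value;
    }

    # Read file 2 line by line; keys not in the hash are skipped.
    my @out;
    while (my $line = <$fh2>) {
        chomp $line;
        my ($key, $value) = split /,/, $line, 2;
        next unless defined $value && exists $value_for{$key};
        push @out, "$value_for{$key},$value";
    }
    return @out;
}

# In-memory filehandles stand in for the two files.
open my $fh1, '<', \"a,apple\nb,banana\n"        or die $!;
open my $fh2, '<', \"b,yellow\nc,orange\na,red\n" or die $!;
print "$_\n" for merge_simple($fh1, $fh2);
# prints:
#   banana,yellow
#   apple,red
```

Note that the output follows the order of file 2, which is fine only because this version assumes order doesn't matter.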

I could just as easily drop the last of those assumptions and allow file 1 to contain records that are not in file 2:

Read file 1 into a hash (key = first field, value = second field)
Read file 2 line by line
If first field is not in hash, move on to next line
Else, print out the value in the hash, a comma and field 2
Delete the hash entry just printed
When 2nd file has been read completely, iterate over hash and print out line for each entry (not found in file 2)
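
A rough Perl sketch of this second version, under the same assumptions as before ("key,value" lines, unique keys, a made-up sub name). Two further choices are mine rather than part of the steps: leftover records from file 1 are printed with an empty second field, and they are emitted in sorted key order only so that the output is repeatable.

```perl
use strict;
use warnings;

# Assumption: every line looks like "key,value" and keys are unique.
sub merge_keep_unmatched {
    my ($fh1, $fh2) = @_;

    # Read file 1 into a hash (key = first field, value = second field).
    my %value_for;
    while (my $line = <$fh1>) {
        chomp $line;
        my ($key, $value) = split /,/, $line, 2;
        next unless defined $value;    # skip lines without two fields
        $value_for{$key} = $value;
    }

    my @out;
    while (my $line = <$fh2>) {
        chomp $line;
        my ($key, $value) = split /,/, $line, 2;
        next unless defined $value && exists $value_for{$key};
        # delete returns the removed value, so this pairs the two
        # fields and drops the hash entry in a single step.
        push @out, delete($value_for{$key}) . ",$value";
    }

    # Whatever is still in the hash was never seen in file 2.
    # Sorting the keys is my choice, purely for repeatable output.
    for my $key (sort keys %value_for) {
        push @out, "$value_for{$key},";
    }
    return @out;
}

open my $fh1, '<', \"a,apple\nb,banana\nd,date\n" or die $!;
open my $fh2, '<', \"b,yellow\nc,orange\na,red\n"  or die $!;
print "$_\n" for merge_keep_unmatched($fh1, $fh2);
# prints:
#   banana,yellow
#   apple,red
#   date,
```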

I hope this helps you understand how you can help yourself.

Cheers - L~R
