PerlMonks  

Re: Merge 2 files using key field

by Limbic~Region (Chancellor)
on Nov 05, 2013 at 13:15 UTC (#1061301)


in reply to Merge 2 files using key field

Jim,
LanX indicated that the Monastery was a learning site. It is also a social site and a teaching site, among other things. While I agree with LanX and Corion that you would have done yourself a favor by first searching to see whether this same question had already been asked, I can imagine that you did search and either didn't understand the answers you found or found that they didn't work for one reason or another. Unfortunately, you didn't indicate that, so we have to assume you didn't bother searching (hopefully lesson learned).

As for your actual problem at hand, you need to state your constraints and your objectives. Below is a hypothetical list:

  • File 1 is the master file
  • Any records in file 2 not in file 1 are discarded
  • Any records in file 1 not in file 2 should have an empty string in place of the file 2 value
  • Output must appear in the same order as file 1
  • Keys are not unique. The first appearance of a key in file 1 should be paired with the first appearance in file 2, 2nd with 2nd, etc.
  • File 1 is too big to fit into memory
  • Neither file is sorted in relationship to the key
  • Output should omit the key (just the values from file 1 and file 2 separated by a comma)

Had you done what I did above (remember, these are made up constraints/goals), it would have been easy to say "I looked at <some thread> but it won't work for me because of X".

If I take the simplest assumptions (file 1 can fit into memory, order doesn't matter, keys are unique, every key in file 1 also appears in file 2) then the solution should be pretty obvious.

  1. Read file 1 into a hash (key = first field, value = second field)
  2. Read file 2 line by line
  3. If first field is not in hash, move on to next line
  4. Else, print out the value in the hash, a comma and field 2
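
The four steps above could be sketched roughly like this. It is only an illustration under the simplest assumptions: I am guessing that each line looks like "key,value" with a comma delimiter, and the sub name merge_simple and the sample data are made up for the demo, not taken from the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: the name and the "key,value" line format are assumptions.
sub merge_simple {
    my ($fh1, $fh2) = @_;

    # Step 1: read file 1 into a hash (key = first field, value = second field).
    my %value_for;
    while (my $line = <$fh1>) {
        chomp $line;
        my ($key, $value) = split /,/, $line, 2;
        next unless defined $key && length $key;    # skip blank lines
        $value_for{$key} = defined $value ? $value : '';
    }

    # Step 2: read file 2 line by line.
    my @out;
    while (my $line = <$fh2>) {
        chomp $line;
        my ($key, $value) = split /,/, $line, 2;

        # Step 3: if the first field is not in the hash, move on.
        next unless defined $key && exists $value_for{$key};

        # Step 4: value from the hash, a comma, and field 2.
        $value = '' unless defined $value;
        push @out, "$value_for{$key},$value";
    }
    return @out;
}

# In-memory filehandles stand in for the two real files.
my $file1 = "a,1\nb,2\nc,3\n";
my $file2 = "b,20\nx,99\na,10\n";
open my $fh1, '<', \$file1 or die "Cannot open in-memory file 1: $!";
open my $fh2, '<', \$file2 or die "Cannot open in-memory file 2: $!";
print "$_\n" for merge_simple($fh1, $fh2);
```

Note that the output follows the order of file 2, which is fine only because order was assumed not to matter. The key "c" is silently dropped here because nothing in file 2 matches it.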

I could even easily work around the limitation when file 1 contains records that are not in file 2:

  1. Read file 1 into a hash (key = first field, value = second field)
  2. Read file 2 line by line
  3. If first field is not in hash, move on to next line
  4. Else, print out the value in the hash, a comma and field 2
  5. Delete the hash entry just printed
  6. When the 2nd file has been read completely, iterate over the hash and print out a line for each remaining entry (those not found in file 2)
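
A possible sketch of the six-step version follows. As before, the "key,value" line format, the sub name merge_with_leftovers and the sample data are my assumptions. I also chose to print leftovers as the file 1 value followed by a comma and an empty string, and to sort the leftover keys so the output is repeatable, since hash order is not defined.

```perl
use strict;
use warnings;

# Hypothetical helper: the name and the "key,value" line format are assumptions.
sub merge_with_leftovers {
    my ($fh1, $fh2) = @_;

    # Step 1: read file 1 into a hash (key = first field, value = second field).
    my %value_for;
    while (my $line = <$fh1>) {
        chomp $line;
        my ($key, $value) = split /,/, $line, 2;
        next unless defined $key && length $key;    # skip blank lines
        $value_for{$key} = defined $value ? $value : '';
    }

    # Steps 2-5: read file 2, print matches, and delete each entry once used.
    my @out;
    while (my $line = <$fh2>) {
        chomp $line;
        my ($key, $value) = split /,/, $line, 2;
        next unless defined $key && exists $value_for{$key};
        $value = '' unless defined $value;

        # delete returns the value it removed, so steps 4 and 5 share one lookup.
        my $from_file1 = delete $value_for{$key};
        push @out, "$from_file1,$value";
    }

    # Step 6: whatever is left in the hash was never seen in file 2.
    # Sorting is my choice for repeatable output, not a requirement.
    for my $key (sort keys %value_for) {
        push @out, "$value_for{$key},";
    }
    return @out;
}

# In-memory filehandles stand in for the two real files.
my $master = "a,1\nb,2\nc,3\n";
my $second = "b,20\nx,99\na,10\n";
open my $master_fh, '<', \$master or die "Cannot open in-memory file 1: $!";
open my $second_fh, '<', \$second or die "Cannot open in-memory file 2: $!";
print "$_\n" for merge_with_leftovers($master_fh, $second_fh);
```

Because each entry is deleted once it has been printed, a key that shows up a second time in file 2 would be skipped. That is consistent with the assumption that keys are unique, but it is worth knowing if your data breaks that assumption.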

I hope this helps you understand how you can help yourself.

Cheers - L~R

