PerlMonks  

Jim,
LanX indicated that the Monastery was a learning site. It is also a social site and a teaching site, among other things. While I agree with LanX and Corion that you would have done yourself a favor by first searching to see whether this same question had already been asked, I can imagine that you did search and either didn't understand the answers you found or found that they didn't work for one reason or another. Unfortunately, you didn't say so, which leaves us to assume you didn't bother searching (hopefully lesson learned).

As for your actual problem at hand, you need to state your constraints and your objectives. Below is a hypothetical list:

  • File 1 is the master file
  • Any records in file 2 not in file 1 are discarded
  • Any records in file 1 not in file 2 should get an empty string in place of the file 2 value
  • Output must appear in the same order as file 1
  • Keys are not unique. The first appearance of a key in file 1 should be paired with the first appearance in file 2, 2nd with 2nd, etc
  • File 1 is too big to fit into memory
  • Neither file is sorted by key
  • Output should omit the key (just the values from file 1 and file 2 separated by a comma)

Had you done what I did above (remember, these are made-up constraints/goals), it would have been easy to say "I looked at <some thread> but it won't work for me because of X".

If I take the simplest assumptions (file 1 can fit into memory, order doesn't matter, keys are unique, every line in file 1 has a match in file 2) then the solution should be pretty obvious:

  1. Read file 1 into a hash (key = first field, value = second field)
  2. Read file 2 line by line
  3. If first field is not in hash, move on to next line
  4. Else, print out the value in the hash, a comma and field 2
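
The four steps above can be sketched in Perl roughly as follows. This is only a minimal sketch under assumptions the thread does not confirm: each line holds exactly two comma-separated fields, and the sub name `merge_simple` and the sample data are made up for illustration. In-memory filehandles stand in for the two real files.

```perl
use strict;
use warnings;

# Hypothetical helper: merge two two-field, comma-separated inputs on the
# first field. Takes two open filehandles and returns the output lines.
sub merge_simple {
    my ($fh1, $fh2) = @_;

    # Step 1: read file 1 into a hash (key = first field, value = second field).
    my %value_for;
    while (my $line = <$fh1>) {
        chomp $line;
        my ($key, $value) = split /,/, $line, 2;
        next unless defined $value;    # skip blank or malformed lines
        $value_for{$key} = $value;
    }

    # Steps 2-4: read file 2 line by line; skip keys not in the hash,
    # otherwise emit "value from file 1,value from file 2".
    my @merged;
    while (my $line = <$fh2>) {
        chomp $line;
        my ($key, $value) = split /,/, $line, 2;
        next unless defined $value && exists $value_for{$key};
        push @merged, "$value_for{$key},$value";
    }
    return @merged;
}

# Made-up sample data; replace with open(my $fh, '<', $path) for real files.
open my $fh1, '<', \"a,apple\nb,banana\n"        or die "Cannot open file 1: $!";
open my $fh2, '<', \"b,blue\nc,cyan\na,amber\n" or die "Cannot open file 2: $!";
print "$_\n" for merge_simple($fh1, $fh2);
```

Note that the output follows the order of file 2, which is fine only because these simple assumptions say order doesn't matter.
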

I could even easily lift that last assumption and handle records in file 1 that are not in file 2:

  1. Read file 1 into a hash (key = first field, value = second field)
  2. Read file 2 line by line
  3. If first field is not in hash, move on to next line
  4. Else, print out the value in the hash, a comma and field 2
  5. Delete the hash entry just printed
  6. When the 2nd file has been read completely, iterate over the hash and print out a line for each remaining entry (those not found in file 2)
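
The six-step variant might look something like this in Perl. Again, this is a sketch under assumptions of my own: two comma-separated fields per line, a hypothetical sub name `merge_with_leftovers`, made-up sample data, and an empty second value for unmatched records (borrowed from the hypothetical constraint list above). The leftover keys are sorted only so the output is repeatable, since hash order is not.

```perl
use strict;
use warnings;

# Hypothetical helper: like the simple merge, but records from file 1 with no
# partner in file 2 are still emitted, with an empty second value.
sub merge_with_leftovers {
    my ($fh1, $fh2) = @_;

    # Step 1: load file 1 into a hash keyed on the first field.
    my %value_for;
    while (my $line = <$fh1>) {
        chomp $line;
        my ($key, $value) = split /,/, $line, 2;
        next unless defined $value;    # skip blank or malformed lines
        $value_for{$key} = $value;
    }

    # Steps 2-5: stream file 2, emit matches, and delete each matched entry.
    my @merged;
    while (my $line = <$fh2>) {
        chomp $line;
        my ($key, $value) = split /,/, $line, 2;
        next unless defined $value && exists $value_for{$key};
        push @merged, delete($value_for{$key}) . ",$value";
    }

    # Step 6: whatever is left in the hash never appeared in file 2.
    push @merged, "$value_for{$_}," for sort keys %value_for;
    return @merged;
}

# Made-up sample data; in-memory filehandles stand in for the real files.
open my $master, '<', \"a,apple\nb,banana\nd,date\n" or die "Cannot open file 1: $!";
open my $detail, '<', \"b,blue\nc,cyan\na,amber\n"   or die "Cannot open file 2: $!";
print "$_\n" for merge_with_leftovers($master, $detail);
```

Because each matched entry is deleted, a key that shows up a second time in file 2 is silently skipped, which is consistent with the unique-keys assumption but worth knowing about.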

I hope this helps you understand how you can help yourself.

Cheers - L~R


In reply to Re: Merge 2 files using key field by Limbic~Region
in thread Merge 2 files using key field by Jalcock501
