PerlMonks  

Merge 2 files using key field

by Jalcock501 (Sexton)
on Nov 05, 2013 at 11:50 UTC

Jalcock501 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Guys

I know that I am meant to show my working however I have none because I cannot find anything that helps with this issue, so I have come to you (the monks), cap in hand, to ask for your help.

I have 2 files which have a common field that need to be joined together.
File1:
70000004 12008401
70000005 52003402
70000007 17527802
70000008 95050962
70000011 50010101
70000012 17576901
70000013 07072701
70000014 52010201
70000015 35555501
70000017 53561503
70000018 40500304
70000019 35513607
70000021 41548701
70000022 54644601
70000022 95050682

File2:
70000004 00
70000005 25
70000007 25
70000008 31
70000008 31
70000009 34
70000010 21
70000011 21
70000012 17
70000013 47
70000014 19
70000015 43
70000017 20
70000018 15
70000019 11

As you can see from the data, some keys in File2 are not in File1. I need to join the second fields from both files to form a new .csv in this format:
00000000,00
The key fields (field 1 in both files) are to be removed after merging, as they are no longer required. Any File2 entries with no match in File1 are to be dropped. I looked at simply pasting the files together, but that is too clumsy for what I need to achieve.

Could someone please help? I am at my wits' end.

Thanks

Jim

Replies are listed 'Best First'.
Re: Merge 2 files using key field
by Limbic~Region (Chancellor) on Nov 05, 2013 at 13:15 UTC
    Jim,
    LanX indicated that the Monastery was a learning site. It is also a social site and a teaching site, among other things. While I agree with LanX and Corion that you would have done yourself a favor by first searching to see if this same problem had already been asked, I can imagine that you did search and either didn't understand the answers or found that they didn't work for one reason or another. Unfortunately, you didn't say so, so we have to assume you didn't bother searching (hopefully lesson learned).

    As for your actual problem at hand, you need to state your constraints and your objectives. Below is a hypothetical list:

    • File 1 is the master file
    • Any records in file 2 not in file 1 are discarded
    • Any records in file 1 not in file 2 should have empty string as output for 2nd file
    • Output must appear in the same order as file 1
    • Keys are not unique. The first appearance of a key in file 1 should be paired with the first appearance in file 2, 2nd with 2nd, etc
    • File 1 is too big to fit into memory
    • Neither file is sorted in relationship to the key
    • Output should omit the key (just the values from file 1 and file 2 separated by a comma)
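
    Purely for illustration, here is a rough sketch of how the made-up constraints above might be met. It assumes that file 2 does fit into memory (that is not in the list), and the sub name `merge_in_master_order` and the sample data are hypothetical. File 2 is loaded into a queue of values per key, then file 1 is streamed line by line so it never has to be held in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: stream the master file (file 1) and pair each
# record with the next unused value for the same key from file 2.
# ASSUMPTION: file 2 is small enough to hold in memory.
sub merge_in_master_order {
    my ($master_fh, $lookup_fh, $out_fh) = @_;

    # Queue the file 2 values per key, in order of appearance.
    my %queue_for;
    while (my $line = <$lookup_fh>) {
        my ($key, $value) = split ' ', $line;
        next unless defined $value;          # skip blank or malformed lines
        push @{ $queue_for{$key} }, $value;
    }

    # Stream file 1, so output follows file 1 order.
    while (my $line = <$master_fh>) {
        my ($key, $value) = split ' ', $line;
        next unless defined $value;
        my $queue   = $queue_for{$key};
        # First appearance pairs with first, 2nd with 2nd; empty string if none left.
        my $partner = ($queue && @$queue) ? shift @$queue : '';
        print {$out_fh} "$value,$partner\n";
    }
    return;
}

# Usage with made-up data held in in-memory filehandles.
my $master = "A 111\nB 222\nA 333\n";
my $lookup = "A 01\nC 99\nA 02\n";
open my $m_fh, '<', \$master or die "Cannot read master data: $!";
open my $l_fh, '<', \$lookup or die "Cannot read lookup data: $!";
merge_in_master_order($m_fh, $l_fh, \*STDOUT);
# prints:
# 111,01
# 222,
# 333,02
```

    Records in file 2 whose key never shows up in file 1 (key C here) are simply never printed.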

    Had you done what I did above (remember, these are made up constraints/goals), it would have been easy to say "I looked at <some thread> but it won't work for me because of X".

    If I take the simplest assumptions (file 1 can fit into memory, order doesn't matter, keys are unique, every line in file 1 has a match in file 2) then the solution should be pretty obvious:

    1. Read file 1 into a hash (key = first field, value = second field)
    2. Read file 2 line by line
    3. If first field is not in hash, move on to next line
    4. Else, print out the value in the hash, a comma and field 2
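
    The four steps above could be sketched in Perl roughly as follows. This is a minimal sketch under the same simple assumptions, with whitespace-separated fields as in the sample data. The sub name `merge_on_key` is made up for the example, and the inputs are in-memory filehandles so the snippet runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: join two whitespace-separated inputs on field 1.
# Returns a list of "file1_value,file2_value" strings.
sub merge_on_key {
    my ($fh1, $fh2) = @_;

    # Step 1: read file 1 into a hash (key = first field, value = second field).
    my %value_for;
    while (my $line = <$fh1>) {
        my ($key, $value) = split ' ', $line;
        next unless defined $value;      # skip blank or malformed lines
        $value_for{$key} = $value;       # a repeated key keeps the last value seen
    }

    # Steps 2-4: read file 2 line by line and keep only the matches.
    my @rows;
    while (my $line = <$fh2>) {
        my ($key, $value) = split ' ', $line;
        next unless defined $value;
        next unless exists $value_for{$key};    # step 3: not in hash, move on
        push @rows, "$value_for{$key},$value";  # step 4: value, comma, field 2
    }
    return @rows;
}

# Usage with a few rows taken from the question.
my $file1 = "70000004 12008401\n70000005 52003402\n70000007 17527802\n";
my $file2 = "70000004 00\n70000005 25\n70000009 34\n";
open my $fh1, '<', \$file1 or die "Cannot read file 1 data: $!";
open my $fh2, '<', \$file2 or die "Cannot read file 2 data: $!";
print "$_\n" for merge_on_key($fh1, $fh2);
# prints:
# 12008401,00
# 52003402,25
```

    With real files you would open them by name instead of by reference to a string. Note that the output follows the order of file 2, not file 1.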

    I could just as easily work around the limitation that file 1 may contain records not in file 2:
    1. Read file 1 into a hash (key = first field, value = second field)
    2. Read file 2 line by line
    3. If first field is not in hash, move on to next line
    4. Else, print out the value in the hash, a comma and field 2
    5. Delete the hash entry just printed
    6. When 2nd file has been read completely, iterate over hash and print out line for each entry (not found in file 2)
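
    A sketch of this six-step variant, again only as an illustration: the sub name `merge_keep_unmatched` is made up, and printing an empty second field for leftover file 1 records is an assumption about what "print out line for each entry" should look like.

```perl
use strict;
use warnings;

# Hypothetical helper: like a plain hash join, but file 1 records that
# never matched are emitted at the end with an empty second field.
sub merge_keep_unmatched {
    my ($fh1, $fh2) = @_;

    # Step 1: read file 1 into a hash.
    my %value_for;
    while (my $line = <$fh1>) {
        my ($key, $value) = split ' ', $line;
        next unless defined $value;
        $value_for{$key} = $value;
    }

    # Steps 2-5: match file 2 lines, deleting each hash entry once it is used.
    my @rows;
    while (my $line = <$fh2>) {
        my ($key, $value) = split ' ', $line;
        next unless defined $value;
        next unless exists $value_for{$key};
        # delete returns the removed value, so steps 4 and 5 happen together.
        push @rows, delete($value_for{$key}) . ",$value";
    }

    # Step 6: whatever is left in the hash was not found in file 2.
    # Keys are sorted only to make the output order predictable.
    push @rows, "$value_for{$_}," for sort keys %value_for;
    return @rows;
}

# Usage with a few rows taken from the question.
my $file1 = "70000008 95050962\n70000021 41548701\n";
my $file2 = "70000008 31\n70000008 31\n70000009 34\n";
open my $fh1, '<', \$file1 or die "Cannot read file 1 data: $!";
open my $fh2, '<', \$file2 or die "Cannot read file 2 data: $!";
print "$_\n" for merge_keep_unmatched($fh1, $fh2);
# prints:
# 95050962,31
# 41548701,
```

    One side effect of step 5: because the entry is deleted after its first match, a key repeated in file 2 (such as 70000008) is only paired once.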

    I hope this helps you understand how you can help yourself.

    Cheers - L~R

Re: Merge 2 files using key field
by Corion (Patriarch) on Nov 05, 2013 at 12:17 UTC
Re: Merge 2 files using key field
by LanX (Saint) on Nov 05, 2013 at 12:02 UTC
    Well, this is a learning website, not a coding one.

    First lesson:

    Use search engines!

    (doesn't require taking your "cap in hand"! =)

    --> Perl Merge two files using key field

    looks promising to me...

    Cheers Rolf

    (addicted to the Perl Programming Language)

    PS:

    > I know that I am meant to show my working however I have none because ...

    nice try! ;-)

Re: Merge 2 files using key field
by marinersk (Priest) on Nov 05, 2013 at 17:29 UTC
    So, let's nudge this along in a more healthy direction.

    Surely you have code that reads the files. Can you show that?

      Hi Marinersk

      I've managed to solve this now using awk instead. I know it's not Perl, but for those interested, here is how I did it.

      #!/usr/gnu/bin/gawk -f
      BEGIN { OFS=FS="," }
      {
          com = "xchop -d, group.csv " $1
          while ((com | getline res) > 0) {
              if (split(res, arr) > 1)
                  print $2, arr[2]
          }
          close(com)
      }
      xchop is a utility written in C that performs a sort of binary chop on strings.

      Thanks Jim

Node Type: perlquestion [id://1061291]
Approved by Corion