Merge 2 files using key field

Jalcock501 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Guys

I know that I am meant to show my working however I have none because I cannot find anywhere that can help me with this issue, so I have come to you (the monks) cap in hand to plead for your help.

I have 2 files which have a common field that need to be joined together.

File1:
70000004    12008401
70000005    52003402
70000007    17527802
70000008    95050962
70000011    50010101
70000012    17576901
70000013    07072701
70000014    52010201
70000015    35555501
70000017    53561503
70000018    40500304
70000019    35513607
70000021    41548701
70000022    54644601
70000022    95050682

File2:
70000004    00
70000005    25
70000007    25
70000008    31
70000008    31
70000009    34
70000010    21
70000011    21
70000012    17
70000013    47
70000014    19
70000015    43
70000017    20
70000018    15
70000019    11

As you can see from the data, some keys in File2 are not in File1. I need to join the second fields of the matching records together to form a new .csv in this format:

00000000,00

The key fields (field 1 in both files) are to be removed after merging as they are no longer required. Any entries that do not match file1 are to be removed. I looked at just pasting the files together but it is too clumsy for what I need to achieve.

If someone could please help, as I am at my wits' end.

Thanks

Jim

Re: Merge 2 files using key field by Limbic~Region (Chancellor) on Nov 05, 2013 at 13:15 UTC
Jim,

LanX indicated that the Monastery was a learning site. It is also a social site and a teaching site, among other things. While I agree with LanX and Corion that you would have done yourself a favor by first searching to see if this same problem had already been asked, I can imagine that you did search and either didn't understand the existing answers or found that they didn't work for one reason or another. Unfortunately, you didn't indicate that, so we have to assume you didn't bother searching (hopefully lesson learned).

As for your actual problem at hand, you need to state your constraints and your objectives. Below is a hypothetical list:

- File 1 is the master file
- Any records in file 2 not in file 1 are discarded
- Any records in file 1 not in file 2 should have an empty string as output for the 2nd file
- Output must appear in the same order as file 1
- Keys are not unique. The first appearance of a key in file 1 should be paired with the first appearance in file 2, 2nd with 2nd, etc.
- File 1 is too big to fit into memory
- Neither file is sorted in relationship to the key
- Output should omit the key (just the values from file 1 and file 2 separated by a comma)

Had you done what I did above (remember, these are made-up constraints/goals), it would have been easy to say "I looked at <some thread> but it won't work for me because of X". If I take the simplest assumptions (file 1 can fit into memory, order doesn't matter, keys are unique, no lines in file 1 not in file 2) then the solution should be pretty obvious.

1. Read file 1 into a hash (key = first field, value = second field)
2. Read file 2 line by line
3. If first field is not in hash, move on to next line
4. Else, print out the value in the hash, a comma and field 2

I could even easily work around the limitation if file 1 contains records not in file 2:

1. Read file 1 into a hash (key = first field, value = second field)
2. Read file 2 line by line
3. If first field is not in hash, move on to next line
4. Else, print out the value in the hash, a comma and field 2
5. Delete the hash entry just printed
6. When the 2nd file has been read completely, iterate over the hash and print out a line for each entry (not found in file 2)

I hope this helps you understand how you can help yourself.

Cheers - L~R
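A minimal Perl sketch of the simplest variant outlined above (hash built from file 1, then file 2 read line by line), assuming whitespace-separated fields as in the sample data. The names `merge_by_key` and `read_lines` and the file names in the usage comment are illustrative and do not come from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes two array refs of lines ("key<whitespace>value")
# and returns the merged "value1,value2" lines in file 2 order.
sub merge_by_key {
    my ($file1_lines, $file2_lines) = @_;

    # Step 1: load file 1 into a hash (key = first field, value = second field).
    # Assumption: keys are unique; if a key repeats, the last value seen wins.
    my %value_for;
    for my $line (@$file1_lines) {
        my ($key, $value) = split ' ', $line;
        next unless defined $value;    # skip blank or malformed lines
        $value_for{$key} = $value;
    }

    # Steps 2-4: walk file 2 and emit a line only when the key exists in file 1.
    my @merged;
    for my $line (@$file2_lines) {
        my ($key, $value) = split ' ', $line;
        next unless defined $value;
        next unless exists $value_for{$key};
        push @merged, "$value_for{$key},$value";
    }
    return @merged;
}

# Read a file into an array ref of chomped lines.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    chomp(my @lines = <$fh>);
    close $fh;
    return \@lines;
}

# Run as a script: perl merge.pl file1.txt file2.txt > merged.csv
if (!caller() && @ARGV == 2) {
    print "$_\n" for merge_by_key(read_lines($ARGV[0]), read_lines($ARGV[1]));
}
```

Because the lookup is driven by file 2, a key that appears twice there (such as 70000008 in the sample) produces two output lines, and the output follows file 2's order rather than file 1's. Pairing repeated keys first-with-first would need a hash of arrays instead of a plain hash.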
Re: Merge 2 files using key field by Corion (Patriarch) on Nov 05, 2013 at 12:17 UTC
Also see join - join two files according to a common key.
Re: Merge 2 files using key field by LanX (Saint) on Nov 05, 2013 at 12:02 UTC
Well this is a learning, not a coding website.

First lesson: Use search engines! (Doesn't require taking your "cap in hand"! =)

--> Perl Merge two files using key field looks promising to me...

Cheers Rolf

( addicted to the Perl Programming Language)

PS:
> I know that I am meant to show my working however I have none because ...

nice try! ;-)
Re: Merge 2 files using key field by marinersk (Priest) on Nov 05, 2013 at 17:29 UTC
So, let's nudge this along in a more healthy direction. Surely you have code that reads the files. Can you show that?
Re^2: Merge 2 files using key field by Jalcock501 (Sexton) on Nov 11, 2013 at 09:15 UTC
Hi Marinersk

I've managed to solve this now using awk instead. I know it's not Perl, but for those interested here is how I did it.

    #!/usr/gnu/bin/gawk -f
    BEGIN { OFS=FS="," }
    {
        com = "xchop -d, group.csv " $1
        while ((com | getline res) > 0) {
            if (split(res, arr) > 1)
                print $2, arr[2]
        }
        close(com)
    }

xchop is a utility written in C that acts as a sort of binary chop for strings.

Thanks

Jim

