compare two files by column and return second (matching) column

ejbiers has asked for the wisdom of the Perl Monks concerning the following question:

I have two files. File A contains 2 columns (tab separated) and File B contains 1 column.

ex. File A

name1 xxxxx

name2 yyyyy

name3 zzzzz

name4 aaaaa

name5 bbbbb

File B

name3

name5

name1

What I'd like to get in an output file is:

zzzzz

bbbbb

xxxxx

I am a novice at perl and have been trying to modify the following script (designed to count the number of occurrences) to meet my needs with no success. Could someone suggest how to modify this script? Thanks for any help you can give!

 unless (@ARGV == 3) {
    print "Use as follows: perl program.pl in1.file in2.file output.file\n";
    die;
}

my $in1 = $ARGV[0];
my $in2 = $ARGV[1];
my $fout = $ARGV[2];

open ONE, $in1;
open TWO, $in2;
open foutname, ">$fout";

my %hash1;
my @hit;    

while (<ONE>){
        chomp;
        my @hit = split(/\t/, $_);
    #start them as "0" for "no duplicates"
        $hash1{$hit[0]}=0;
    }
close ONE;

my @col;

while (<TWO>){
        chomp;
        my @col = split(/\t/, $_);
    #increment the counter if %hash1 has what we're     looking for.
    ++$hash1{$col[0]} if(exists($hash1{$col[0]}));
    }

my @dups = grep { $hash1{$_} > 0 } keys %hash1;
for my $k (@dups) {
    print foutname "$k\t$hash1{$k}\n";
    }
 
close TWO;
close foutname;

Replies are listed 'Best First'.
Re: compare two files by column and return second (matching) column by Kenosis (Priest) on Aug 06, 2012 at 20:36 UTC
Excellent suggestions have been given. Here's another option:

    use Modern::Perl;

    open my $fhA, '<', 'FileA.txt' or die $!;
    my %hash = map { /(.+)\t(.+)/; $1 => $2 } grep /\S/, <$fhA>;
    close $fhA;

    open my $fhB, '<', 'FileB.txt' or die $!;
    say for map { chomp; $hash{$_} } grep /\S/, <$fhB>;
    close $fhB;

Output:

    zzzzz
    bbbbb
    xxxxx

The first `map` uses a regex to grab the key/value pairs from FileA for placement into `%hash`. The second `map` uses the single entry from FileB to show that entry's associated value from FileA. Since there were blank lines in the files (from your data set), `grep /\S/` was used to filter those. Hope this helps!
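A note on the `map` lookup above: if File B contains a name that File A lacks, `$hash{$_}` is undefined, so `say` emits an empty line (plus a warning, since warnings are enabled). Below is a minimal sketch of the same lookup written as explicit loops that skip such names. The helper names `read_pairs` and `lookup` and the in-memory sample data (including the unmatched `name9`) are made up for illustration and are not part of the original reply.

```perl
use strict;
use warnings;

# Build a name => value map from tab-separated lines (hypothetical helper).
sub read_pairs {
    my ($fh) = @_;
    my %value_for;
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless $line =~ /\S/;                 # skip blank lines
        my ( $name, $value ) = split /\t/, $line, 2;
        next unless defined $value;                # skip lines without a tab
        $value_for{$name} = $value;
    }
    return \%value_for;
}

# Return the values for the names read from $fh, in that order,
# skipping names that are not in the map (hypothetical helper).
sub lookup {
    my ( $value_for, $fh ) = @_;
    my @found;
    while ( my $name = <$fh> ) {
        chomp $name;
        next unless $name =~ /\S/;                 # skip blank lines
        push @found, $value_for->{$name} if exists $value_for->{$name};
    }
    return @found;
}

# Demo on in-memory "files" that mirror the sample data; name9 has no match.
my $file_a = "name1\txxxxx\n\nname2\tyyyyy\nname3\tzzzzz\nname5\tbbbbb\n";
my $file_b = "name3\n\nname9\nname5\nname1\n";

open my $fh_a, '<', \$file_a or die "cannot open in-memory File A: $!";
open my $fh_b, '<', \$file_b or die "cannot open in-memory File B: $!";

my @values = lookup( read_pairs($fh_a), $fh_b );
print "$_\n" for @values;
```

With real files, the two in-memory handles would be replaced by handles opened on the file names; whether to skip unmatched names or report them is a design choice that depends on what the output is used for.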
Re^2: compare two files by column and return second (matching) column by ejbiers (Initiate) on Aug 07, 2012 at 13:45 UTC
This is perfect! Thanks! However, is there a way to print the results into an output file?	[reply]
Re^3: compare two files by column and return second (matching) column by Kenosis (Priest) on Aug 07, 2012 at 15:28 UTC
You're most welcome, ejbiers! Yes, the script below includes printing the results to a file:

    use Modern::Perl;

    open my $fhA, '<', 'FileA.txt' or die $!;
    my %hash = map { /(.+)\t(.+)/; $1 => $2 } grep /\S/, <$fhA>;
    close $fhA;

    open my $fhB, '<', 'FileB.txt' or die $!;
    my @output = map { chomp; $hash{$_} } grep /\S/, <$fhB>;
    close $fhB;

    open my $fhO, '>', 'Output.txt' or die $!;
    say $fhO $_ for @output;
    close $fhO;
Re: compare two files by column and return second (matching) column by nemesdani (Friar) on Aug 06, 2012 at 18:26 UTC
I have some general advice for you, but I'll wait till I'm sober. Till then:

    use strict;
    use warnings;

    my %data;
    open (DATA1, "<", "data1.txt") or die "cannot open file";
    while (<DATA1>) {
        chomp;                # keep the newline out of the stored value
        next unless /\S/;     # skip blank lines
        my @line = split (/\t/, $_);
        $data{$line[0]} = $line[1];
    }
    my @k = keys %data;
    foreach my $key (@k) {
        print "$key, $data{$key}\n";
    }
    close DATA1;

    open (DATA2, "<", "data2.txt") or die "cannot open file";
    print "found:\n";
    while (<DATA2>) {
        chomp;
        if (exists $data{"$_"}) { print "$data{$_}\n"; }
    }
    close DATA2;

This doesn't check duplicates, is too verbose, but I hope it'll help. I'm too lazy to be proud of being impatient.
Re: compare two files by column and return second (matching) column by BillKSmith (Monsignor) on Aug 06, 2012 at 19:13 UTC
Read the FAQs to understand how this works. See `perldoc -q duplicate`.
Re: compare two files by column and return second (matching) column by aitap (Curate) on Aug 06, 2012 at 19:32 UTC
First, a couple of pieces of advice:

    my $in1 = $ARGV[0];
    my $in2 = $ARGV[1];
    my $fout = $ARGV[2];

It can be simplified into:

    my ($in1, $in2, $fout) = @ARGV;

Next:

    open ONE, $in1;
    open TWO, $in2;
    open foutname, ">$fout";

You need to check for errors; it's also safer to use the three-argument form of `open()` (imagine someone specifying '/bin/rm -rf / |' as the file name). Thus,

    open my $one, '<', $in1 or die "$in1: $!\n";
    open my $two, '<', $in2 or die "$in2: $!\n";
    open my $foutname, '>', $fout or die "$fout: $!\n";

Secondly, when you need to match something to something, think of a hash. Thus,

    my %first;
    while (<$one>) {
        chomp;
        my @data = split /\t/;
        $first{$data[0]} = $data[1]; # fill the hash with the values from the first file
    }
    close $one;
    while (<$two>) {
        chomp;
        # just search the hash for these strings
        exists $first{$_} && print $foutname $first{$_}, "\n";
    }
    close $two;
    close $foutname or warn "$fout: $!\n";

(the code is untested, feel free to ask about errors it gives)

Sorry if my advice was wrong.
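One detail to watch when adding error checks to `open`: in a call written without parentheses, `||` binds more tightly than the comma, so it applies to the last argument instead of to the result of `open`. The low-precedence `or` (or explicit parentheses around the arguments) is what lets `die` fire. A small sketch of the difference follows; the sub names and the file path are invented for the demonstration, and the path is simply assumed not to exist.

```perl
use strict;
use warnings;

# A path assumed not to exist, so every open below fails.
my $missing = '/nonexistent/dir/no-such-file.txt';

# With ||, Perl parses this as: open(my $fh, '<', ($path || die ...)).
# $path is a true string, so die is never reached even though open fails.
sub open_with_double_pipe {
    my ($path) = @_;
    my $ok = eval { open my $fh, '<', $path || die "$path: $!\n"; 1 };
    return $ok ? 'no die' : 'died';
}

# With or, the whole open call is the left operand, so a failed open dies.
sub open_with_or {
    my ($path) = @_;
    my $ok = eval { open my $fh, '<', $path or die "$path: $!\n"; 1 };
    return $ok ? 'no die' : 'died';
}

print "||: ", open_with_double_pipe($missing), "\n";
print "or: ", open_with_or($missing), "\n";
```

The first line reports `no die` and the second `died`, which is why `open ... or die` is the usual idiom.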
Re: compare two files by column and return second (matching) column by abualiga (Scribe) on Aug 06, 2012 at 19:41 UTC
Hi there, give this a try and let me know if you have questions.

    #!/usr/local/bin/perl
    use 5.016;    # version >= 5.12 turns on strictures by default
    use warnings;
    use autodie qw/ open close /;    # saves typing die all the time

    die "Usage: $0 infile1 infile2\n" unless @ARGV == 2;

    my %count_cols;
    my @cols;

    while( <> ) {    # perl will read/process one file in whole then move on to the next
        chomp;
        # get 1st col from each line using an array slice
        push @cols, ( split( /\t/, $_ ) )[0];
    }

    for ( @cols ) {
        $count_cols{ $_ }++;    # hash value = how many times does col name appear
    }

    my @intersects;    # store columns that appear in both files
    for ( keys %count_cols ) {
        if( $count_cols{ $_ } == 2 ) {
            push @intersects, $_;
        }
    }

    # open fileA again and extract fields in second column
    open IN, "<", "fileA";
    while( my $line = <IN> ) {
        chomp $line;
        next if ! $line;
        for my $intersect ( @intersects ) {
            if( $intersect eq ( split( /\t/, $line ) )[0] ) {
                my $result = ( split( /\t/, $line ) )[1];
                say $result;    # prints with \n, supported in >5.10
            }
        }
    }
    close IN;

