PerlMonks  

compare two files by column and return second (matching) column

by ejbiers (Initiate)
on Aug 06, 2012 at 17:22 UTC (#985775=perlquestion)
ejbiers has asked for the wisdom of the Perl Monks concerning the following question:

I have two files. File A contains 2 columns (tab separated) and File B contains 1 column.

ex. File A

name1 xxxxx

name2 yyyyy

name3 zzzzz

name4 aaaaa

name5 bbbbb

File B

name3

name5

name1

What I'd like to get in an output file is:

zzzzz

bbbbb

xxxxx

I am a novice at perl and have been trying to modify the following script (designed to count the number of occurrences) to meet my needs with no success. Could someone suggest how to modify this script? Thanks for any help you can give!

    unless (@ARGV == 3) {
        print "Use as follows: perl program.pl in1.file in2.file output.file\n";
        die;
    }
    my $in1  = $ARGV[0];
    my $in2  = $ARGV[1];
    my $fout = $ARGV[2];
    open ONE, $in1;
    open TWO, $in2;
    open foutname, ">$fout";
    my %hash1;
    my @hit;
    while (<ONE>) {
        chomp;
        my @hit = split(/\t/, $_);
        # start them as "0" for "no duplicates"
        $hash1{$hit[0]} = 0;
    }
    close ONE;
    my @col;
    while (<TWO>) {
        chomp;
        my @col = split(/\t/, $_);
        # increment the counter if %hash1 has what we're looking for.
        ++$hash1{$col[0]} if (exists($hash1{$col[0]}));
    }
    my @dups = grep { $hash1{$_} > 0 } keys %hash1;
    for my $k (@dups) {
        print foutname "$k\t$hash1{$k}\n";
    }
    close TWO;
    close foutname;

Re: compare two files by column and return second (matching) column
by nemesdani (Friar) on Aug 06, 2012 at 18:26 UTC
    I have some general advice for you, but I'll wait till I'm sober. Till then:

        use strict;
        use warnings;

        my %data;
        open (DATA1, "<", "data1.txt") or die "cannot open file";
        while (<DATA1>) {
            chomp;                 # strip the newline so it does not end up in the value
            next unless /\S/;      # skip blank lines
            my @line = split (/\t/, $_);
            $data{$line[0]} = $line[1];
        }
        my @k = keys %data;
        foreach my $key (@k) {
            print "$key, $data{$key}\n";
        }
        close DATA1;

        open (DATA2, "<", "data2.txt") or die "cannot open file";
        print "found:\n";
        while (<DATA2>) {
            chomp;
            if (exists $data{"$_"}) { print "$data{$_}\n"; }
        }
        close DATA2;
    This doesn't check duplicates, is too verbose, but I hope it'll help.
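    If duplicate names in the first file matter, one way to catch them is to count names while filling the hash. This is only a minimal sketch: build_lookup is a hypothetical helper (not from the code above), it takes lines already read into memory, and it assumes that the last occurrence of a name should win.

```perl
use strict;
use warnings;

# Hypothetical helper: build the name => value hash from tab-separated
# lines and collect every name that occurs more than once.
sub build_lookup {
    my @lines = @_;
    my ( %value_for, %seen, @duplicates );
    for my $line (@lines) {
        chomp( my $copy = $line );
        next unless $copy =~ /\S/;                       # skip blank lines
        my ( $name, $value ) = split /\t/, $copy;
        push @duplicates, $name if $seen{$name}++ == 1;  # record each duplicate once
        $value_for{$name} = $value;                      # assumption: last occurrence wins
    }
    return ( \%value_for, \@duplicates );
}
```

    The caller can then warn about, or refuse to continue on, anything listed in the second return value.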

    I'm too lazy to be proud of being impatient.
Re: compare two files by column and return second (matching) column
by BillKSmith (Hermit) on Aug 06, 2012 at 19:13 UTC
    Read the FAQ to understand how this works; see perldoc -q duplicate
Re: compare two files by column and return second (matching) column
by aitap (Chaplain) on Aug 06, 2012 at 19:32 UTC
    Firstly, a couple of pieces of advice:
        my $in1 = $ARGV[0];
        my $in2 = $ARGV[1];
        my $fout = $ARGV[2];
    This can be simplified to:
        my ($in1, $in2, $fout) = @ARGV;
        open ONE, $in1;
        open TWO, $in2;
        open foutname, ">$fout";
    You need to check for errors; it's also safer to use the three-argument form of open() (imagine someone specifying '/bin/rm -rf / |' as the file name). Thus,
        open my $one, '<', $in1 or die "$in1: $!\n";
        open my $two, '<', $in2 or die "$in2: $!\n";
        open my $foutname, '>', $fout or die "$fout: $!\n";
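    A minimal sketch of a checked three-argument open follows; open_for_reading is a hypothetical name used only for illustration. Note that it uses the low-precedence "or" rather than "||": without parentheses, open my $fh, '<', $path || die ... parses as $path || die ..., so the die would fire only when the file name itself is false, never when open() fails.

```perl
use strict;
use warnings;

# Hypothetical helper: open a file for reading with three-argument open()
# and return the handle, or die with the file name and the OS error.
sub open_for_reading {
    my ($path) = @_;
    # The explicit '<' mode means characters such as '|' or '>' in $path
    # are treated as part of the file name, not as instructions to open().
    open( my $fh, '<', $path ) or die "$path: $!\n";
    return $fh;
}
```

    With this form a name such as 'echo hi |' is simply a file that does not exist, so the call dies instead of running a command.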
    Secondly, when you need to match something to something, think of a hash. Thus,
        my %first;
        while (<$one>) {
            chomp;
            next unless /\S/;   # skip blank lines
            my @data = split /\t/;
            $first{$data[0]} = $data[1]; # fill the hash with the values from the first file
        }
        close $one;
        while (<$two>) {
            chomp;
            # just search the hash for these strings
            exists $first{$_} && print $foutname $first{$_}, "\n";
        }
        close $two;
        close $foutname or warn "$fout: $!\n";
    (the code is untested, feel free to ask about errors it gives)
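    For something easier to test, the same hash lookup can be wrapped in a sub that works on filehandles. This is a sketch under assumptions, not a drop-in replacement: print_matching_values is a hypothetical name, the first input is assumed to hold name<TAB>value pairs, and names missing from the first input are silently skipped.

```perl
use strict;
use warnings;

# Hypothetical helper: read name<TAB>value pairs from $fh_a, then print to
# $fh_out the value for each name read from $fh_b, in the second file's order.
sub print_matching_values {
    my ( $fh_a, $fh_b, $fh_out ) = @_;

    my %value_for;
    while ( my $line = <$fh_a> ) {
        chomp $line;
        next unless $line =~ /\S/;              # skip blank lines
        my ( $name, $value ) = split /\t/, $line;
        next unless defined $value;             # skip lines without a second column
        $value_for{$name} = $value;
    }

    while ( my $name = <$fh_b> ) {
        chomp $name;
        next unless exists $value_for{$name};   # skips blank lines and unknown names
        print {$fh_out} "$value_for{$name}\n";
    }
    return;
}
```

    Because it only needs filehandles, it can be tried on in-memory strings (open my $fh, '<', \$text) before pointing it at real files.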
    Sorry if my advice was wrong.
Re: compare two files by column and return second (matching) column
by abualiga (Scribe) on Aug 06, 2012 at 19:41 UTC

    Hi there, give this a try and let me know if you have questions.

        #!/usr/local/bin/perl
        use 5.016;                       # version >= 5.12 turns on strictures by default
        use warnings;
        use autodie qw/ open close /;    # saves typing die all the time

        die "Usage: $0 infile1 infile2\n" unless @ARGV == 2;
        my $file_a = $ARGV[0];           # remember the first file; <> empties @ARGV as it reads

        my %count_cols;
        my @cols;

        while ( <> ) {    # perl will read/process one file in whole then move on to the next
            chomp;
            # get 1st col from each line using an array slice
            push @cols, ( split( /\t/, $_ ) )[0];
        }

        for ( @cols ) {
            $count_cols{ $_ }++;    # hash value = how many times does col name appear
        }

        my @intersects;    # store columns that appear in both files
        for ( keys %count_cols ) {
            if ( $count_cols{ $_ } == 2 ) {
                push @intersects, $_;
            }
        }

        # open the first file again and extract fields in second column
        open IN, "<", $file_a;
        while ( my $line = <IN> ) {
            chomp $line;
            next if ! $line;
            for my $intersect ( @intersects ) {
                if ( $intersect eq ( split( /\t/, $line ) )[0] ) {
                    my $result = ( split( /\t/, $line ) )[1];
                    say $result;    # prints with \n, supported in >= 5.10
                }
            }
        }
        close IN;
Re: compare two files by column and return second (matching) column
by Kenosis (Priest) on Aug 06, 2012 at 20:36 UTC

    Excellent suggestions have been given. Here's another option:

        use Modern::Perl;

        open my $fhA, '<', 'FileA.txt' or die $!;
        my %hash = map { /(.+)\t(.+)/; $1 => $2 } grep /\S/, <$fhA>;
        close $fhA;

        open my $fhB, '<', 'FileB.txt' or die $!;
        say for map { chomp; $hash{$_} } grep /\S/, <$fhB>;
        close $fhB;

    Output:

        zzzzz
        bbbbb
        xxxxx

    The first map uses a regex to grab the key/value pairs from FileA for placement into %hash. The second map uses each entry from FileB as a key to look up that entry's associated value from FileA. Since there were blank lines in the files (from your data set), grep /\S/ was used to filter those.
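    One case the map/grep approach does not cover is a FileB entry with no match in FileA, which would yield an undefined value. A hedged sketch of the same idea with that case filtered out follows; values_for_names is a hypothetical helper that takes array references of lines already read into memory, and it also skips FileA lines that do not match the name<TAB>value pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: same map/grep idea, applied to lines held in memory.
sub values_for_names {
    my ( $a_lines, $b_lines ) = @_;

    # Build name => value pairs; lines that do not match contribute nothing.
    my %hash = map { /(.+)\t(.+)/ ? ( $1 => $2 ) : () } grep { /\S/ } @$a_lines;

    # Copy each name before chomping so the caller's lines are left untouched.
    my @names = map { my $n = $_; chomp $n; $n } grep { /\S/ } @$b_lines;

    # Keep only names that exist in the hash, preserving the order of @names.
    return map { $hash{$_} } grep { exists $hash{$_} } @names;
}
```

    Called in list context, it returns the matching values in FileB's order.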

    Hope this helps!

      This is perfect! Thanks! However, is there a way to print the results into an output file?

        You're most welcome, ejbiers! Yes, the script below includes printing the results to a file:

        use Modern::Perl;

        open my $fhA, '<', 'FileA.txt' or die $!;
        my %hash = map { /(.+)\t(.+)/; $1 => $2 } grep /\S/, <$fhA>;
        close $fhA;

        open my $fhB, '<', 'FileB.txt' or die $!;
        my @output = map { chomp; $hash{$_} } grep /\S/, <$fhB>;
        close $fhB;

        open my $fhO, '>', 'Output.txt' or die $!;
        say $fhO $_ for @output;
        close $fhO;
