ejbiers has asked for the wisdom of the Perl Monks concerning the following question:
I have two files. File A contains 2 columns (tab separated) and File B contains 1 column.
ex. File A
name1 xxxxx
name2 yyyyy
name3 zzzzz
name4 aaaaa
name5 bbbbb
File B
name3
name5
name1
What I'd like to get in an output file is:
zzzzz bbbbb xxxxx
I am a novice at Perl and have been trying to modify the following script (designed to count the number of occurrences) to meet my needs, with no success. Could someone suggest how to modify this script? Thanks for any help you can give!
unless (@ARGV == 3) {
print "Use as follows: perl program.pl in1.file in2.file output.file\n";
die;
}
my $in1 = $ARGV[0];
my $in2 = $ARGV[1];
my $fout = $ARGV[2];
open ONE, $in1;
open TWO, $in2;
open foutname, ">$fout";
my %hash1;
my @hit;
while (<ONE>){
chomp;
my @hit = split(/\t/, $_);
#start them as "0" for "no duplicates"
$hash1{$hit[0]}=0;
}
close ONE;
my @col;
while (<TWO>){
chomp;
my @col = split(/\t/, $_);
#increment the counter if %hash1 has what we're looking for.
++$hash1{$col[0]} if(exists($hash1{$col[0]}));
}
my @dups = grep { $hash1{$_} > 0 } keys %hash1;
for my $k (@dups) {
print foutname "$k\t$hash1{$k}\n";
}
close TWO;
close foutname;
Re: compare two files by column and return second (matching) column
by Kenosis (Priest) on Aug 06, 2012 at 20:36 UTC
use Modern::Perl;
open my $fhA, '<', 'FileA.txt' or die $!;
my %hash = map { /(.+)\t(.+)/; $1 => $2 } grep /\S/, <$fhA>;
close $fhA;
open my $fhB, '<', 'FileB.txt' or die $!;
say for map { chomp; $hash{$_} } grep /\S/, <$fhB>;
close $fhB;
Output:
zzzzz
bbbbb
xxxxx
The first map uses a regex to grab the key/value pairs from FileA for placement into %hash. The second map uses the single entry from FileB to show that entry's associated value from FileA. Since there were blank lines in the files (from your data set), grep /\S/ was used to filter those.
Hope this helps!
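The two steps described above (load FileA into a hash, then look up each FileB entry in order) can be sketched without touching the filesystem. This is only an illustration: the in-memory sample data and the helper name `lookup_values` are assumptions, not part of the original reply. Unlike the one-liner above, this sketch drops names that are missing from FileA instead of returning undef for them.

```perl
use strict;
use warnings;

# Hypothetical helper: maps each name in @$names to its value from the
# tab-separated lines in @$pairs, preserving the order of @$names.
sub lookup_values {
    my ($pairs, $names) = @_;
    my %value_for;
    for my $line (@$pairs) {
        next unless $line =~ /\S/;              # skip blank lines
        chomp(my $copy = $line);                # work on a copy of the line
        my ($name, $value) = split /\t/, $copy, 2;
        $value_for{$name} = $value;
    }
    # Names absent from the hash are dropped rather than yielding undef.
    return map { $value_for{$_} } grep { exists $value_for{$_} } @$names;
}

# Sample data mirroring the question (assumed to be tab separated).
my @file_a = ("name1\txxxxx\n", "name2\tyyyyy\n", "name3\tzzzzz\n",
              "name4\taaaaa\n", "name5\tbbbbb\n");
my @file_b = qw(name3 name5 name1);

print "$_\n" for lookup_values(\@file_a, \@file_b);
```

Because the lookup walks FileB's entries, the output follows FileB's order rather than FileA's.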
use Modern::Perl;
open my $fhA, '<', 'FileA.txt' or die $!;
my %hash = map { /(.+)\t(.+)/; $1 => $2 } grep /\S/, <$fhA>;
close $fhA;
open my $fhB, '<', 'FileB.txt' or die $!;
my @output = map { chomp; $hash{$_} } grep /\S/, <$fhB>;
close $fhB;
open my $fhO, '>', 'Output.txt' or die $!;
say $fhO $_ for @output;
close $fhO;
Re: compare two files by column and return second (matching) column
by nemesdani (Friar) on Aug 06, 2012 at 18:26 UTC
I have some general advice for you, but I'll wait till I'm sober. Till then:
use strict;
use warnings;
my %data;
open (DATA1, "<", "data1.txt") or die "cannot open file";
while (<DATA1>) {
chomp; # strip the newline so the stored value doesn't carry it
my @line = split (/\t/, $_);
$data{$line[0]} = $line[1];
}
my @k = keys %data;
foreach my $key (@k) {
print "$key, $data{$key}\n";
}
close DATA1;
open (DATA2, "<", "data2.txt") or die "cannot open file";
print "found:\n";
while (<DATA2>) {
chomp;
if (exists $data{$_}) {print "$data{$_}\n";}
}
close DATA2;
This doesn't check duplicates, is too verbose, but I hope it'll help.
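One possible way to add the duplicate check mentioned above is to test for the key before storing it. This is a minimal sketch under assumptions: the helper name `load_pairs`, the sample lines, and the choice to keep the first value seen are all hypothetical, not something the original code does.

```perl
use strict;
use warnings;

# Hypothetical helper: builds the name => value hash from tab-separated
# lines and collects names that appear more than once. The first value
# seen for a name is kept; later ones are only reported.
sub load_pairs {
    my (@lines) = @_;                 # copy, so chomp never touches the caller's data
    my (%data, @duplicates);
    for my $line (@lines) {
        chomp $line;
        next unless length $line;     # skip empty lines
        my ($name, $value) = split /\t/, $line, 2;
        if (exists $data{$name}) {
            push @duplicates, $name;
            next;
        }
        $data{$name} = $value;
    }
    return (\%data, \@duplicates);
}

my ($data, $dups) = load_pairs("name1\txxxxx\n", "name2\tyyyyy\n", "name1\tqqqqq\n");
warn "duplicate name: $_\n" for @$dups;
print "$data->{name1}\n";             # the first value for name1 is kept
```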
I'm too lazy to be proud of being impatient.
Re: compare two files by column and return second (matching) column
by BillKSmith (Monsignor) on Aug 06, 2012 at 19:13 UTC
Read the FAQs to understand how this works. Refer to perldoc -q duplicate
Re: compare two files by column and return second (matching) column
by aitap (Curate) on Aug 06, 2012 at 19:32 UTC
Firstly, a couple of pieces of advice:
my $in1 = $ARGV[0];
my $in2 = $ARGV[1];
my $fout = $ARGV[2];
It can be simplified into: my ($in1, $in2, $fout) = @ARGV;
open ONE, $in1;
open TWO, $in2;
open foutname, ">$fout";
You need to check for errors; it's also safer to use the three-argument form of open() (imagine someone specifying '/bin/rm -rf / |' as the file name). Thus,
open my $one, '<', $in1 or die "$in1: $!\n";
open my $two, '<', $in2 or die "$in2: $!\n";
open my $foutname, ">", $fout or die "$fout: $!\n";
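A note on precedence when writing this kind of error check: `||` binds more tightly than the commas in open's argument list, so `open my $fh, '<', $file || die ...` attaches the die to the file name and a failed open goes unnoticed, whereas the low-precedence `or` applies to open's return value. The following sketch demonstrates the difference; the path and the two helper names are assumptions made up for illustration, and the path is assumed not to exist.

```perl
use strict;
use warnings;

# Hypothetical path, assumed not to exist on the machine running this.
my $missing = '/nonexistent-dir-for-demo/input.txt';

# Each helper returns 1 if the failed open triggered die, 0 otherwise.
sub died_with_high_precedence {
    my ($path) = @_;
    # Parsed as open(my $fh, '<', ($path || die ...)); $path is true, so die never runs.
    return eval { open my $fh, '<', $path || die "$path: $!\n"; 1 } ? 0 : 1;
}

sub died_with_low_precedence {
    my ($path) = @_;
    # Parsed as open(my $fh, '<', $path) or die ...; die sees open's return value.
    return eval { open my $fh, '<', $path or die "$path: $!\n"; 1 } ? 0 : 1;
}

print "|| form died: ", died_with_high_precedence($missing), "\n";
print "or form died: ", died_with_low_precedence($missing), "\n";
```

Wrapping the arguments in parentheses, as in `open(my $fh, '<', $file) || die ...`, is another way to get the intended grouping.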
Secondly, when you need to match something to something, think of a hash. Thus,
my %first;
while (<$one>) {
chomp;
my @data = split /\t/;
$first{$data[0]}=$data[1]; # fill the hash with the values from the first file
}
close $one;
while (<$two>) {
chomp; # just search the hash for these strings
exists $first{$_} && print $foutname $first{$_},"\n"; # double quotes so \n is a newline
}
close $two;
close $foutname or warn "$fout: $!\n";
(the code is untested, feel free to ask about errors it gives)
Sorry if my advice was wrong.
Re: compare two files by column and return second (matching) column
by abualiga (Scribe) on Aug 06, 2012 at 19:41 UTC
Hi there,
give this a try and let me know if you have questions.
#!/usr/local/bin/perl
use 5.016; #version >=5.12 turns on strictures by default
use warnings;
use autodie qw/ open close /; #saves typing die all the time
die "Usage: $0 infile1 infile2\n" unless @ARGV == 2;
my %count_cols;
my @cols;
while( <> ) { # perl will read/process one file in whole then move on to the next
chomp;
# get 1st col from each line using an array slice
push @cols, ( split( /\t/, $_ ) )[0];
}
for ( @cols ) {
$count_cols{ $_ }++; # hash value = how many times does col name appear
}
my @intersects; # store columns that appear in both files
for ( keys %count_cols ) {
if( $count_cols{ $_ } == 2 ) {
push @intersects, $_;
}
}
# open fileA again and extract fields in second column
open IN, "<", "fileA";
while( my $line = <IN> ) {
chomp $line;
next if ! $line;
for my $intersect ( @intersects ) {
if( $intersect eq ( split( /\t/, $line ) )[0] ) {
my $result = ( split( /\t/, $line ) )[1];
say $result; # prints with \n, supported in >=5.10
}
}
}
close IN;