In reply to: Matching elements in two arrays and printing the element next to the match.
Caveats:
I am not certain of your file formats. Here are my assumptions:
- For FILE A, it is unclear if you have multiple ID strings on one line, or one per line over multiple lines. I'll make this solution work with either.
- For FILE B, it is unclear whether your lines are "key <whitespace> letters" or "key <newline> letters <newline>", i.e. key and value on separate lines. I give a version for each.
Solution:
Perl hashes are very robust and often a great fit for problems of simple to medium complexity. For this solution I'll read all the entries from the first file, parse out the IDs, and insert each ID into a hash. Then we parse each entry in fileB and check whether its ID is in the hash we built while walking fileA. On a match, we print the ID and LETTERS joined with a <tab> character.
With this solution, we look at each line of fileA and fileB exactly once, and we use a hash lookup on IDs, which is fast. This reduces our complexity from the O(n^2) or worse of the previous solution to roughly O(n) in the combined size of the two files, because a Perl hash lookup takes constant time on average. Only a pathological run of hash collisions would make it noticeably slower.
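To make the two passes concrete, here is a minimal sketch that runs the same idea on in-memory arrays instead of files. The sub name match_ids and the 9XYZ line are made up for illustration, and the sketch assumes the "id <whitespace> letters" format for fileB.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data standing in for the two files. The 9XYZ line is a
# hypothetical non-matching entry added for illustration.
my @file_a = ('1DWK 2RFK', '4ERH');
my @file_b = (
    '1DWK HRSDKKDAHJKLSDLDLLJDGHDFJJE',
    '9XYZ AAAA',
    '4ERH DFSKFHADFSBVHFWIHFWJBFS',
);

# match_ids is a name invented for this sketch.
# It returns "id<tab>letters" for every fileB line whose ID appears in fileA.
sub match_ids {
    my ($a_lines, $b_lines) = @_;

    # Pass 1: collect every ID from fileA as a hash key.
    my %wanted;
    for my $line (@$a_lines) {
        $wanted{$_} = 1 for split ' ', $line;
    }

    # Pass 2: one hash lookup per fileB line, no inner loop over fileA.
    my @matches;
    for my $line (@$b_lines) {
        my ($id, $letters) = split ' ', $line, 2;
        next unless defined $id && defined $letters;
        push @matches, "$id\t$letters" if $wanted{$id};
    }
    return @matches;
}

print "$_\n" for match_ids(\@file_a, \@file_b);
```

Each fileB line costs one hash lookup no matter how many IDs fileA holds, which is where the saving over a nested loop comes from.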
The Code
#!/usr/bin/perl
use warnings; use strict;
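# Open filea and parse all ID strings.
# Add each ID as a key in the %wanted hash.
my %wanted;
{
    # 'or' binds more loosely than '||', so die actually fires if open fails.
    open my $file, '<', 'filea'
        or die "failed to open filea : $!";
    while (<$file>)
    {
        chomp;
        # split ' ' also skips any leading whitespace on the line.
        my @ids = split ' ', $_;
        $wanted{$_}++ for @ids;
    }
    close $file;
}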
# Read fileb, parse lines of the form "id <whitespace> letters",
# and print lines that match the ID strings from filea.
{
    open my $file, '<', 'fileb'
        or die "failed to open fileb : $!";
    while (<$file>)
    {
        chomp;
        my ($id, $letters) = split ' ', $_;
        next unless defined $letters;    # skip blank or malformed lines
        print "$id\t$letters\n" if $wanted{$id};
    }
    close $file;
}
# OR (use this block instead of the previous one, not in addition to it)
# Read fileb, parse records of the form "id <newline> letters",
# and print records that match the ID strings from filea.
{
    open my $file, '<', 'fileb'
        or die "failed to open fileb : $!";
    while (my $id = <$file>)
    {
        my $letters = <$file>;
        last unless defined $letters;    # ID on the final line with no letters
        chomp($id);
        chomp($letters);
        print "$id\t$letters\n" if $wanted{$id};
    }
    close $file;
}
__END__
FileA:
1DWK 2RFK
4ERH
FileB:
1DWK HRSDKKDAHJKLSDLDLLJDGHDFJJE
4ERH DFSKFHADFSBVHFWIHFWJBFS
2RFK DADUHRQWERKBNJAIJDLAJDKAKDNAKDJKSADJKAHDJASHRWEUB
FileB (alternate):
1DWK
HRSDKKDAHJKLSDLDLLJDGHDFJJE
4ERH
DFSKFHADFSBVHFWIHFWJBFS
2RFK
DADUHRQWERKBNJAIJDLAJDKAKDNAKDJKSADJKAHDJASHRWEUB