Hello ray15, and welcome to the Monastery!
Here’s a solution using Text::CSV_XS:
File “1.csv”
fragment,id,index
accb,10,A
bbc,11,B
ccd,12,C
File “2.csv”
fragment,id,index
bbc,14,E
ccd,15,D
llk,11,B
kks,12,C
Script in file “main.pl”
#!perl
use strict;
use warnings;
use List::MoreUtils 'uniq';
use Text::CSV_XS;

my %files = (file1 => '1.csv', file2 => '2.csv');
my %hashes;
# auto_diag makes the parser report malformed CSV instead of stopping silently
my $csv = Text::CSV_XS->new( { binary => 1, auto_diag => 1 } );

# Read each file into $hashes{fileN}{fragment} = [ id, index ]
for my $file (keys %files)
{
    open(my $in, '<', $files{$file})
        or die "Cannot open file '$files{$file}' for reading: $!";
    $csv->getline($in);    # Discard column headings
    while (my $row = $csv->getline($in))
    {
        my $key = shift @$row;
        $hashes{$file}{$key} = [ @$row ];
    }
    close $in
        or die "Cannot close file '$files{$file}': $!";
}

separator_line();
print join("\t", qw(frag id1 file1 id2 file2)), "\n";
separator_line();

# Collect every fragment that appears in at least one file
my @keys;
push @keys, keys %$_ for values %hashes;
@keys = uniq @keys;

for my $fragment (sort @keys)
{
    my $f1 = exists $hashes{file1}{$fragment} ? 1 : 0;
    my $f2 = exists $hashes{file2}{$fragment} ? 1 : 0;
    printf "%s\t%s\t%s\t%s\t%s\n",
        $fragment,
        $f1 ? $hashes{file1}{$fragment}->[0] : '',
        $f1,
        $f2 ? $hashes{file2}{$fragment}->[0] : '',
        $f2;
}

separator_line();

sub separator_line
{
    print '-' x 37, "\n";
}
Output:
13:06 >perl main.pl
-------------------------------------
frag    id1     file1   id2     file2
-------------------------------------
accb    10      1               0
bbc     11      1       14      1
ccd     12      1       15      1
kks             0       12      1
llk             0       11      1
-------------------------------------
13:07 >
Note: I do not try to access $hashes{file1}{$fragment}->[0] until I have confirmed that $hashes{file1}{$fragment} already exists in the hash. This is to avoid autovivification, which is a great Perl feature but is not wanted in this case. (See e.g. Uri Guttman’s tutorial for the gory details.)
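To see the effect in isolation, here is a minimal sketch. The data is made up to mirror the two files above; it is not taken from the script's actual run. The first lookup reads through a missing key without a guard, the second guards with exists first:

```perl
use strict;
use warnings;

# Hypothetical data shaped like %hashes in main.pl: fragment => [ id, index ]
my %hashes = ( file1 => { bbc => [ 11, 'B' ] } );

# Unguarded read: to look up element 0, Perl first creates
# $hashes{file1}{llk} as an empty array reference.
my $id1 = $hashes{file1}{llk}->[0];
print exists $hashes{file1}{llk} ? "llk now exists\n" : "llk still absent\n";
# prints "llk now exists"

# Guarded read: the dereference is never attempted, so nothing is created.
my $id2 = exists $hashes{file1}{kks} ? $hashes{file1}{kks}->[0] : '';
print exists $hashes{file1}{kks} ? "kks now exists\n" : "kks still absent\n";
# prints "kks still absent"
```

In main.pl an unguarded read would leave an empty entry in the file1 hash for every fragment found only in file 2, so any later exists test on that hash would report fragments that were never in the file.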
Hope that helps,