I have multiple FASTQ files in the following format. I want to print the total count if the second line, i.e. the sequence, matches in all files.
R1.txt
@NS500278:42:HC7M3AFXX:3:21604:26458:18476 2:N:0:AGTGGTCA
AAAAAAAAACAGATATTTGCACTAGGCATTATAAATAACATCAATTAAGTAAAAAAATTA
+
AAAAAEEEEAEEEEEEEEEE/AEEEEEEEEEEEE
1:R1.txt
R2.txt
@NS500278:42:HC7M3AFXX:3:21604:26458:18476 2:N:0:AGTGGTCA
AAAAAAAAACAGATATTTGCACTAGGCATTATAAATAACATCAATTAAGTAAAAAAATTA
+
AAAAAEEEEAEEEEEEEEEE
1:R2.txt
The output I want is:
output
@NS500278:42:HC7M3AFXX:3:21604:26458:18476 2:N:0:AGTGGTCA
AAAAAAAAACAGATATTTGCACTAGGCATTATAAATAACATCAATTAAGTAAAAAAATTA
+
AAAAAEEEEAEEEEEEEEEE/AEEEEEEEEEEEE
1:R1.txt 1:R2.txt count:2
My code is:
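#!/usr/bin/env perl
use strict;
use warnings;

# Capture the number of input files now: reading via <> shifts names off @ARGV,
# so it is empty once the loop below has finished.
my $file_count = @ARGV;
my %seen;
$/ = "";    # paragraph mode: each read returns one blank-line-separated record
while (<>) {
    chomp;
    my ($record, $tag) = split /\t/, $_;
    my $seq = ( split /\n/, $record )[1];    # second line of the record is the sequence
    $seen{$seq} //= [ $record ];             # keep the first record seen for this sequence
    push @{ $seen{$seq} }, $tag;             # then append each "count:file" tag
}
foreach my $seq ( sort keys %seen ) {
    my ($record, @tags) = @{ $seen{$seq} };
    next if @tags < $file_count;             # fewer tags than input files: skip
    my $tot = 0;
    $tot += ( split /:/, $_ )[0] for @tags;  # sum the numeric part of each tag
    print join( "\t", $record, @tags ), "\tcount:$tot\n\n";
}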
This works well with a few files, but when I compare more files it hangs. I think this is because of a memory issue. I want to modify this script, without using any modules, so that it can work with any number of files.
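
One direction that might reduce memory, shown only as a rough sketch: read the files one at a time and keep a sequence in the hash only while every file read so far contains it, so the hash can only shrink after the first file. The sub name common_records is hypothetical (it is not part of my script), and the sketch assumes each record is a blank-line-separated paragraph with a tab before the "count:file" tag, which is what the split on "\t" in my code implies. Unlike my script, it requires the sequence to be present in every file rather than counting tags.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns one output block per sequence found in ALL files.
# Assumed record layout: "header\nsequence\n+\nquality\tCOUNT:FILE".
sub common_records {
    my @files = @_;
    return () unless @files;
    my %keep;         # sequence => [ first record, running total, last file index, tags... ]
    local $/ = "";    # paragraph mode
    for my $i ( 0 .. $#files ) {
        open my $fh, '<', $files[$i] or die "Cannot open $files[$i]: $!";
        while ( my $para = <$fh> ) {
            chomp $para;
            my ( $record, $tag ) = split /\t/, $para, 2;
            next unless defined $tag;
            my $seq = ( split /\n/, $record )[1];
            next unless defined $seq;
            # Only the first file may add new sequences.
            $keep{$seq} //= [ $record, 0, 0 ] if $i == 0;
            my $entry = $keep{$seq} or next;    # missing from an earlier file: ignore
            $entry->[1] += ( split /:/, $tag )[0];
            $entry->[2] = $i;                   # mark as seen in the current file
            push @$entry, $tag;
        }
        close $fh;
        # Drop every sequence that the current file did not contain.
        delete @keep{ grep { $keep{$_}[2] != $i } keys %keep };
    }
    return map {
        my ( $record, $tot, undef, @tags ) = @{ $keep{$_} };
        join( "\t", $record, @tags ) . "\tcount:$tot\n\n";
    } sort keys %keep;
}

print common_records(@ARGV);
```

It would be called the same way as before, e.g. perl script.pl R1.txt R2.txt. Since only records from the first file are ever stored, passing the smallest file first should keep the hash as small as possible, but I have not measured this on large inputs.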