I have some files with the .fq extension in a directory: NC_002755.fq NC_009525.fq NC_009565.fq NC_012943.fq NC_016768.fq NC_016934.fq NC_017026.fq NC_017522.fq NC_017523.fq NC_017524.fq NC_017528.fq NC_018078.fq NC_018143.fq NC_020089.fq NC_020559.fq NC_021054.fq NC_021192.fq NC_021193.fq NC_021194.fq NC_021251.fq NC_021740.fq NC_022350.fq.
I have created a file p.txt in the following format:
(3R)-hydroxyacyl-ACP dehydratase subunit HadA NC_021192.1
(3R)-hydroxyacyl-ACP dehydratase subunit HadB NC_017026.1
(dimethylallyl)adenosine tRNA methylthiotransferase NC_009565.1 NC_002755.2 NC_009525.1 NC_016934.1
1,4-alpha-glucan branching protein NC_021192.1 NC_017522.1 NC_016934.1 NC_018078.1 NC_020089.1 NC_002755.2 NC_017524.1 NC_009565.1 NC_012943.1 NC_017523.1 NC_021740.1
1,4-dihydroxy-2-naphthoate octaprenyltransferase NC_016934.1 NC_017026.1 NC_009525.1
1,4-dihydroxy-2-naphthoyl-CoA synthase NC_018143.2 NC_009565.1 NC_002755.2 NC_012943.1 NC_017523.1
where each NC_.... entry is the name of a .fq file in the directory, followed by a version suffix such as .1 or .2.
I want to create a binary matrix with the filenames as columns and the genes as rows, printing 1 if the gene's line lists that file and 0 otherwise.
My script is:
open (FH, "p.txt");
while ($seq=<FH>)
{
    @seq = split /\t/, $seq;
    print @seq[0]."\t";
    opendir(DIR, ".");
    @files = grep(/\.fq$/,readdir(DIR));
    closedir(DIR);
    foreach $file (@files) {
        @file1 = split /\./, $file;
        $file2 = $file1[0];
        $size = @files;
        $file3 = $file2."\t".$file3;
    }
    for ($i = 1; $i <= $size; $i++)
    {
        @seq1 = split /\./, @seq[$i];
        chomp @seq1[0];
        if (@seq1[0] eq $file2)
            {print "1"."\t";}
        else
            {print "0"."\t";}
    }
    print "\n";
}
But this gives me an incorrect pattern of 0s and 1s.
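A likely cause, as far as I can tell from the script: once the foreach loop finishes, $file2 holds only the last filename, and the for loop then walks over the gene's accession fields instead of over the files, so every column is compared against that one file. Below is a minimal sketch of one way to build the matrix instead. It assumes each gene occupies one physical line of p.txt with tab-separated fields (as the split /\t/ in the script implies). The sub name build_matrix, the header label "gene", and taking the table name from the command line are my own choices for illustration, not anything required.

```perl
use strict;
use warnings;

# Build the matrix as a list of tab-joined lines, header first.
# $files: array ref of file names without the ".fq" extension
# $lines: array ref of lines "gene<TAB>accession<TAB>accession..."
sub build_matrix {
    my ($files, $lines) = @_;
    my @out = (join "\t", 'gene', @$files);   # "gene" is an arbitrary label
    for my $line (@$lines) {
        chomp(my $copy = $line);
        my ($gene, @acc) = split /\t/, $copy;
        next unless defined $gene && length $gene;   # skip blank lines
        my %seen;
        for my $id (@acc) {
            # Drop the version suffix so NC_021192.1 matches NC_021192.fq
            $id =~ s/\.\d+\s*$//;
            $seen{$id} = 1;
        }
        # One column per file: 1 if this gene lists the file, else 0
        push @out, join "\t", $gene, map { $seen{$_} ? 1 : 0 } @$files;
    }
    return @out;
}

# Run only when a table file is given, e.g.: perl matrix.pl p.txt
if (@ARGV) {
    # Read the directory once, outside the loop over genes
    opendir(my $dh, '.') or die "Cannot open directory: $!";
    my @fq = grep { /\.fq$/ } readdir $dh;
    closedir $dh;

    my @files;
    for my $name (sort @fq) {
        (my $base = $name) =~ s/\.fq$//;
        push @files, $base;
    }

    open(my $fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
    my @lines = <$fh>;
    close $fh;

    print "$_\n" for build_matrix(\@files, \@lines);
}
```

The main difference from the script above is the direction of the inner loop: the accessions of a gene go into a hash, and the loop runs over the files, so each file gets exactly one column in a fixed (sorted) order on every row.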