Well, I ran the code and it returned this:
E7F888 Name=arid5b ;PF01388
which is correct. I picked that ID because it was in both files (I checked manually with Ctrl-F in Notepad).
However, a lot of lines are missing from the final output. The $activ file is 512 KB and the output is only 3 KB. @activline is also missing a lot of entries when I do print @activline;. I picked E7F888 at random from $activin and checked that it was also in $uniprot, so some other line of data may be the offender. When I ran this:
print $activ{E7F888},"\n";
for ( read_file $uniprot ) {
    next unless /(E7F888)\s+.+=([^\s]+)/;
    say;
    push @activline, "$1 | $2 | $activ{$1} \n" if $activ{$1};
}
print @activline;
I got this:
PF01388
E7F888 Name=arid5b ;PF01388
E7F888 | arid5b | PF01388
which showed that it was in %activ and that it was found in the file. But when I run the normal code, it is no longer in @activline.
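One way to narrow this down would be to count how many IDs actually end up in the hash and to list the ones that no $uniprot line ever matches. The following is only a minimal sketch under some assumptions: the sub names load_activ and unmatched_ids are made up for illustration, it uses core Perl only instead of Modern::Perl and File::Slurp, it anchors the ID match with ^(\S+) instead of (.{6}), and the input lines are the sample lines quoted in this thread rather than the real files.

```perl
use strict;
use warnings;

# Build an ID => Pfam-list hash from "ID | PF..." lines.
# Lines without an "ID | ..." shape are skipped instead of being mapped.
sub load_activ {
    my @lines = @_;
    my %activ;
    for my $line (@lines) {
        ( my $copy = $line ) =~ s/\s+\z//;    # strip trailing newline / CR
        $copy =~ s/\.\d+//g;                  # drop Pfam version suffixes
        next unless $copy =~ /^(\S+)\s+\|\s+(.+)$/;
        $activ{$1} = $2;
    }
    return %activ;
}

# Return the IDs in %$activ that no uniprot line matched, sorted.
sub unmatched_ids {
    my ( $activ, @uniprot_lines ) = @_;
    my %seen;
    for my $line (@uniprot_lines) {
        next unless $line =~ /^(\S+)\s+.*=(\S+)/;
        $seen{$1} = 1 if exists $activ->{$1};
    }
    my @missing = grep { !$seen{$_} } sort keys %$activ;
    return @missing;
}

# Sample lines taken from this post; the real script would read the files.
my @activ_lines = (
    "Q801F8 | PF00751.13 PF12374.3 PF12374.3\n",
    "B7ZS42 | PF00751.13 PF12374.3 PF12374.3\n",
    "Q157S1 | PF12886.2 PF12885.2 PF12884.2\n",
);
my @uniprot_lines = (
    "Q801F8 Q90XZ5; Q90XZ8;Name=dmrt1\n",
    "B7ZS42 A4PBN7; B7ZS44;Name=dmrt1-b ;PF00751;PF12374\n",
);

my %activ = load_activ(@activ_lines);
printf "%d IDs loaded into the hash\n", scalar keys %activ;
print "never matched in uniprot: $_\n"
    for unmatched_ids( \%activ, @uniprot_lines );
```

If the number of IDs loaded is much smaller than the number of lines in the file, the hash construction is losing entries; if the count looks right but many IDs are reported as never matched, the regex applied to the $uniprot lines is the more likely suspect.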
UPDATE:
I ran the following code to check something, and not all of the data is printed, i.e. a lot of lines are not pushed into the array.
#!/usr/bin/perl
use Modern::Perl;
use File::Slurp qw/read_file write_file/;
my $uniprot = 'uniprot-sfinal.txt';
my $activin = 'Activator-PFAM.txt';
my %activ = map { s/\.\d+//g; /(.+)\s+\|\s+(.+)/ and $1=>$2; } grep /\|\s+\S+/, read_file $activin;
for ( read_file $uniprot ) {
    next unless /(.{6})\s+.+=([^\s]+)/;
    print $1,"\n" if $activ{$1};
}
print "done";
I am sure that more lines are supposed to be printed, e.g.:
$uniprot:
Q801F8 Q90XZ5; Q90XZ8;Name=dmrt1
B7ZS42 A4PBN7; B7ZS44;Name=dmrt1-b ;PF00751;PF12374
Q157S1 Name=Crtc1 Synonyms=Mect1,
$activ:
Q801F8 | PF00751.13 PF12374.3 PF12374.3
B7ZS42 | PF00751.13 PF12374.3 PF12374.3
Q157S1 | PF12886.2 PF12885.2 PF12884.2
I shall try reversing the order, i.e. making a hash out of $uniprot instead of $activ, and run it again.
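The reversed lookup could look roughly like the sketch below. It is only an illustration under stated assumptions: the sub names index_uniprot and join_activ are hypothetical, it captures the gene name with Name=(\S+) rather than the original .+=([^\s]+), it uses core Perl only, and it runs on the sample lines from this post instead of the real files.

```perl
use strict;
use warnings;

# Index uniprot lines by accession: ID => gene name taken from "Name=...".
sub index_uniprot {
    my @lines = @_;
    my %name_for;
    for my $line (@lines) {
        next unless $line =~ /^(\S+)\s.*?Name=(\S+)/;
        $name_for{$1} = $2;
    }
    return %name_for;
}

# Join each "ID | Pfam..." line against the uniprot index.
# Returns two array refs: the joined records and the IDs with no uniprot entry.
sub join_activ {
    my ( $name_for, @activ_lines ) = @_;
    my ( @joined, @missing );
    for my $line (@activ_lines) {
        ( my $copy = $line ) =~ s/\s+\z//;    # strip trailing newline / CR
        $copy =~ s/\.\d+//g;                  # drop Pfam version suffixes
        next unless $copy =~ /^(\S+)\s+\|\s+(.+)$/;
        my ( $id, $pfam ) = ( $1, $2 );
        if ( exists $name_for->{$id} ) {
            push @joined, "$id | $name_for->{$id} | $pfam";
        }
        else {
            push @missing, $id;
        }
    }
    return ( \@joined, \@missing );
}

# Sample lines taken from this post; the real script would read the files.
my %name_for = index_uniprot(
    "Q801F8 Q90XZ5; Q90XZ8;Name=dmrt1\n",
    "B7ZS42 A4PBN7; B7ZS44;Name=dmrt1-b ;PF00751;PF12374\n",
    "Q157S1 Name=Crtc1 Synonyms=Mect1,\n",
);
my ( $joined, $missing ) = join_activ(
    \%name_for,
    "Q801F8 | PF00751.13 PF12374.3 PF12374.3\n",
    "B7ZS42 | PF00751.13 PF12374.3 PF12374.3\n",
    "Q157S1 | PF12886.2 PF12885.2 PF12884.2\n",
);

print "$_\n" for @$joined;
print "no uniprot entry for: $_\n" for @$missing;
```

Collecting the IDs that find no partner, instead of silently skipping them, should make it easier to see which lines of data are the offending ones.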