It looks to me like you're printing inside the innermost loop, when that print statement belongs at the same scope where the counter variables are reset. You haven't really specified your problem clearly, but printing an accumulator several times before resetting it is a red flag to me. Another red flag is that you're opening your output file in append mode ('>>'), so each time you run the program you just make the file bigger. Better to open it in "clobber and output" mode ('>'), unless you really do need to append.
I've tried to rework your program in a way that is clearer to read, easier to maintain, and more idiomatic to a Perl programmer. I also changed your output filehandle's name from "IN" to "$OUT". When you open a file for output, you are just asking for confusion by calling its filehandle "IN". Here's some example code:
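To make the append-versus-clobber point concrete, here is a minimal sketch. The scratch file and the helper subs `write_line` and `line_count` are hypothetical names made up for this illustration; they are not part of your program.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Hypothetical scratch file, used only for this demonstration.
my ( $tmp_fh, $scratch_file ) = tempfile( UNLINK => 1 );
close $tmp_fh or die $!;

# Write one line to $file using the given open mode ('>' or '>>').
sub write_line {
    my ( $mode, $file, $line ) = @_;
    open my $fh, $mode, $file or die "Can't open $file: $!";
    print {$fh} "$line\n";
    close $fh or die $!;
}

# Count the lines currently in $file.
sub line_count {
    my ($file) = @_;
    open my $fh, '<', $file or die "Can't open $file: $!";
    my @lines = <$fh>;
    close $fh or die $!;
    return scalar @lines;
}

# Two "runs" in append mode: the file keeps growing.
write_line( '>>', $scratch_file, 'run 1' );
write_line( '>>', $scratch_file, 'run 2' );
my $appended = line_count($scratch_file);

# One "run" in write mode: earlier output is discarded.
write_line( '>', $scratch_file, 'run 3' );
my $clobbered = line_count($scratch_file);

print "append: $appended line(s), clobber: $clobbered line(s)\n";
```

With '>>' every run adds to whatever is already in the file; with '>' each run starts from an empty file.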
use strict;
use warnings;
use Bio::SeqIO;

# One-letter codes for the residues being counted.
my @tracked_proteins = qw( A C D E F G H I K L M N P Q R S T V W Y );

# Protein output file:
my $output_file = 'countaa';
open my $OUT, '>', $output_file
    or die "Can't open file $output_file for output.\n$!";

my $proteinio =
    Bio::SeqIO->new( -file => "ec 1.1.1.fasta", -format => 'fasta' );

while ( my $seq = $proteinio->next_seq ) {
    foreach my $protein_seq ( $seq->seq ) {

        # Reset the counters for each sequence, starting every one at zero.
        my %counts;
        @counts{@tracked_proteins} = (0) x @tracked_proteins;

        my @protein = split //, $protein_seq;
        foreach my $protein_alpha (@protein) {
            next unless exists $counts{$protein_alpha};
            $counts{$protein_alpha}++;
        }

        # Print in the same scope where the counters are reset.
        print $OUT "$_ " for @counts{@tracked_proteins};
        print $OUT "\n";
    }
}
close $OUT or die $!;
The code could be simplified further if we knew that the only characters found in the input data stream are in the set [ACDEFGHIKLMNPQRSTVWY]. Since I know about as much about your problem domain as you do about mine, I took the precaution of skipping characters that don't match that specific set.
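As a rough sketch of that simplification: if the input really is guaranteed to contain only tracked characters, the existence check can go away. The sub name `count_residues` and the sample sequence are hypothetical, and the guarantee about the input is an assumption, not something I know about your data.

```perl
use strict;
use warnings;

my @tracked_proteins = qw( A C D E F G H I K L M N P Q R S T V W Y );

# Assumes $protein_seq contains ONLY characters from @tracked_proteins.
# Any other character would silently add a new key to the hash.
sub count_residues {
    my ($protein_seq) = @_;
    my %counts = map { $_ => 0 } @tracked_proteins;
    $counts{$_}++ for split //, $protein_seq;
    return \%counts;
}

my $counts = count_residues('ACCDA');    # hypothetical sample sequence
print join( ' ', @{$counts}{@tracked_proteins} ), "\n";
```

If that guarantee ever fails, the stray characters won't show up in the output, since the print only looks at the tracked keys, but they will quietly pile up in the hash. That's why I kept the check in the version above.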
It's also possible that your accumulator should be reset at a broader scope, and that your print could be moved out one scope level too, but I made it a point to follow your lead on that one. This code is clearly written enough (in my opinion) that you should be able to change scoping of the accumulator and the print easily enough. You may find, as I do, that whenever a unit of functionality can fit on one screen within your editor, the problem becomes easier to visualize.
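Here is a sketch of what the broader scoping might look like, assuming you want one set of totals across all sequences rather than one line per sequence. The `@sequences` array is a hypothetical stand-in for the strings you'd get from `$seq->seq`, so the example runs without a FASTA file.

```perl
use strict;
use warnings;

my @tracked_proteins = qw( A C D E F G H I K L M N P Q R S T V W Y );

# Hypothetical stand-ins for the strings returned by $seq->seq.
my @sequences = ( 'ACDA', 'CCXY' );

# Accumulator declared outside the loop, so it is never reset between sequences.
my %totals = map { $_ => 0 } @tracked_proteins;

foreach my $protein_seq (@sequences) {
    foreach my $protein_alpha ( split //, $protein_seq ) {
        next unless exists $totals{$protein_alpha};    # skip untracked characters
        $totals{$protein_alpha}++;
    }
}

# Printed once, at the same scope where %totals was declared.
print "$_ " for @totals{@tracked_proteins};
print "\n";
```

The only real changes from the per-sequence version are where the hash is declared and where the print sits; both moved out together, which is the point.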
Update: Fixed a print formatting issue... and it helps if a counter script actually increments a counter. ;)