The 'g' switch on the three regular expressions shouldn't be there.
With a one-letter $aa, if $aa =~ /[DNEQ]/ig matches, pos is left at 1, so the following match, $aa =~ /[KRH]/ig, starts at pos 1 and fails. That failure resets pos, so the last expression, $aa =~ /[DNEQKRH]/ig, starts at pos 0 again and matches.
However, if $aa =~ /[KRH]/ig matches, then pos will be 1 and the following match, $aa =~ /[DNEQKRH]/ig, will fail because it attempts to match beginning at pos 1 instead of pos 0.
You can see this in the following code snippet.
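    #!/usr/bin/perl
    use strict;
    use warnings;
    use 5.014;

    my @prot = qw/ D K /;
    my ($acid_cnt, $base_cnt, $neutral_cnt);

    while (@prot) {
        my $aa = shift(@prot);

        # In scalar context, a successful /g match leaves pos($aa) at the end of the match.
        if ($aa =~ /[DNEQ]/gi) {
            ++$acid_cnt;
        }
        # This match starts at pos($aa); if it fails, pos($aa) is reset to undef.
        if ($aa =~ /[KRH]/gi) {
            ++$base_cnt;
        }

        say "aa $aa pos: ", pos($aa) // 'pos reset';

        # For 'K', pos($aa) is 1 here, so this match starts past the only character.
        if ($aa =~ /[DNEQKRH]/gi) {
            ++$neutral_cnt;
        }
    }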
This prints
C:\Old_Data\perlp>perl t7.pl
aa D pos: pos reset
aa K pos: 1
In effect, an acidic residue is also counted in the neutral count, but a basic residue is not.
The original poster probably wants the neutral count to cover everything other than the acids and bases, in which case that regular expression should be $aa =~ /[^DNEQKRH]/i, negating the class.
Without the 'g' switch, and with the neutral class negated, the output would look like:
C:\Old_Data\perlp>perl t9.pl test.fas
Name: >DROME_HH_Q02937
Number of acidic amino acids:33
Number of basic amino acids:35
Number of neutral amino acids:136
Name: >DROME_HH_Q02938
Number of acidic amino acids:18
Number of basic amino acids:17
Number of neutral amino acids:69
Name: >DROTME_HH_Q02936
Number of acidic amino acids:14
Number of basic amino acids:18
Number of neutral amino acids:67
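For reference, here is a minimal sketch of the two changes described above (no 'g' switch, negated neutral class). It is not the t9.pl script that produced the output; count_residues is a hypothetical helper name, and it assumes the sequence has already been stripped of header lines and newlines, since any character outside [DNEQKRH] is counted as neutral.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.014;

# Hypothetical helper (not from the original script): count acidic, basic
# and neutral residues in a sequence string, one character at a time.
sub count_residues {
    my ($seq) = @_;
    my %count = (acid => 0, base => 0, neutral => 0);

    for my $aa (split //, $seq) {
        # No /g: every match starts at the beginning of $aa, so pos()
        # is never carried over from one test to the next.
        ++$count{acid}    if $aa =~ /[DNEQ]/i;
        ++$count{base}    if $aa =~ /[KRH]/i;
        # Negated class: anything that is neither acidic nor basic.
        ++$count{neutral} if $aa =~ /[^DNEQKRH]/i;
    }
    return %count;
}

my %count = count_residues('DKGA');
say "acid: $count{acid}, base: $count{base}, neutral: $count{neutral}";
```

For the four-letter string 'DKGA' this prints "acid: 1, base: 1, neutral: 2".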
Chris