The output from your code shows some problems:
The peptide is DAAAAATTLTTTAMTTTTTTC
The peptide is MMFRPPPPPGGGGGGGGGGGG
The peptide is ALTAMCMNVWEITYH
The peptide is GSDVN
The peptide is
The peptide is ASFAQPPPQPPPPLLAIKPASDASD
The K or R terminating split codon (if that's the proper term) is being incorrectly removed from the output peptides. (At least, I think this is incorrect. TamaDP doesn't show desired output, but seems satisfied with output examples given in various replies in this thread that include these codons.) So I assume GSDVN should really be GSDVNR and the "null" sequence following it should really be the single-codon sequence R. This is all down to the incorrect definition of the s/// match pattern; take a look at some other replies in this thread for what I feel are more correct s/// patterns.
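For illustration only, here is a minimal sketch of a cleavage that keeps the terminating K or R. It swaps in split with zero-width lookaround assertions rather than repairing the original s///, and the $protein string is a hypothetical input pieced together from the last three peptides in the output above, not TamaDP's actual data.

```perl
use strict;
use warnings;

# Hypothetical input, reconstructed from the output shown above.
my $protein = 'GSDVNRRASFAQPPPQPPPPLLAIKPASDASD';

# Split at the zero-width position that follows a K or an R and is
# not itself followed by a P. Nothing is consumed by the pattern,
# so the K or R stays attached to the peptide it terminates.
my @peptides = split /(?<=[KR])(?!P)/, $protein;

print "The peptide is $_\n" for @peptides;
```

With that input, this prints GSDVNR, then R, then ASFAQPPPQPPPPLLAIKPASDASD (the KP in the last peptide is left uncut).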
On an unrelated note, the regex in the condition expression of the
if ($protein =~ m/[K(?!P)|R(?!P)]/g) { ... }
block isn't doing what I think you think it's doing. The [K(?!P)|R(?!P)] character class is exactly equivalent to the [KPR()?!|] class: alternation, grouping and lookahead metacharacters have no special meaning inside a character class, so ()?!| are just literal characters (and repeated characters have no effect whatsoever). Also, the /g modifier in the m//g match is useless in the boolean context of a conditional, although it does no harm (except to burn a few more innocent computrons). Again, all this doesn't affect the basic problem with the code, which stems from the incorrect s/// match.
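A quick way to check the character-class point is to run a few test strings against each pattern. In this sketch the variable names are made up, and $likely_intent is only my guess at what the condition was meant to test (a K or R not followed by P).

```perl
use strict;
use warnings;

my $class_as_written = qr/[K(?!P)|R(?!P)]/;   # the class from the if condition
my $class_equivalent = qr/[KPR()?!|]/;        # the same set of characters
my $likely_intent    = qr/[KR](?!P)/;         # assumption: K or R not before P

# Report, for each test string, which of the three patterns match it.
for my $s ('P', '(', '|', 'KP', 'AK', 'GG') {
    printf "%-3s written=%d equivalent=%d intent=%d\n", $s,
        ($s =~ $class_as_written ? 1 : 0),
        ($s =~ $class_equivalent ? 1 : 0),
        ($s =~ $likely_intent    ? 1 : 0);
}
```

The first two columns agree for every string, while P, (, | and KP all match the class but not the lookahead version.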
I use Data::Dumper all the time because I've been fooled by my data too many times.
Yea and amen brother, yea and amen.
Give a man a fish: <%-{-{-{-<