Hey monks,
I have a biology-programming question:
I have to write a program that can handle one or more protein sequences, digest them with trypsin into smaller peptide sequences, and report the peptides one per line. The only trick is that trypsin cleaves the sequence after each K or R, unless the next letter (amino acid) is proline (P). The program should read FASTA-format protein sequences and return the individual peptides after digestion.
It should also consider missed cleavages, i.e. peptides that span one or more sites where trypsin failed to cut.
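As I understand it, a missed cleavage just means trypsin skipped a site, so two or more neighbouring peptides stay joined. Here is a minimal sketch of that idea, assuming the fully cleaved peptides are already in an array in sequence order. The names `with_missed_cleavages` and `$max_missed` are made up for illustration, and the example peptides are invented:

```perl
use strict;
use warnings;

# Hypothetical helper: given the fully cleaved peptides in sequence order,
# return every peptide that spans 0 .. $max_missed uncut sites.
sub with_missed_cleavages {
    my ($max_missed, @peptides) = @_;
    my @out;
    for my $start (0 .. $#peptides) {
        for my $missed (0 .. $max_missed) {
            my $end = $start + $missed;
            last if $end > $#peptides;    # ran off the end of the protein
            push @out, join '', @peptides[$start .. $end];
        }
    }
    return @out;
}

# Example: three fully cleaved peptides, allowing one missed cleavage.
print "$_\n" for with_missed_cleavages(1, qw(AK GR TT));
```

With one allowed missed cleavage this prints AK, AKGR, GR, GRTT and TT, one per line.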
OK, so I think that the expression for splitting and cutting with trypsin is:
= split /(?<=[KR])(?=[^P])/
, but I am lost about the rest!
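To make the question concrete, this is roughly the shape I imagine for the reading and digesting parts. It is only a sketch under assumptions: `read_fasta` and `digest` are names I invented, the split uses a lookbehind for K or R and a lookahead for any residue other than P, and the FASTA text is a made-up in-memory string rather than a real file:

```perl
use strict;
use warnings;

# Hypothetical helper: read FASTA records from a filehandle and
# return a list of [header, sequence] pairs.
sub read_fasta {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;               # strip newline and trailing whitespace
        next if $line eq '';
        if ($line =~ /^>(.*)/) {
            push @records, [$1, ''];      # header line starts a new record
        } elsif (@records) {
            $records[-1][1] .= uc $line;  # sequence may span several lines
        }
    }
    return @records;
}

# Split after K or R, unless the next residue is P.
sub digest {
    my ($seq) = @_;
    return split /(?<=[KR])(?=[^P])/, $seq;
}

# Demo on an in-memory FASTA string (invented sequences).
my $fasta = ">prot1\nMAKRPG\nKTT\n>prot2\nGGRAA\n";
open my $fh, '<', \$fasta or die "cannot open in-memory file: $!";
for my $rec (read_fasta($fh)) {
    print "$_\n" for digest($rec->[1]);
}
```

For the demo input this prints MAK, RPGK, TT, GGR and AA, one per line. The R in prot1 is not cut because it is followed by P.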