Trypsin digestion

BioUs2003 has asked for the wisdom of the Perl Monks concerning the following question:

Hey monks

I have a biology-programming question:

I have to write a program that should be able to deal with 1 or more protein sequences, digest them into smaller peptide sequences and report these back one per line. Only trick is that when Trypsin sees a K or an R letter it should split the sequence after the letter, unless the next letter (amino acid) is Proline (P). The program should be able to read in FASTA-format protein sequences and return the individual peptides after digestion. Also, it should consider missed cleavages too.

OK, so I think that the expression for splitting and cutting with trypsin is:

= split /(/<=[KR])(?=[^P])/

, but i am lost about the other stuff!


Replies are listed 'Best First'.
Re: Trypsin digestion by kennethk (Abbot) on Oct 22, 2015 at 16:05 UTC
Welcome to the Monastery, BioUs2003. You'll note that while monks here try to be helpful, they do want well-defined problems and will generally ask that you demonstrate that you've put work into solving the issue on your own. See How do I post a question effectively?. In particular, note that because you did not wrap your code snippet in `<code>` tags, your regular expression classes got turned into links.
One challenge with your post is that the only well-defined part of your problem is "How do I split after a K or R that is not followed by a P?" and that is the part that you answered yourself. Of course, there's a typo in your suggested answer, and it should read `my @arr = split /(?<=[KR])(?=[^P])/;` presuming you do not ~~which~~wish to perform the deprecated implicit split.
So, can you define in algorithmic terms what the following means?
"The program should be able to read in FASTA-format protein sequences and return the individual peptides after digestion. Also, it should consider missed cleavages too."
Update: s/which/wish/ corrected as per below. It's been a long week.
#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.
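For readers following along, here is a minimal sketch of one way to read "missed cleavages" in algorithmic terms: digest with the corrected regex, then also report every run of adjacent fragments that spans up to N skipped cut sites. The sub names `digest` and `with_missed_cleavages`, the test sequence, and the limit of one missed cleavage are all assumptions made for illustration, not something the original poster specified.

```perl
use strict;
use warnings;

# Cut after K or R unless the next residue is P (the corrected regex above).
# The look-ahead needs a following residue, so there is no cut at the very end.
sub digest {
    my ($protein) = @_;
    return split /(?<=[KR])(?=[^P])/, $protein;
}

# Hypothetical helper: for each fragment, also emit the peptides obtained by
# joining it with up to $max_missed following fragments (missed cut sites).
sub with_missed_cleavages {
    my ($max_missed, @fragments) = @_;
    my @peptides;
    for my $start (0 .. $#fragments) {
        for my $missed (0 .. $max_missed) {
            my $end = $start + $missed;
            last if $end > $#fragments;
            push @peptides, join '', @fragments[$start .. $end];
        }
    }
    return @peptides;
}

my @fragments = digest('MKRPLKAAR');    # MK, RPLK, AAR
print "$_\n" for with_missed_cleavages(1, @fragments);
```

With one missed cleavage allowed, this prints MK, MKRPLK, RPLK, RPLKAAR and AAR, one per line. Whether overlapping peptides should be reported this way is exactly the kind of detail the question still needs to pin down.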
Re^2: Trypsin digestion by Anonymous Monk on Oct 23, 2015 at 05:33 UTC
> presuming you do not which to perform the deprecated implicit split.
What do you mean?
Re^3: Trypsin digestion by AnomalousMonk (Archbishop) on Oct 23, 2015 at 06:27 UTC
I think the line was intended to read "... presuming you do not wish to perform the deprecated implicit split." and refers to the deprecated implicit split to `@_` in scalar (or void) context that was removed in Perl version 5.12.
Give a man a fish: `<%-{-{-{-<`
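A small sketch of the difference, assuming a Perl of version 5.12 or later: assigning the result of `split` to an array captures the fields, while calling it in scalar context only yields the number of fields and leaves `@_` alone. The variable names and the test sequence are made up for illustration.

```perl
use strict;
use warnings;

my $protein = 'MKRPLKAAR';

# List context: the fields themselves are returned.
my @peptides = split /(?<=[KR])(?=[^P])/, $protein;

# Scalar context: only the number of fields is returned.
my $count = split /(?<=[KR])(?=[^P])/, $protein;

print "$count peptides: @peptides\n";
```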
Re: Trypsin digestion by talexb (Chancellor) on Oct 22, 2015 at 15:58 UTC
A quick search of MetaCPAN turns up a few modules that are happy to deal with FASTA files. The first one I see is Bio::SeqReader::Fasta, which may help you out. Have a read through these choices, and see if some of the examples help you out.
Alex / talexb / Toronto
Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.
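If a module is not an option, FASTA is simple enough to read by hand. The sketch below does not use Bio::SeqReader::Fasta or any other CPAN module; `read_fasta` is a hypothetical helper and the two records are made-up test data. It assumes header lines start with `>` and that sequence lines may wrap across several lines.

```perl
use strict;
use warnings;

# Hypothetical helper: read FASTA records from a filehandle and return a list
# of [header, sequence] pairs in file order.
sub read_fasta {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;              # strip newline and trailing whitespace
        next if $line eq '';             # skip blank lines
        if ($line =~ /^>(.*)/) {
            push @records, [$1, ''];     # start a new record
        }
        elsif (@records) {
            $records[-1][1] .= uc $line; # append wrapped sequence lines
        }
    }
    return @records;
}

# Made-up data, read through an in-memory filehandle to keep this self-contained.
my $fasta = ">prot1 example\nMKRPLK\nAAR\n>prot2\nGGKPR\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory FASTA: $!";
for my $record (read_fasta($fh)) {
    my ($header, $sequence) = @$record;
    print "$header\t$sequence\n";
}
close $fh;
```

For a real file, the same sub can be handed a filehandle opened on the file's path, and each sequence can then be passed to whatever digestion routine you settle on.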
Re^2: Trypsin digestion by BioUs2003 (Initiate) on Oct 22, 2015 at 17:01 UTC
Thank you, I will check it out
Re^3: Trypsin digestion by AnomalousMonk (Archbishop) on Oct 22, 2015 at 21:11 UTC
Also check out davido's personal node for a link to his "Perl Regular Expression Tester". This will allow you to easily see the effect of a given regex upon a given string, although as has been posted elsewhere, this seems to be the part of your problem you have the best handle on.
Give a man a fish: `<%-{-{-{-<`

