PerlMonks  

Trypsin digestion

by BioUs2003 (Initiate)
on Oct 22, 2015 at 14:40 UTC (perlquestion [id://1145647])

BioUs2003 has asked for the wisdom of the Perl Monks concerning the following question:

Hey monks

I have a biology-programming question:

I have to write a program that can handle one or more protein sequences, digest them into smaller peptide sequences, and report these back one per line. The only trick is that when trypsin sees a K or an R it should split the sequence after that letter, unless the next letter (amino acid) is proline (P). The program should be able to read in FASTA-format protein sequences and return the individual peptides after digestion. Also, it should consider missed cleavages too.

OK, so I think that the expression for splitting and cutting with trypsin is:

= split /(/<=[KR])(?=[^P])/
, but I am lost about the rest!

Replies are listed 'Best First'.
Re: Trypsin digestion
by kennethk (Abbot) on Oct 22, 2015 at 16:05 UTC
    Welcome to the Monastery, BioUs2003. You'll note that while monks here try to be helpful, they do want well defined problems and will generally ask that you demonstrate that you've put work into solving the issue on your own. See How do I post a question effectively?.

    In particular, note that because you did not wrap your code snippet in <code> tags, your regular expression classes got turned into links.

    One challenge with your post is that the only well defined part of your problem is "How do I split after a K or R that is not followed by a P?" and that is the part that you answered yourself. Of course, there's a typo in your suggested answer, and it should read

    my @arr = split /(?<=[KR])(?=[^P])/;
    presuming you do not wish to perform the deprecated implicit split.
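    As a quick check of what that regex does, here is a minimal sketch. The sub name trypsin_digest and the sample sequence are made up for illustration; only the split pattern comes from the post above.

```perl
use strict;
use warnings;

# Cleave after K or R unless the next residue is P.
# (?<=[KR]) looks behind for K or R; (?=[^P]) requires a following
# residue that is not P, so a K/R at the very end yields no empty piece.
sub trypsin_digest {
    my ($protein) = @_;
    return split /(?<=[KR])(?=[^P])/, $protein;
}

# Hypothetical sequence chosen only to exercise the rule.
my @peptides = trypsin_digest('MKWVTFRPLLKAAR');
print "$_\n" for @peptides;
```

    The R in "FRP" is followed by P, so no cut happens there; the sample yields MK, WVTFRPLLK and AAR.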

    So, can you define in algorithmic terms what the following means?

    The program should be able to read in FASTA-format protein sequences and return the individual peptides after digestion. Also, it should consider missed cleavages too.
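    One possible reading of "missed cleavages", offered only as an assumption since the original post does not define it: allow up to N cut sites to be skipped, so that up to N+1 neighbouring fully cleaved peptides may stay joined. A rough sketch under that assumption (the sub name digest_with_missed and the sample sequence are hypothetical):

```perl
use strict;
use warnings;

# Return every peptide obtainable with at most $max_missed missed cleavages,
# assuming a missed cleavage means two adjacent fragments stay joined.
sub digest_with_missed {
    my ($protein, $max_missed) = @_;

    # Fully cleaved fragments, using the split pattern from the thread.
    my @frags = split /(?<=[KR])(?=[^P])/, $protein;

    my @out;
    for my $start (0 .. $#frags) {
        # Furthest fragment that can still be joined to $start.
        my $last = $start + $max_missed;
        $last = $#frags if $last > $#frags;
        for my $end ($start .. $last) {
            push @out, join '', @frags[$start .. $end];
        }
    }
    return @out;
}

# Hypothetical sequence; 1 = allow at most one missed cleavage.
print "$_\n" for digest_with_missed('MKWVTFRPLLKAAR', 1);
```

    With one missed cleavage allowed, the sample prints MK, MKWVTFRPLLK, WVTFRPLLK, WVTFRPLLKAAR and AAR. If the assignment means something else by the term, the inner loop is the part to change.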

    Update: s/which/wish/ corrected as per below. It's been a long week.


    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

      >presuming you do not which to perform the deprecated implicit split.
      what do you mean?

        I think the line was intended to read "... presuming you do not wish to perform the deprecated implicit split." and refers to the deprecated implicit split to @_ in scalar (or void) context that was removed in Perl version 5.12.


        Give a man a fish:  <%-{-{-{-<

Re: Trypsin digestion
by talexb (Chancellor) on Oct 22, 2015 at 15:58 UTC

    A quick search of MetaCPAN turns up a few modules that are happy to deal with FASTA files. The first one I see is Bio::SeqReader::Fasta, which may help you out.

    Have a read through these choices, and see if some of the examples help you out.
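    If a CPAN module is more than you need, FASTA is simple enough to read by hand. The following is a minimal core-Perl sketch, not the interface of Bio::SeqReader::Fasta or any other module; the sub name read_fasta and the sample records are hypothetical, and it assumes ">" header lines followed by one or more sequence lines.

```perl
use strict;
use warnings;

# Read FASTA records from a filehandle.
# Returns a list of [header, sequence] array references.
sub read_fasta {
    my ($fh) = @_;
    my (@records, $header, $seq);
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;              # strip newline and trailing whitespace
        next if $line eq '';
        if ($line =~ /^>(.*)/) {
            # A new header closes the previous record, if there was one.
            push @records, [$header, $seq] if defined $header;
            ($header, $seq) = ($1, '');
        }
        elsif (defined $header) {
            $seq .= uc $line;            # sequence may span several lines
        }
    }
    push @records, [$header, $seq] if defined $header;
    return @records;
}

# Hypothetical two-record input, held in a string so the sketch runs without a file.
my $fasta = ">prot1 example\nMKWVTF\nRPLLKAAR\n>prot2\nAKPRGK\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory file: $!";
for my $rec (read_fasta($fh)) {
    my ($id, $seq) = @$rec;
    # Digest with the split pattern from the thread, one peptide per line.
    print "$id\t$_\n" for split /(?<=[KR])(?=[^P])/, $seq;
}
close $fh;
```

    For real input, replace the in-memory string with a filehandle opened on the FASTA file, or with STDIN.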

    Alex / talexb / Toronto

    Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

      Thank you, I will check it out

        Also check out davido's personal node for a link to his "Perl Regular Expression Tester". This will allow you to easily see the effect of a given regex upon a given string, although as has been posted elsewhere, this seems to be the part of your problem you have the best handle on.


        Give a man a fish:  <%-{-{-{-<
