PerlMonks  

Re: Recognize DNA and amino acid sequence

by InfiniteSilence (Curate)
on Apr 23, 2011 at 17:22 UTC (#900983)


in reply to Recognize DNA and amino acid sequence

Solution in three easy steps:

  • Uno: You need to describe clearly what you mean by a sequence. For argument's sake, I'll say you mean a run of capital letters from the set AGCTURYKMSWBDHVN, one after another, followed by a single whitespace character (I borrowed this from here).
  • Dos: You may run into problems using Perl with extremely large files. Read up on this so you can divide up your problem, for example by splitting the files themselves or by rewriting some parts in C and calling them through XS. A really simple example using the file format from the previous link:
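    use strict;
    use warnings;

    my $seqNum = 0;
    my %sequences;

    # Three-argument open with a lexical filehandle
    open(my $fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
    while (<$fh>) {
        # Collect every run of sequence letters between word boundaries
        while (m/\b([AGCTURYKMSWBDHVN]+)\b/g) {
            $sequences{++$seqNum} = $1;
        }
    }
    close($fh);

    # Print the sequences in the order they were found
    for (sort { $a <=> $b } keys %sequences) {
        print qq|$_\t$sequences{$_}\n|;
    }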
  • Tres: Here is the kicker. Just because these letters satisfy the regex doesn't mean they are valid sequences. You will need to search them against a sequence database with a tool like BLAST. There are Perl modules for running such searches, but you should first get acquainted with BioPerl, a suite of tools built specifically for these kinds of problems.
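
To make Dos and Tres a little more concrete, here is a rough sketch (not a definitive implementation) that prints each match as it is found instead of collecting everything in a hash, so memory use does not grow with the file. The helper name extract_sequences and its $min_len filter are my own assumptions for illustration and are not part of the script in the list. The filter is there because plain capitalized words such as AND also satisfy the regex.

```perl
use strict;
use warnings;

# Same letter set as the regex in the list above.
my $SEQ_RE = qr/\b([AGCTURYKMSWBDHVN]+)\b/;

# Return every candidate sequence found in one line of text.
# $min_len is a hypothetical filter (an assumption, default 1) that
# drops short matches such as ordinary capitalized words.
sub extract_sequences {
    my ($line, $min_len) = @_;
    $min_len //= 1;
    my @found;
    while ($line =~ /$SEQ_RE/g) {
        push @found, $1 if length($1) >= $min_len;
    }
    return @found;
}

# Stream the file named on the command line, printing as we go, so
# memory is bounded by the longest line rather than the whole file.
if (@ARGV) {
    open(my $fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
    my $seq_num = 0;
    while (my $line = <$fh>) {
        for my $seq (extract_sequences($line)) {
            printf "%d\t%s\n", ++$seq_num, $seq;
        }
    }
    close($fh);
}
```

Calling extract_sequences("ACGT AND", 4) keeps ACGT and drops AND, but a length cutoff is only a crude guess. Anything that survives it is still just a candidate until you check it against real sequence data.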

Celebrate Intellectual Diversity

