Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

orf subsequences

by odegbon (Initiate)
on Dec 04, 2008 at 06:23 UTC ( #727877=perlquestion: print w/ replies, xml ) Need Help??
odegbon has asked for the wisdom of the Perl Monks concerning the following question:

Hello all;

I have a DNA sequence and I want to find all start codons(ATG,GTG,) and stop codons. I want to translate all possible subsequences betwen start and stop codons to their corresponding protein sequences.This should be on the 1st frame only.

For instance; $Dna = "AAAATGGGGTAAGTGAACGGGTAA" should return the corresponding proteins of "ATGGGGTAA" and "GTGAACGGGTAA" but should work also for very long sequences.

I have tried to write something like this but it doesnt seem to work:

#!/usr/bin/perl use strict; use strict; use warnings; use Bio::SearchIO; use Bio::Seq; use Bio::SeqIO; my($seqio); my($genio); if(-e $ARGV[0]) { $seqio = Bio::SeqIO->new( '-format' => 'fasta' , -file => $ARG +V[0]); } else { die "$ARGV[0] not found\n"; } my($seq) = ""; my($len) = ""; my($head) = ""; while ( my $seqobj = $seqio->next_seq ) { $len = $seqobj->length(); $head = $seqobj->id(); $seq = $seqobj->seq(); chomp($seq); } my(%genetic_code) = ( 'TCA' => 'S', # Serine 'TCC' => 'S', # Serine 'TCG' => 'S', # Serine 'TCT' => 'S', # Serine 'TTC' => 'F', # Phenylalanine 'TTT' => 'F', # Phenylalanine 'TTA' => 'L', # Leucine 'TTG' => 'L', # Leucine 'TAC' => 'Y', # Tyrosine 'TAT' => 'Y', # Tyrosine 'TAA' => '_', # Stop 'TAG' => '_', # Stop 'TGC' => 'C', # Cysteine 'TGT' => 'C', # Cysteine 'TGA' => '_', # Stop 'TGG' => 'W', # Tryptophan 'CTA' => 'L', # Leucine 'CTC' => 'L', # Leucine 'CTG' => 'L', # Leucine 'CTT' => 'L', # Leucine 'CCA' => 'P', # Proline 'CCC' => 'P', # Proline 'CCG' => 'P', # Proline 'CCT' => 'P', # Proline 'CAC' => 'H', # Histidine 'CAT' => 'H', # Histidine 'CAA' => 'Q', # Glutamine 'CAG' => 'Q', # Glutamine 'CGA' => 'R', # Arginine 'CGC' => 'R', # Arginine 'CGG' => 'R', # Arginine 'CGT' => 'R', # Arginine 'ATA' => 'I', # Isoleucine 'ATC' => 'I', # Isoleucine 'ATT' => 'I', # Isoleucine 'ATG' => 'M', # Methionine 'ACA' => 'T', # Threonine 'ACC' => 'T', # Threonine 'ACG' => 'T', # Threonine 'ACT' => 'T', # Threonine 'AAC' => 'N', # Asparagine 'AAT' => 'N', # Asparagine 'AAA' => 'K', # Lysine 'AAG' => 'K', # Lysine 'AGC' => 'S', # Serine 'AGT' => 'S', # Serine 'AGA' => 'R', # Arginine 'AGG' => 'R', # Arginine 'GTA' => 'V', # Valine 'GTC' => 'V', # Valine 'GTG' => 'V', # Valine 'GTT' => 'V', # Valine 'GCA' => 'A', # Alanine 'GCC' => 'A', # Alanine 'GCG' => 'A', # Alanine 'GCT' => 'A', # Alanine 'GAC' => 'D', # Aspartic Acid 'GAT' => 'D', # Aspartic Acid 'GAA' => 'E', # Glutamic Acid 'GAG' => 'E', # Glutamic Acid 'GGA' => 'G', # Glycine 'GGC' => 'G', # Glycine 'GGG' => 'G', # Glycine 'GGT' => 'G', # Glycine ); my @startsRF1 =(); my @startsRF2 =(); my @startsRF3 =(); my @stopsRF1 = (); my @stopsRF2 = (); my @stopsRF3 = (); my @arrayOfORFs = (); my @arrayOfTranslations = (); my $joinedAminoAcids = (); while ($seq =~ m/ATG|TTG|CTG|ATT|CTA|GTG|ATT/gi){ my $matchPosition = pos($seq) - 3; if (($matchPosition % 3) == 0) { push (@startsRF1, $matchPosition); } while ($seq =~ m/TAG|TAA|TGA/gi){ my $matchPosition = pos($seq); if (($matchPosition % 3) == 0) { push (@stopsRF1, $matchPosition); } my $codonRange = ""; my $startPosition = 0; my $stopPosition = 0; @startsRF1 = reverse(@startsRF1); @stopsRF1 = reverse(@stopsRF1); while (scalar(@startsRF1) > 0) { $codonRange = ""; $startPosition = pop(@startsRF1); if ($startPosition < $stopPosition) { next; } my $ORFseq = ""; while (scalar(@stopsRF1) > 0) { $stopPosition = pop(@stopsRF1); if ($stopPosition > $startPosition) { my $difF = $stopPosition - $startPosition; $ORFseq = substr($seq, $startPosition,(length($seq)-$difF)); push (@arrayOfORFs, $ORFseq); } foreach $ORFseq (@arrayOfORFs){ my @growingProtein = (); for (my $i = 0; $i <= (length($ORFseq) - 3); $i = $i + 3) { my $codon = substr($ORFseq, $i, 3); if (exists( $genetic_code{$codon} )){ push (@growingProtein, $genetic_code{$codon}); } else { push (@growingProtein, "X"); } } my $joinedAminoAcids = join("",@growingProtein); push (@arrayOfTranslations, $joinedAminoAcids); } foreach(@arrayOfTranslations) { print $_, "\n"; } } } } }

I NEED HELP URGENTLY PLEASE.

Regards,
Emman

Comment on orf subsequences
Select or Download Code
Re: orf subsequences
by ForgotPasswordAgain (Deacon) on Dec 04, 2008 at 09:36 UTC

    Please read Writeup Formatting Tips, in particular the part about <code> tags.

    Just for info, putting "I NEED HELP URGENTLY PLEASE" probably has the opposite effect of what you want.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://727877]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others browsing the Monastery: (10)
As of 2014-07-28 10:44 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:









    Results (196 votes), past polls