translating multiple DNA sequence to protein sequence

by yuvraj_ghaly (Sexton)
on Aug 19, 2013 at 04:32 UTC (#1049968, perlquestion)
yuvraj_ghaly has asked for the wisdom of the Perl Monks concerning the following question:

I want to translate the DNA sequences in a multi-FASTA file. I have written some code, but it translates only one sequence per file. So the question is: how can I translate every DNA sequence in a multi-FASTA file into its corresponding protein sequence?

print "ENTER THE FILENAME OF THE DNA SEQUENCE:= ";
$DNAfilename = <STDIN>;
chomp $DNAfilename;
unless ( open(DNAFILE, $DNAfilename) ) {
    print "Cannot open file \"$DNAfilename\"\n\n";
}
@DNA = <DNAFILE>;
close DNAFILE;
$DNA = join( '', @DNA);
print " \nThe original DNA file is:\n$DNA \n";
$DNA =~ s/\s//g;
my $protein='';
my $codon;
for(my $i=0;$i<(length($DNA)-2);$i+=3) {
    $codon=substr($DNA,$i,3);
    $protein.=&codon2aa($codon);
}
print "The translated protein is :\n$protein\n";
<STDIN>;

sub codon2aa{
    my($codon)=@_;
    $codon=uc $codon;
    my(%g)=(
        'TCA'=>'S','TCC'=>'S','TCG'=>'S','TCT'=>'S',
        'TTC'=>'F','TTT'=>'F','TTA'=>'L','TTG'=>'L',
        'TAC'=>'Y','TAT'=>'Y','TAA'=>'_','TAG'=>'_',
        'TGC'=>'C','TGT'=>'C','TGA'=>'_','TGG'=>'W',
        'CTA'=>'L','CTC'=>'L','CTG'=>'L','CTT'=>'L',
        'CCA'=>'P','CCC'=>'P','CCG'=>'P','CCT'=>'P',
        'CAC'=>'H','CAT'=>'H','CAA'=>'Q','CAG'=>'Q',
        'CGA'=>'R','CGC'=>'R','CGG'=>'R','CGT'=>'R',
        'ATA'=>'I','ATC'=>'I','ATT'=>'I','ATG'=>'M',
        'ACA'=>'T','ACC'=>'T','ACG'=>'T','ACT'=>'T',
        'AAC'=>'N','AAT'=>'N','AAA'=>'K','AAG'=>'K',
        'AGC'=>'S','AGT'=>'S','AGA'=>'R','AGG'=>'R',
        'GTA'=>'V','GTC'=>'V','GTG'=>'V','GTT'=>'V',
        'GCA'=>'A','GCC'=>'A','GCG'=>'A','GCT'=>'A',
        'GAC'=>'D','GAT'=>'D','GAA'=>'E','GAG'=>'E',
        'GGA'=>'G','GGC'=>'G','GGG'=>'G','GGT'=>'G',
    );
    if(exists $g{$codon}) {
        return $g{$codon};
    }
    else {
        print STDERR "Bad codon \"$codon\"!!\n";
        exit;
    }
}

Replies are listed 'Best First'.
Re: translating multiple DNA sequence to protein sequence
by polypompholyx (Chaplain) on Aug 19, 2013 at 11:16 UTC
    If you're planning on using Perl for bioinformatics, you might be better off installing BioPerl rather than hand-rolling FASTA parsers and translation codon tables.
    use Bio::SeqIO;

    my $sequences = Bio::SeqIO->new(
        -file   => "sequence.fasta",
        -format => "fasta",
    );

    while ( my $dna = $sequences->next_seq ){
        my $protein = $dna->translate(
            -codontable_id => 1, # standard genetic code
            -frame         => 0, # reading-frame offset 0
        );
        print $dna->display_id, "\n";
        print $protein->seq, "\n\n";
    }
    Having said that, installing BioPerl (1.6.901) on Windows seems to be more difficult than I was expecting: I had to resort to force with Strawberry and CPAN, having simply given up trying to get it to install with ActivePerl and PPM.
Re: translating multiple DNA sequence to protein sequence
by jwkrahn (Monsignor) on Aug 19, 2013 at 08:44 UTC
    #!/usr/bin/perl
    use warnings;
    use strict;

    print 'ENTER THE FILENAME OF THE DNA SEQUENCE:= ';
    chomp( my $DNAfilename = <STDIN> );

    open my $DNAFILE, $DNAfilename or die qq[Cannot open file "$DNAfilename" because: $!];

    local $/;
    ( my $DNA = uc <$DNAFILE> ) =~ tr/ACGT//cd;

    print "\nThe original DNA file is:\n$DNA\n";

    my %codon2aa = qw(
        TCA S  TCC S  TCG S  TCT S  TTC F  TTT F  TTA L  TTG L
        TAC Y  TAT Y  TAA _  TAG _  TGC C  TGT C  TGA _  TGG W
        CTA L  CTC L  CTG L  CTT L  CCA P  CCC P  CCG P  CCT P
        CAC H  CAT H  CAA Q  CAG Q  CGA R  CGC R  CGG R  CGT R
        ATA I  ATC I  ATT I  ATG M  ACA T  ACC T  ACG T  ACT T
        AAC N  AAT N  AAA K  AAG K  AGC S  AGT S  AGA R  AGG R
        GTA V  GTC V  GTG V  GTT V  GCA A  GCC A  GCG A  GCT A
        GAC D  GAT D  GAA E  GAG E  GGA G  GGC G  GGG G  GGT G
    );

    my $protein = '';
    while ( $DNA =~ /(...)/g ) {
        exists $codon2aa{ $1 } or die qq[Bad codon "$1"!!\n];
        $protein .= $codon2aa{ $1 };
    }

    print "The translated protein is :\n$protein\n";
    <STDIN>;
Re: translating multiple DNA sequence to protein sequence
by marto (Bishop) on Aug 19, 2013 at 07:51 UTC

    What is this, a question or a code submission? If you're not asking a question then you've posted it in the wrong place. A link to Where should I post X? is displayed each time you post. Seekers of Perl Wisdom is for questions, Cool Uses for Perl is for code you want to share.

    You may want to read the documentation for open, use the three-argument form of open, and actually die if you can't open your input file, printing $! to tell users why it failed.
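
A minimal sketch of that advice, assuming a helper named read_dna_file (a hypothetical name, not something from the original post) that slurps a file and dies with the operating system's error message when the open fails:

```perl
use strict;
use warnings;

# Hypothetical helper: return the whole content of a file as one string,
# or die with the reason from $! if the file cannot be opened.
sub read_dna_file {
    my ($filename) = @_;

    # Three-argument open keeps the mode separate from the filename, so a
    # name that starts with '<', '>' or '|' cannot change how it is opened.
    open my $fh, '<', $filename
        or die qq{Cannot open file "$filename": $!\n};

    my $content = do { local $/; <$fh> };    # slurp everything at once
    close $fh;
    return $content;
}

# Usage: trap the failure so the caller decides what to do with it.
# 'no_such_file.fasta' is a placeholder filename.
my $content = eval { read_dna_file('no_such_file.fasta') };
print "Could not read input: $@" unless defined $content;
```

Because the helper dies instead of printing a message and carrying on, the rest of the program never runs against an unopened filehandle, which is what the original script does.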

Re: translating multiple DNA sequence to protein sequence
by kcott (Chancellor) on Aug 19, 2013 at 07:53 UTC

    G'day yuvraj_ghaly,

    "This code will help to translate a DNA sequence to protein sequence. The need of an our is to translate all the DNA sequences present in fasta file into protein sequences respectively. Here is the code: ..."

    There is no question here!

    You have been directed to "How do I post a question effectively?" on more than one occasion in the past. Please actually read it this time and follow its guidelines.

    I have downvoted your post.

    -- Ken

      The OP is looking for help with a FASTA parser; the 'question' was just worded as a statement of what they need.

Re: translating multiple DNA sequence to protein sequence
by Monk::Thomas (Friar) on Aug 19, 2013 at 07:46 UTC

    "Here is the code:"

    What is the question?

      The question is: I would like to extract the sequences from a multi-FASTA file. How would I modify this code to do so?
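
One way to extract the records, sketched under the assumption that every record starts with a '>' header line: set the input record separator $/ to "\n>" so that each read returns one complete record. The helper name read_fasta_records and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: read FASTA text from an open filehandle and return
# a list of [header, sequence] pairs, one pair per record.
sub read_fasta_records {
    my ($fh) = @_;
    my @records;

    local $/ = "\n>";                  # one read now returns one record
    while ( my $record = <$fh> ) {
        chomp $record;                 # removes a trailing "\n>" if present
        $record =~ s/\A>//;            # the first record keeps its own '>'
        my ( $header, @lines ) = split /\n/, $record;
        next unless defined $header && length $header;
        $header =~ s/\s+\z//;          # drop trailing whitespace such as "\r"
        my $seq = join '', @lines;
        $seq =~ s/\s+//g;              # sequence lines may carry "\r" or spaces
        push @records, [ $header, $seq ];
    }
    return @records;
}

# Usage with an in-memory filehandle standing in for a real file.
my $fasta = ">seq1 first\nATGGCC\nTAA\n>seq2\nATGTTT\n";
open my $in, '<', \$fasta or die "Cannot open in-memory file: $!";
for my $rec ( read_fasta_records($in) ) {
    my ( $header, $seq ) = @$rec;
    print "$header\t$seq\n";
}
close $in;
```

Each sequence comes back joined into a single string, so it can be handed to a per-sequence translation routine one record at a time.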
Re: translating multiple DNA sequence to protein sequence
by yuvraj_ghaly (Sexton) on Aug 22, 2013 at 05:40 UTC

    I have now written a modified program for translating DNA sequences.

    It seems to give error output:


    I want to translate the DNA sequences in a FASTA file into their respective protein sequences. I need help from the Perl Monks.

    Here is the code I am using now:

    use strict;
    #use warnings;
    use Encode;

    for my $file (@ARGV) {
        open my $fh, '<:encoding(UTF-8)', $file;
        my $input = join q{}, <$fh>;
        close $fh;
        while ( $input =~ /(^>.*?\w?)$([^>]*)/smxg ) {
            my $name = $1;
            my $seq = $2;
            $seq =~ s/\n//smxg;
            my $trans = codon2aa($seq);
            print "$name\t$trans\n";
        }
    }

    sub codon2aa {
        my($codon) = @_;
        $codon = uc $codon;
        my(%genetic_code) = (
            'TCA' => 'S', # Serine
            'TCC' => 'S', # Serine
            'TCG' => 'S', # Serine
            'TCT' => 'S', # Serine
            'TTC' => 'F', # Phenylalanine
            'TTT' => 'F', # Phenylalanine
            'TTA' => 'L', # Leucine
            'TTG' => 'L', # Leucine
            'TAC' => 'Y', # Tyrosine
            'TAT' => 'Y', # Tyrosine
            'TAA' => '_', # Stop
            'TAG' => '_', # Stop
            'TGC' => 'C', # Cysteine
            'TGT' => 'C', # Cysteine
            'TGA' => '_', # Stop
            'TGG' => 'W', # Tryptophan
            'CTA' => 'L', # Leucine
            'CTC' => 'L', # Leucine
            'CTG' => 'L', # Leucine
            'CTT' => 'L', # Leucine
            'CCA' => 'P', # Proline
            'CCC' => 'P', # Proline
            'CCG' => 'P', # Proline
            'CCT' => 'P', # Proline
            'CAC' => 'H', # Histidine
            'CAT' => 'H', # Histidine
            'CAA' => 'Q', # Glutamine
            'CAG' => 'Q', # Glutamine
            'CGA' => 'R', # Arginine
            'CGC' => 'R', # Arginine
            'CGG' => 'R', # Arginine
            'CGT' => 'R', # Arginine
            'ATA' => 'I', # Isoleucine
            'ATC' => 'I', # Isoleucine
            'ATT' => 'I', # Isoleucine
            'ATG' => 'M', # Methionine
            'ACA' => 'T', # Threonine
            'ACC' => 'T', # Threonine
            'ACG' => 'T', # Threonine
            'ACT' => 'T', # Threonine
            'AAC' => 'N', # Asparagine
            'AAT' => 'N', # Asparagine
            'AAA' => 'K', # Lysine
            'AAG' => 'K', # Lysine
            'AGC' => 'S', # Serine
            'AGT' => 'S', # Serine
            'AGA' => 'R', # Arginine
            'AGG' => 'R', # Arginine
            'GTA' => 'V', # Valine
            'GTC' => 'V', # Valine
            'GTG' => 'V', # Valine
            'GTT' => 'V', # Valine
            'GCA' => 'A', # Alanine
            'GCC' => 'A', # Alanine
            'GCG' => 'A', # Alanine
            'GCT' => 'A', # Alanine
            'GAC' => 'D', # Aspartic Acid
            'GAT' => 'D', # Aspartic Acid
            'GAA' => 'E', # Glutamic Acid
            'GAG' => 'E', # Glutamic Acid
            'GGA' => 'G', # Glycine
            'GGC' => 'G', # Glycine
            'GGG' => 'G', # Glycine
            'GGT' => 'G', # Glycine
        );
        if(exists $genetic_code{$codon}) {
            return $genetic_code{$codon};
        }else{
            print STDERR "Bad codon \"$codon\"!!\n";
            exit;
        }
    }

      The subroutine is called "codon2aa". You supply the sequence as the parameter, but you should run it on individual codons:
      for my $codon ($seq =~ /(...)/g) {
          my $trans = codon2aa($codon);
          print "$name\t$trans\n";
      }
      لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

        This won't work when there are numerous DNA sequences in the FASTA file.
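
A hedged sketch of how the two pieces could fit together for a file with many sequences: collect each record's sequence first, then translate it codon by codon and print one protein per header. The helper names translate_dna and translate_fasta, the sample data, and the choice of 'X' for an unrecognised codon are assumptions for illustration, not anything proposed in the thread. The `//` operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Standard genetic code; '_' marks a stop codon, as in the posts above.
my %codon2aa = qw(
    TCA S  TCC S  TCG S  TCT S  TTC F  TTT F  TTA L  TTG L
    TAC Y  TAT Y  TAA _  TAG _  TGC C  TGT C  TGA _  TGG W
    CTA L  CTC L  CTG L  CTT L  CCA P  CCC P  CCG P  CCT P
    CAC H  CAT H  CAA Q  CAG Q  CGA R  CGC R  CGG R  CGT R
    ATA I  ATC I  ATT I  ATG M  ACA T  ACC T  ACG T  ACT T
    AAC N  AAT N  AAA K  AAG K  AGC S  AGT S  AGA R  AGG R
    GTA V  GTC V  GTG V  GTT V  GCA A  GCC A  GCG A  GCT A
    GAC D  GAT D  GAA E  GAG E  GGA G  GGC G  GGG G  GGT G
);

# Hypothetical helper: translate one DNA string in reading frame 0.
sub translate_dna {
    my ($dna) = @_;
    $dna = uc $dna;
    $dna =~ s/\s+//g;
    my $protein = '';
    while ( $dna =~ /(...)/g ) {             # a trailing 1-2 bases are skipped
        $protein .= $codon2aa{$1} // 'X';    # assumption: 'X' for unknown codons
    }
    return $protein;
}

# Hypothetical helper: parse FASTA text line by line and return a list of
# [header, protein] pairs in file order.
sub translate_fasta {
    my ($text) = @_;
    my @records;
    for my $line ( split /\r?\n/, $text ) {
        if ( $line =~ /^>(.*)/ ) {           # header line starts a new record
            push @records, { name => $1, seq => '' };
        }
        elsif (@records) {                   # sequence line of current record
            $records[-1]{seq} .= $line;
        }
    }
    return map { [ $_->{name}, translate_dna( $_->{seq} ) ] } @records;
}

# Usage on a small in-memory example with two records.
my $fasta = ">seq1\nATGGCCTAA\n>seq2\nATGTTTGGA\nTGA\n";
print "$_->[0]\t$_->[1]\n" for translate_fasta($fasta);
```

Building the protein string inside the helper means the header is printed once per sequence rather than once per codon.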
