PerlMonks  

Re: Storage of proteins from diferent ORF

by kennethk (Monsignor)
on Mar 31, 2014 at 16:53 UTC (#1080427)


in reply to Storage of proteins from diferent ORF

Welcome to the monastery.

What do you expect the line @marcos=<,$protein>; to do? Perl reads that as a glob, but the pattern contains no wildcards, so it is equivalent to @marcos=",$protein";, which always yields a single value. If I run your code with $DNA equal to AGAAGAAGA, I get the output

La secuencia 0 es: ,RRR
La secuencia 1 es: ,RRREExx
La secuencia 2 es: ,RRREExxKKx
because your $protein variable persists across loops. When you describe a problem, it's important to include sample inputs and desired outputs (as described in How do I post a question effectively?) so that language issues don't get in the way of understanding. If the output you were going for is more like:
La secuencia 0 es: RRR
La secuencia 1 es: EExx
La secuencia 2 es: KKx
Then your code should probably look more like:
use strict;
use warnings;

chomp(my $DNA = <>);

# Codon => one-letter amino acid code (TAA and TAG, the stop codons, map to '8').
my %acid_map = (
    TTT => 'F', TTC => 'F',
    GGT => 'G', GGC => 'G', GGA => 'G', GGG => 'G',
    GCT => 'A', GCC => 'A', GCA => 'A', GCG => 'A',
    TTA => 'L', TTG => 'L', CTT => 'L', CTC => 'L', CTG => 'L', CTA => 'L',
    GTT => 'V', GTC => 'V', GTA => 'V', GTG => 'V',
    ATT => 'I', ATC => 'I', ATA => 'I',
    CCT => 'P', CCC => 'P', CCA => 'P', CCG => 'P',
    TCT => 'S', TCC => 'S', TCA => 'S', TCG => 'S',
    ACT => 'T', ACC => 'T', ACA => 'T', ACG => 'T',
    TGT => 'C', TGC => 'C',
    TAT => 'Y', TAC => 'Y',
    AAT => 'N', AAC => 'N',
    CAA => 'Q', CAG => 'Q',
    GAT => 'D', GAC => 'D',
    GAA => 'E', GAG => 'E',
    CGT => 'R', CGC => 'R', CGA => 'R', CGG => 'R', AGA => 'R', AGG => 'R',
    AAA => 'K', AAG => 'K',
    CAT => 'H', CAC => 'H',
    TGG => 'W', TGA => 'W',
    ATG => 'M',
    TAA => '8', TAG => '8',
);

# Translate each of the three forward reading frames.
foreach my $c (0 .. 2) {
    my $protein  = '';    # declared inside the loop, so it resets for every frame
    my $position = $c;
    while ($position < length $DNA) {
        my $codon = substr($DNA, $position, 3);
        if ($acid_map{$codon}) {
            $position += 3;
            $protein .= $acid_map{$codon};
        }
        else {
            # Unknown or incomplete codon: mark it and slide forward one base.
            $position++;
            $protein .= 'x';
        }
    }
    print "La secuencia $c es: $protein\n\n";
}
where I've made the following changes:
  1. I added strict and warnings; see Use strict warnings and diagnostics or die for some reasons why

  2. Instead of a long chain of if-elsif-else tests, I've used a hash. It makes the algorithm more immediately legible and will run faster: each codon costs one constant-time hash lookup instead of a walk down the chain of comparisons.

  3. I used a foreach loop instead of a C-style for loop for $c since you have a fixed series of numbers, and thus no need for complex logic.

  4. I swapped the inner loop to a while loop, since you have variable strides. I also used the opportunity to centralize the strides, so that $position is only changed once per iteration, hopefully improving clarity of intent.

  5. I swapped to compound assignment operators (Assignment Operators) since the same variable always appeared on both sides of your assignments. Less typing means fewer opportunities for typos, and again makes the code easier to read.

  6. Since you were already using interpolating quotes (Quote and Quote like Operators), I removed the unnecessary splits into a list for your output. I also removed the no-op associated with the glob.

  7. Lastly, I removed the loop from around your print. You could also express this as a push to an array scoped outside the foreach loop, and then printing outside the loop.
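The array alternative mentioned in change 7 could be sketched roughly as below. This is only an illustration: translate_frame is a hypothetical helper that does not appear in the code above, the three-entry table is deliberately cut down to what the AGAAGAAGA example needs, and I have assumed exists for the lookup where the code above tests the value for truth.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the code above: translate one reading frame.
# Takes the DNA string, a starting offset, and a reference to a codon hash.
sub translate_frame {
    my ($dna, $offset, $map) = @_;
    my $protein  = '';    # fresh for every call, so nothing carries over
    my $position = $offset;
    while ($position < length $dna) {
        my $codon = substr($dna, $position, 3);
        if (exists $map->{$codon}) {
            $protein  .= $map->{$codon};
            $position += 3;
        }
        else {
            $protein .= 'x';
            $position++;
        }
    }
    return $protein;
}

# Deliberately tiny table, just enough for the AGAAGAAGA example.
my %acid_map = (AGA => 'R', GAA => 'E', AAG => 'K');

my @marcos;    # declared outside the loop so it survives it
foreach my $c (0 .. 2) {
    push @marcos, translate_frame('AGAAGAAGA', $c, \%acid_map);
}

# Print only after every frame has been translated.
foreach my $c (0 .. $#marcos) {
    print "La secuencia $c es: $marcos[$c]\n";
}
```

Keeping the results in @marcos is mainly useful if you want to do something else with the three frames later, such as comparing or sorting them.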

Please review this code, and ask me questions about how it works if anything is unclear.
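In case the glob point from the top of this post is one of the unclear parts, here is a minimal sketch you can run to see it. The variable names @from_brackets and @from_glob are mine, chosen only for illustration.

```perl
use strict;
use warnings;

my $protein = 'RRR';

# Angle brackets around anything other than a bare filehandle or a simple
# scalar are a glob: the contents are interpolated, then expanded as a pattern.
my @from_brackets = <,$protein>;

# The same thing spelled out with the glob function.
my @from_glob = glob(",$protein");

# The pattern has no wildcards, so each list holds just the pattern itself.
print scalar(@from_brackets), " element: $from_brackets[0]\n";
print scalar(@from_glob),     " element: $from_glob[0]\n";
```

Both lists hold one string with the leading comma still attached, which is where the stray comma in your output came from.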

#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.


Re^2: Storage of proteins from diferent ORF
by frozenwithjoy (Curate) on Apr 25, 2014 at 04:55 UTC
    I had to do some translation from DNA/RNA to amino acids and grabbed this codon table. I found, however, that there are a couple of missing codons and one or two mistakes. Just want to drop a fixed version here in case people come looking in the future:
    my %codon_table = (
        AAA => 'K', AAC => 'N', AAG => 'K', AAT => 'N',
        ACA => 'T', ACC => 'T', ACG => 'T', ACT => 'T',
        AGA => 'R', AGC => 'S', AGG => 'R', AGT => 'S',
        ATA => 'I', ATC => 'I', ATG => 'M', ATT => 'I',
        CAA => 'Q', CAC => 'H', CAG => 'Q', CAT => 'H',
        CCA => 'P', CCC => 'P', CCG => 'P', CCT => 'P',
        CGA => 'R', CGC => 'R', CGG => 'R', CGT => 'R',
        CTA => 'L', CTC => 'L', CTG => 'L', CTT => 'L',
        GAA => 'E', GAC => 'D', GAG => 'E', GAT => 'D',
        GCA => 'A', GCC => 'A', GCG => 'A', GCT => 'A',
        GGA => 'G', GGC => 'G', GGG => 'G', GGT => 'G',
        GTA => 'V', GTC => 'V', GTG => 'V', GTT => 'V',
        TAA => '-', TAC => 'Y', TAG => '-', TAT => 'Y',
        TCA => 'S', TCC => 'S', TCG => 'S', TCT => 'S',
        TGA => '-', TGC => 'C', TGG => 'W', TGT => 'C',
        TTA => 'L', TTC => 'F', TTG => 'L', TTT => 'F',
    );

    EDIT: Here is one I made that I like more. It is structured like you normally see codon tables in books.

    my %codon_table = (
        TTT => 'F', TCT => 'S', TAT => 'Y', TGT => 'C',
        TTC => 'F', TCC => 'S', TAC => 'Y', TGC => 'C',
        TTA => 'L', TCA => 'S', TAA => '-', TGA => '-',
        TTG => 'L', TCG => 'S', TAG => '-', TGG => 'W',

        CTT => 'L', CCT => 'P', CAT => 'H', CGT => 'R',
        CTC => 'L', CCC => 'P', CAC => 'H', CGC => 'R',
        CTA => 'L', CCA => 'P', CAA => 'Q', CGA => 'R',
        CTG => 'L', CCG => 'P', CAG => 'Q', CGG => 'R',

        ATT => 'I', ACT => 'T', AAT => 'N', AGT => 'S',
        ATC => 'I', ACC => 'T', AAC => 'N', AGC => 'S',
        ATA => 'I', ACA => 'T', AAA => 'K', AGA => 'R',
        ATG => 'M', ACG => 'T', AAG => 'K', AGG => 'R',

        GTT => 'V', GCT => 'A', GAT => 'D', GGT => 'G',
        GTC => 'V', GCC => 'A', GAC => 'D', GGC => 'G',
        GTA => 'V', GCA => 'A', GAA => 'E', GGA => 'G',
        GTG => 'V', GCG => 'A', GAG => 'E', GGG => 'G',
    );
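Since hand-typed tables are where the missing entries crept in, one possible safeguard is to generate the table and check its coverage in code. The sketch below is only an illustration: all_codons, build_codon_table and missing_codons are hypothetical helper names, and the 64-character string assumes the standard genetic code with bases ordered T, C, A, G at each position and '-' for stop codons, as in the tables above.

```perl
use strict;
use warnings;

# Hypothetical helper: every codon, with bases ordered T, C, A, G
# at the first, second and third position.
sub all_codons {
    my @bases = qw(T C A G);
    my @codons;
    for my $first (@bases) {
        for my $second (@bases) {
            push @codons, map { "$first$second$_" } @bases;
        }
    }
    return @codons;
}

# Hypothetical helper: build the table from a compact amino acid string
# whose characters line up with the order produced by all_codons().
sub build_codon_table {
    my $acids = 'FFLLSSSSYY--CC-W'     # first base T
              . 'LLLLPPPPHHQQRRRR'     # first base C
              . 'IIIMTTTTNNKKSSRR'     # first base A
              . 'VVVVAAAADDEEGGGG';    # first base G
    my %table;
    @table{ all_codons() } = split //, $acids;
    return %table;
}

# Hypothetical helper: list the codons a table has no entry for.
sub missing_codons {
    my ($table) = @_;
    return grep { !exists $table->{$_} } all_codons();
}

my %codon_table = build_codon_table();
print scalar(keys %codon_table), " codons in the table\n";

# Simulate an incomplete table by dropping two serine codons.
my %incomplete = %codon_table;
delete @incomplete{qw(AGT AGC)};
print "Missing: ", join(' ', missing_codons(\%incomplete)), "\n";
```

Passing a reference to any hand-written table to missing_codons gives a quick way to spot gaps before relying on it.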
