PerlMonks  

Re: Translation Substring Error (updated)

by haukex (Archbishop)
on Nov 09, 2017 at 15:47 UTC ( [id://1203051] )


in reply to Translation Substring Error

@seqarray and $seqarray are two different variables, and you never assign anything to $seqarray, so using substr on it does not make much sense. I suspect you just want to look directly at $seq instead of splitting it (BTW, to get multiple elements out of an array, use Slices or splice). Also, note that you overwrite $amino_acid on every loop iteration. The following minimal changes make your code work for me:

my $seq = shift;
my $amino_acid;
for (my $i=0; $i<=length($seq)-3; $i=$i+3) {
    my $codon = substr($seq,$i,3);
    $amino_acid .= $genetic_code{$codon};
}
return $amino_acid;

<update2> Fixed an off-by-one error in the above code; I initially incorrectly translated your $#seqarray-2 into length($seq)-2 ($#seqarray returns the last index of the array, not its length like scalar(@seqarray) does, or length does for strings). That's a good argument against the classic for(;;) and for the two solutions below instead :-) </update2>
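To make the off-by-one concrete, here is a small sketch comparing the three values for one of the example inputs. The split into @seqarray is an assumption about how the original array was built.

```perl
use strict;
use warnings;

my $seq      = 'ATGCCCGTAC';
my @seqarray = split //, $seq;   # assumption: one character per element

print "last index:    ", $#seqarray,        "\n";   # 9
print "element count: ", scalar(@seqarray), "\n";   # 10
print "string length: ", length($seq),      "\n";   # 10

# Last offset at which a full three-character codon can still start:
my $last_start = length($seq) - 3;                  # 7, same as $#seqarray - 2
print "last start:    ", $last_start, "\n";
```

So $#seqarray-2 corresponds to length($seq)-3, not length($seq)-2.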

If you output the return value from OneFrameTranslation (your current code is ignoring the return value), this gives you:

print OneFrameTranslation('ATGCCCGTAC'),"\n";
print OneFrameTranslation('GCTTCCCAGCGC'),"\n";
__END__
MPV
ASQR

By the way, you can probably move your %genetic_code to the top of your code (outside of the sub), so that it only gets initialized once instead of on every call to the sub, and making its name uppercase is the usual convention to indicate it is a constant that should not be changed.
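A minimal sketch of that layout might look like the following. The table here is deliberately partial (only the codons needed for the two example inputs), and the name %GENETIC_CODE is just the uppercased version of your variable; your real table would contain all 64 codons.

```perl
use strict;
use warnings;

# Partial codon table for illustration only (assumption: the real one
# has all 64 codons). Declared once at file scope, outside the sub.
my %GENETIC_CODE = (
    ATG => 'M', CCC => 'P', GTA => 'V',
    GCT => 'A', TCC => 'S', CAG => 'Q', CGC => 'R',
);

sub OneFrameTranslation {
    my $seq = shift;
    my $amino_acid = '';
    # Step through the string three characters at a time.
    for (my $i = 0; $i <= length($seq) - 3; $i += 3) {
        my $codon = substr($seq, $i, 3);
        $amino_acid .= $GENETIC_CODE{$codon};
    }
    return $amino_acid;
}

print OneFrameTranslation('ATGCCCGTAC'),   "\n";   # MPV
print OneFrameTranslation('GCTTCCCAGCGC'), "\n";   # ASQR
```

Note that an uppercase name is only a convention; Perl does not prevent the hash from being modified.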

Another way to break up a string is to use regular expressions. The following also works: it matches three characters, then matches again at the position where the previous match finished, and so on:

my $amino_acid;
while ($seq=~/\G(...)/sg) {
    $amino_acid .= $genetic_code{$1};
}
return $amino_acid;

Or, possibly going a little overboard, here's a technique I describe in Building Regex Alternations Dynamically to make the replacements using a single regex. I have left out the quotemeta and sort steps only because I know for certain that all keys are three-character strings without any special characters. If you have any doubts about the input data, put those steps back in!

# build the regex, this only needs to be done once
my ($genetic_regex) = map qr/$_/, join '|', keys %genetic_code;
# apply the regex
(my $amino_acid = $seq) =~ s/($genetic_regex)/$genetic_code{$1}/g;
return $amino_acid;

However, note this produces slightly different output for the first input: "MPVC" (the leftover C remains unchanged). Whether you want this behavior is up to you; it can also be accomplished in the first two solutions (although slightly less elegantly than with a regex). Update: Also, in the first two solutions you haven't defined what should happen if a code is not available in the table; the third (regex) solution would simply leave it unchanged. Also minor edits for clarification.
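For reference, here is a sketch of the single-regex solution with the quotemeta and sort steps put back in. The table is again a partial stand-in, and the sub name translate_with_regex is made up for this example.

```perl
use strict;
use warnings;

# Partial codon table for illustration only (assumption)
my %GENETIC_CODE = ( ATG => 'M', CCC => 'P', GTA => 'V' );

# Build the regex once. Sorting longest keys first makes a longer key
# win over a shorter one that is its prefix; quotemeta escapes any
# regex metacharacters that might appear in the keys.
my ($genetic_regex) = map { qr/$_/ }
    join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) or $a cmp $b }
    keys %GENETIC_CODE;

sub translate_with_regex {
    my $seq = shift;
    (my $amino_acid = $seq) =~ s/($genetic_regex)/$GENETIC_CODE{$1}/g;
    return $amino_acid;
}

print translate_with_regex('ATGCCCGTAC'), "\n";   # MPVC
```

With keys that are all three plain letters, both extra steps are no-ops, which is why they could be dropped above.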

Replies are listed 'Best First'.
Re^2: Translation Substring Error (updated)
by FIJI42 (Acolyte) on Nov 09, 2017 at 16:12 UTC

    Good point. If a nucleotide triplet with an unknown nucleotide appears (ex. ANC instead of ATC), I'd want to either skip those, or mark them with a letter like 'X'.

    I do like the regex solution though, it's quite elegant.

      If a nucleotide triplet with an unknown nucleotide appears (ex. ANC instead of ATC), I'd want to either skip those, or mark them with a letter like 'X'.

      In the first two solutions, you can use exists, e.g.:

      if ( exists $genetic_code{$codon} ) {
          $amino_acid .= $genetic_code{$codon};
      }
      else {
          $amino_acid .= $codon;
          # - OR -
          # $amino_acid .= 'X'; # or something else...
      }

      Update: Or, written more tersely, either $amino_acid .= exists $genetic_code{$codon} ? $genetic_code{$codon} : 'X'; or $amino_acid .= $genetic_code{$codon} // 'X'; (the former uses the Conditional Operator, and the latter uses Logical Defined Or instead of exists, assuming you don't have any undef values in your hash).
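Put together with the second solution, a complete sub could look roughly like this sketch. The partial table and the choice of 'X' as the marker are assumptions for the example.

```perl
use strict;
use warnings;

# Partial codon table for illustration only (assumption)
my %GENETIC_CODE = ( ATG => 'M', CCC => 'P', GTA => 'V' );

sub OneFrameTranslation {
    my $seq = shift;
    my $amino_acid = '';
    # \G continues each match where the previous one ended, so the
    # string is consumed in consecutive three-character steps.
    while ($seq =~ /\G(...)/sg) {
        my $codon = $1;
        $amino_acid .= exists $GENETIC_CODE{$codon}
                     ? $GENETIC_CODE{$codon}
                     : 'X';    # marker for codons not in the table
    }
    return $amino_acid;
}

print OneFrameTranslation('ATGANCGTAC'), "\n";   # MXV
```

The defined-or variant behaves the same here, but note that the // operator needs Perl 5.10 or newer.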

      I do like the regex solution though, it's quite elegant.

      You can combine my second and third suggestions (for nonexistent codes, this uses the defined-or solution I showed here, the exists solution would work as well):

      (my $amino_acid = $seq) =~ s{(...)}
          { $genetic_code{$1} // 'X' }esg;
      return $amino_acid;
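One difference worth checking before choosing between the two regex approaches: the alternation built from the table keys is not tied to the reading frame, so after an unknown triplet it can match a codon that starts at a shifted position. The sketch below illustrates this with a partial table; the sub names and the input string are made up for the example.

```perl
use strict;
use warnings;

# Partial codon table for illustration only (assumption)
my %GENETIC_CODE = ( ATG => 'M', CAT => 'H', GTA => 'V' );

# Alternation of the table keys: text that matches no key is skipped
# one character at a time, so later matches may be out of frame.
my ($genetic_regex) = map { qr/$_/ } join '|', keys %GENETIC_CODE;

sub translate_alternation {
    my $seq = shift;
    (my $amino_acid = $seq) =~ s/($genetic_regex)/$GENETIC_CODE{$1}/g;
    return $amino_acid;
}

# Fixed three-character steps with a defined-or fallback.
sub translate_triplets {
    my $seq = shift;
    (my $amino_acid = $seq) =~ s{(...)}{ $GENETIC_CODE{$1} // 'X' }esg;
    return $amino_acid;
}

print translate_alternation('ANCATGGTA'), "\n";   # ANHGV (CAT matched out of frame)
print translate_triplets('ANCATGGTA'),    "\n";   # XMV
```

In the combined form, a trailing remainder shorter than three characters is still left unchanged, because (...) cannot match it.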
