Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

Comment on

( #3333=superdoc: print w/ replies, xml ) Need Help??

This is OT, but here's an approch to doing the actual translation by lookup rather than by a large, nested  if...elsif in a subroutine. Should be faster. Some attempts at data validation are made, but no guarantee this is bulletproof. Uses state feature and the  // operator, both from 5.10+; if you don't have 5.10, let me know and I'll make work-arounds.

use warnings; use strict; use feature qw(state); use Test::More # tests => ?? + 1 # Test::NoWarnings add 1 test 'no_plan' ; use Test::NoWarnings; use Data::Dump; my @codons = qw(/GC./ /TG[TC]/ /GA[TC]/ /GA[AG]/); my @aminos = qw( A C D E); push @codons, qw(/TT[TC]/ /GG./ /CA[TC]/ /AT[TCA]/); push @aminos, qw( F G H I); push @codons, qw(/AA[AG]/ /TT[AG]|CT./ /ATG/ /AA[TC]/); push @aminos, qw( K L M N); push @codons, qw(/CC./ /CA[AG]/ /CG.|AG[AG]/ /TC.|AG[TC]/); push @aminos, qw( P Q R S); push @codons, qw(/AC./ /GT./ /TGG/ /TA[TC]/ /TA[AG]|TGA/); push @aminos, qw( T V W Y _); die qq{lists of codons and aminos not the same length} if @codons != @aminos; my %xlate; # translate base triplets to amino for (0 .. $#codons) { add_to_table(\%xlate, codons_to_amino($codons[$_], $aminos[$_])), } # dd \%xlate; # FOR DEBUG - see what we got for my $ar_vector ( # base sequence expected protein [ qw(GTTACTGAGTGTGGTTGGCAGACTGCTGGTTGCCGTATGTAT VTECGWQTAGCRMY) ], [ qw(CTTTCTCATTGTGATGTTAAGGATTGGATGTGCTGGCTTCTG LSHCDVKDWMCWLL) ], [ qw(GCTTGGACTTGTGTTGAGATTGATGGTCATTTCTCTATGAAT AWTCVEIDGHFSMN) ], ) { my ($bases, $protein) = @$ar_vector; is encode(\%xlate, $bases), $protein, qq{'$bases' -> '$protein'}; } # subroutines ###################################################### sub encode { my ($hr_xlate, $bases, ) = @_; $bases = canonicalize($bases); die qq{number of bases not multiple of 3} if length($bases) % 3; # translate NON-overlapping base triplets to aminos. $bases =~ s{ (...) }{ $hr_xlate->{$1} // '*' }xmsge; return $bases; # bases now translated to protein sequence } # subroutines for building translation table. sub canonicalize { return uc $_[0]; } sub codons_to_amino { my ($codon_pat, # pattern of codon(s) $amino, # amino codon(s) map to ) = @_; $codon_pat = canonicalize($codon_pat); $amino = canonicalize($amino); # translate regex to glob form. state $bases = canonicalize('ATCG'); state $base = qr{ [$bases] }xms; state $any = qr{ \. }xms; state $some = qr{ \[ $base{2,3} \] }xms; state $codon = qr{ $base $base (?: $base | $any | $some) }xms; state $triplet = qr{ \A $base{3} \z }xms; state $any_glob = '{' . join(q{,}, split '', $bases) . '}'; my @codon_groups = $codon_pat =~ m{ $codon }xmsg; for (@codon_groups) { s{ $any }{$any_glob}xmsg; s{ ($some) }{ some_glob($1) }xmsge; } my @triplets = glob(qq{@codon_groups}); for (@triplets) { die qq{'$_' not a triplet} if $_ !~ $triplet; } return $amino, @triplets; } sub some_glob { my ($some_pat, # 2-3 bases in regex class: [ATC] [AT] ) = @_; my $bases = join q{,}, $some_pat =~ m{ \w }xmsg; return qq{{$bases}}; } sub add_to_table { my ($hr_table, $amino, @triplets, ) = @_; for my $triplet (@triplets) { die qq{'$triplet' already in table} if exists $hr_table->{$triplet}; $hr_table->{$triplet} = $amino; } }

In reply to Re: [OT]: How do I stop appending data: Dispatch translation by AnomalousMonk
in thread How do I stop appending data by 4pt8secs

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others wandering the Monastery: (13)
    As of 2015-07-31 12:29 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









      Results (277 votes), past polls