PerlMonks  

Re: (Golf) RNA Genetic Code Translator

by scain (Curate)
on Jul 06, 2001 at 21:04 UTC


in reply to (Golf) RNA Genetic Code Translator

Update: DNA, RNA, what's the difference? My original code used the cDNA, not the mRNA. I changed it and reran it, and everyone's code now works except for japhy's.
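For readers unfamiliar with the distinction: a cDNA sequence is written with T where the mRNA has U, so a lookup keyed on U-containing codons will not recognise codons written with T. A minimal sketch of the conversion, assuming the cDNA is held in a plain string (the sub name `cdna_to_mrna` is illustrative and not part of the original script):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: rewrite a coding-strand cDNA string as mRNA.
sub cdna_to_mrna {
    my ($seq) = @_;
    $seq = uc $seq;      # normalise case first
    $seq =~ tr/T/U/;     # thymine (T) in DNA corresponds to uracil (U) in RNA
    return $seq;
}

# The first five CFTR codons from the __DATA__ section, written as cDNA.
print cdna_to_mrna("ATGCAGAGGTCGCCT"), "\n";   # prints AUGCAGAGGUCGCCU
```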

OK, this is going to be a long one...

I was going to benchmark these golf examples to see which one was fastest, but there seems to be some cheating going on. Honestly, I don't really understand what any of these is doing, so I don't know whether the cheating was intentional. To do the benchmarking, I was going to use the CFTR mRNA (CFTR is the protein that, when mutated, causes cystic fibrosis). The mRNA (with leading and trailing sequence removed) is in the __DATA__ section of the code. The correct translation looks like this:

MQRSPLEKASVVSKLFFSWTRPILRKGYRQRLELSDIYQIPSVDSADNLSEKLEREWDRELASKKNPKLI
NALRRCFFWRFMFYGIFLYLGEVTKAVQPLLLGRIIASYDPDNKEERSIAIYLGIGLCLLFIVRTLLLH
PAIFGLHHIGMQMRIAMFSLIYKKTLKLSSRVLDKISIGQLVSLLSNNLNKFDEGLALAHFVWIAPLQV
ALLMGLIWELLQASAFCGLGFLIVLALFQAGLGRMMMKYRDQRAGKISERLVITSEMIENIQSVKAYCW
EEAMEKMIENLRQTELKLTRKAAYVRYFNSSAFFFSGFFVVFLSVLPYALIKGIILRKIFTTISFCIVL
RMAVTRQFPWAVQTWYDSLGAINKIQDFLQKQEYKTLEYNLTTTEVVMENVTAFWEEGFGELFEKAKQN
NNNRKTSNGDDSLFFSNFSLLGTPVLKDINFKIERGQLLAVAGSTGAGKTSLLMMIMGELEPSEGKIKH
SGRISFCSQFSWIMPGTIKENIIFGVSYDEYRYRSVIKACQLEEDISKFAEKDNIVLGEGGITLSGGQR
ARISLARAVYKDADLYLLDSPFGYLDVLTEKEIFESCVCKLMANKTRILVTSKMEHLKKADKILILNEG
SSYFYGTFSELQNLQPDFSSKLMGCDSFDQFSAERRNSILTETLHRFSLEGDAPVSWTETKKQSFKQTG
EFGEKRKNSILNPINSIRKFSIVQKTPLQMNGIEEDSDEPLERRLSLVPDSEQGEAILPRISVISTGPT
LQARRRQSVLNLMTHSVNQGQNIHRKTTASTRKVSLAPQANLTELDIYSRRLSQETGLEISEEINEEDL
KECLFDDMESIPAVTTWNTYLRYITVHKSLIFVLIWCLVIFLAEVAASLVVLWLLGNTPLQDKGNSTHS
RNNSYAVIITSTSSYYVFYIYVGVADTLLAMGFFRGLPLVHTLITVSKILHHKMLHSVLQAPMSTLNTL
KAGGILNRFSKDIAILDDLLPLTIFDFIQLLLIVIGAIAVVAVLQPYIFVATVPVIVAFIMLRAYFLQT
SQQLKQLESEGRSPIFTHLVTSLKGLWTLRAFGRQPYFETLFHKALNLHTANWFLYLSTLRWFQMRIEM
IFVIFFIAVTFISILTTGEGEGRVGIILTLAMNIMSTLQWAVNSSIDVDSLMRSVSRVFKFIDMPTEGK
PTKSTKPYKNGQLSKVMIIENSHVKKDDIWPSGGQMTVKDLTAKYTEGGNAILENISFSISPGQRVGLL
GRTGSGKSTLLSAFLRLLNTEGEIQIDGVSWDSITLQQWRKAFGVIPQKVFIFSGTFRKNLDPYEQWSD
QEIWKVADEVGLRSVIEQFPGKLDFVLVDGGCVLSHGHKQLMCLARSVLSKAKILLLDEPSAHLDPVTY
QIIRRTLKQAFADCTVILCEHRIEAMLECQQFLVIEENKVRQYDSIQKLLNERSLFRQAISPSDRVKLF
PHRNSSKCKSKPQIAALKEETEEEVQDTRL.
However, tachyon's, MeowChow's and tadman's original code all gave this:
QRPEKASKSTRPRKGRQREDQPADEKEREREAKKPKARRRGGETKAQPGRADPNKEERAGGRTHPAGHGQ
RAKKTKSRKGQNNNKEGAAAPQAGEQAAGGAQAGGRKRQRAGKERTEEQKAEEAEKENRQTEKTRKAAR
SAGPAKGRKTTRATRQPAQTDGANKQQKQEKTENTTTEETAEEGGEEKAKQNNRKTGDSGTPKKERGQA
AGTGAGKTGEEPEGKKHGRQPGTKEGERRSKAQEEDKAEKDGEGGTGGQRARARAKADPGTEKEESKAN
KTRTKEKKADKEGSSGTEQQPDSKGDQAERRTETHREGAPTETKKQKQTGEGEKRKPNRKQKTPQGEEE
PERRPEQGEAPRSSTGPTQARRRQNTHNQGQNHRKTTATRKAPQANTERRQETGEEENEEDKEESPATT
NTRTHKSAEAAGNTPQDKGTRNSATSTGADTAGRGPTTKHHKQAPTNTKAGGRKADPTDQGAAAQPATP
ARAQTQQKQEEGRPTTSKGTRAGRQPETHKATANTRQREATTTGEGEGRGTATQANSSRSRKDPTEGKP
TKTKPKGQKEHKKDPGGQTKTAKTEGGAENPGQRGGRTGGKTARNTEGEQGTQQRKAGPQKGTRKNPEQ
QEKAEGREQPGKDGGSGHKQARKAKEPAPTQRRTKQAATEHREAEQQEENKRQQKNERSRQASPDRKPH
RNSKKKPQAAKEETEEEQTR
It is not at all clear to me why, and the result is not at all related to CFTR. For that matter, it's not related to any protein in the public databases. Congratulations, you did gene discovery; pharmaceutical companies spent billions of dollars to do that :-) Also, japhy's code returns nothing (except some line feeds, apparently).
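In light of the update at the top of this node, the start of the odd output ("QRP" where the correct translation begins "MQRSP") looks consistent with cDNA being fed to a pattern table in the style of MeowChow's: patterns that name a U never match a codon written with T, so that residue silently vanishes, while wildcard patterns such as `CC.` still match `CCT`. The sketch below illustrates the effect with a deliberately tiny, hypothetical five-entry table (`@table` and `translate` are illustrative names, not taken from any of the golfed entries):

```perl
use strict;
use warnings;

# Hypothetical five-entry pattern table; a real one covers all 64 codons.
my @table = (
    [ qr/^AUG$/,    'M' ],
    [ qr/^CA[AG]$/, 'Q' ],
    [ qr/^AG[AG]$/, 'R' ],
    [ qr/^UC.$/,    'S' ],
    [ qr/^CC.$/,    'P' ],
);

sub translate {
    my ($seq) = @_;
    my $protein = '';
    for my $codon ($seq =~ /(.{3})/g) {
        # A codon that matches no pattern contributes nothing to the output.
        for my $entry (@table) {
            my ($pattern, $residue) = @$entry;
            if ($codon =~ $pattern) { $protein .= $residue; last }
        }
    }
    return $protein;
}

print translate("AUGCAGAGGUCGCCU"), "\n";   # mRNA input: MQRSP
print translate("ATGCAGAGGTCGCCT"), "\n";   # cDNA input: QRP
```

A purely hash-based table would behave slightly differently on the same cDNA input, because it has no wildcard entries and would drop `CCT` as well.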

So, can anyone point out the problems with these subs? I copied them directly from the HTML, removed only the "+" at the beginning of wrapped code lines, and changed the names of the subs. Here is my code:

#!/usr/bin/perl

# Join the wrapped __DATA__ lines into a single sequence.
while (<DATA>) { chomp; $cftr.=$_; }

print "tadman original\n".f0($cftr)."\n\n";
print "japhy\n".f1($cftr)."\n\n";
print "MeowChow\n".f2($cftr)."\n\n";
print "no_slogan\n".f3($cftr)."\n\n";
print "srawls\n".f4($cftr)."\n\n";
print "tachyon\n".RNA($cftr)."\n\n";
print "tadman golf\n".f5($cftr)."\n\n";

sub f0 { # original by tadman
my %g = (
  # . - Stop
  'UAA'=>'.','UAG'=>'.','UGA'=>'.',
  # A - Alanine
  'GCU'=>'A','GCC'=>'A','GCA'=>'A','GCG'=>'A',
  # C - Cysteine
  'UGU'=>'C','UGC'=>'C',
  # D - Aspartic Acid
  'GAU'=>'D','GAC'=>'D',
  # E - Glutamic Acid
  'GAA'=>'E','GAG'=>'E',
  # F - Phenylalanine
  'UUU'=>'F','UUC'=>'F',
  # G - Glycine
  'GGU'=>'G','GGC'=>'G','GGA'=>'G','GGG'=>'G',
  # H - Histidine
  'CAU'=>'H','CAC'=>'H',
  # I - Isoleucine
  'AUU'=>'I','AUC'=>'I','AUA'=>'I',
  # K - Lysine
  'AAA'=>'K','AAG'=>'K',
  # L - Leucine
  'CUU'=>'L','CUC'=>'L','CUA'=>'L','CUG'=>'L',
  'UUA'=>'L','UUG'=>'L',
  # M - Methionine
  'AUG'=>'M',
  # N - Asparagine
  'AAU'=>'N','AAC'=>'N',
  # P - Proline
  'CCU'=>'P','CCC'=>'P','CCA'=>'P','CCG'=>'P',
  # Q - Glutamine
  'CAA'=>'Q','CAG'=>'Q',
  # R - Arginine
  'CGU'=>'R','CGC'=>'R','CGA'=>'R','CGG'=>'R',
  'AGA'=>'R','AGG'=>'R',
  # S - Serine
  'UCU'=>'S','UCC'=>'S','UCA'=>'S','UCG'=>'S',
  'AGU'=>'S','AGC'=>'S',
  # T - Threonine
  'ACU'=>'T','ACC'=>'T','ACA'=>'T','ACG'=>'T',
  # V - Valine
  'GUU'=>'V','GUC'=>'V','GUA'=>'V','GUG'=>'V',
  # W - Tryptophan
  'UGG'=>'W',
  # Y - Tyrosine
  'UAU'=>'Y','UAC'=>'Y',
);
$_=pop;s/.{1,3}/$g{$&}/g;$_
}

#japhy
sub B(){''}sub Z(){(B)x13}sub U(){(B)x31}sub O(){(B)x83}sub J(){(B)x343}
sub b(){B,B,B}
@g{AAA..UUU}=(K,B,N,b,K,Z,N,U,T,B,T,b,T,Z,T,O,R,B,S,b,R,Z,S,J,
I,B,I,b,M,Z,I,(B)x811,Q,B,H,b,Q,Z,H,U,P,B,P,b,P,Z,P,O,R,B,R,b,R,Z,R,J,L,
B,L,b,L,Z,L,(B)x2163,E,B,D,b,E,Z,D,U,A,B,A,b,A,Z,A,O,G,B,G,b,G,Z,G,J,V,B
,V,b,V,Z,V,(B)x8923,'.',B,Y,b,'.',Z,Y,U,S,B,S,b,S,Z,S,O,'.',B,C,b,W,Z,C,
J,L,B,F,b,L,Z,F);
sub f1{$_=pop;s/..?.?/$g{$&}/g;$_}

sub f2{ #MeowChow
my@r=qw(UA[AG]|UGA GC. - UG[UC] GA[UC] GA[AG] UU[UC] GG. CA[UC] AU[^G]
 - AA[AG] CU.|UU[AG] AUG AA[UC] - CC.
CA[AG] CG.|AG[AG] UC.|AG[UC] AC. - GU. UGG - UA[UC] ^);
((my$t=pop)=~s|..?.?|chr 64+(grep$&=~/$r[$_]/,0..26)[0]|eg);$t=~y/@Z/./d;$t
}

sub f3 { #no_slogan
$_="KNNKtIIIMRSSRQHHQplr.YY.sLFFL.CCWEDDEavg";s/[a-z]/uc$&x4/eg;@x=/./g;
join"",@x[map{$x=0;$x=$x*4|6&ord for/./g;$x/2}pop=~/.../g]
}

sub f4 { #srawls
$_="KNNKtIIIMRSSRQHHQplr.YY.sLFFL.CCWEDDEavg";s/[a-z]/uc$&x4/eg;
join"",(/./g)[map{$x=0;$x=$x*4|6&ord for/./g;$x/2}pop=~/.../g]
}

sub RNA { #tachyon
@_{'UAAUAGUGAGCUGCCGCAGCGUGUUGCGAUGACGAAGAGUUUUUCGGUGGCGGAGGGCAUCACAUUAUCAUAAAAAAGCUUCUCCUACUGUUAUUGAUGAAUAACCCUCCCCCACCGCAACAGCGUCGCCGACGGAGAAGG
UCUUCCUCAUCGAGUAGCACUACCACAACGGUUGUCGUAGUGUGGUAUUAC'=~/(...)/g}=split//,'...AAAACCDDEEFFGGGGHHIIIKKLLLLLLMNNPPPPQQRRRRRRSSSSSSTTTTVVVVWYY';$_=pop
;s/..?.?/$_{$&}/g;$_
}

sub f5{ #tadman
$_=pop;y/UCAG/0123/;s/(.)(.)(.)/substr
"FFLLSSSSYY..CC.WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
,$1<<4|$2<<2|$3,1/ge;y/0123//d;$_
}

#>gi|6995995|ref|NM_000492.2| Homo sapiens cystic fibrosis transmembrane conductance regulator, ATP-binding cassette (sub-family C, member 7) (CFTR), mRNA
__DATA__
AUGCAGAGGUCGCCUCUGGAAAAGGCCAGCGUUGUCUCCAAACUUUUUUUCAGCUGGACCAGACCAAUUU
UGAGGAAAGGAUACAGACAGCGCCUGGAAUUGUCAGACAUAUACCAAAUCCCUUCUGUUGAUUCUGCUG
ACAAUCUAUCUGAAAAAUUGGAAAGAGAAUGGGAUAGAGAGCUGGCUUCAAAGAAAAAUCCUAAACUCA
UUAAUGCCCUUCGGCGAUGUUUUUUCUGGAGAUUUAUGUUCUAUGGAAUCUUUUUAUAUUUAGGGGAAG
UCACCAAAGCAGUACAGCCUCUCUUACUGGGAAGAAUCAUAGCUUCCUAUGACCCGGAUAACAAGGAGG
AACGCUCUAUCGCGAUUUAUCUAGGCAUAGGCUUAUGCCUUCUCUUUAUUGUGAGGACACUGCUCCUAC
ACCCAGCCAUUUUUGGCCUUCAUCACAUUGGAAUGCAGAUGAGAAUAGCUAUGUUUAGUUUGAUUUAUA
AGAAGACUUUAAAGCUGUCAAGCCGUGUUCUAGAUAAAAUAAGUAUUGGACAACUUGUUAGUCUCCUUU
CCAACAACCUGAACAAAUUUGAUGAAGGACUUGCAUUGGCACAUUUCGUGUGGAUCGCUCCUUUGCAAG
UGGCACUCCUCAUGGGGCUAAUCUGGGAGUUGUUACAGGCGUCUGCCUUCUGUGGACUUGGUUUCCUGA
UAGUCCUUGCCCUUUUUCAGGCUGGGCUAGGGAGAAUGAUGAUGAAGUACAGAGAUCAGAGAGCUGGGA
AGAUCAGUGAAAGACUUGUGAUUACCUCAGAAAUGAUUGAAAAUAUCCAAUCUGUUAAGGCAUACUGCU
GGGAAGAAGCAAUGGAAAAAAUGAUUGAAAACUUAAGACAAACAGAACUGAAACUGACUCGGAAGGCAG
CCUAUGUGAGAUACUUCAAUAGCUCAGCCUUCUUCUUCUCAGGGUUCUUUGUGGUGUUUUUAUCUGUGC
UUCCCUAUGCACUAAUCAAAGGAAUCAUCCUCCGGAAAAUAUUCACCACCAUCUCAUUCUGCAUUGUUC
UGCGCAUGGCGGUCACUCGGCAAUUUCCCUGGGCUGUACAAACAUGGUAUGACUCUCUUGGAGCAAUAA
ACAAAAUACAGGAUUUCUUACAAAAGCAAGAAUAUAAGACAUUGGAAUAUAACUUAACGACUACAGAAG
UAGUGAUGGAGAAUGUAACAGCCUUCUGGGAGGAGGGAUUUGGGGAAUUAUUUGAGAAAGCAAAACAAA
ACAAUAACAAUAGAAAAACUUCUAAUGGUGAUGACAGCCUCUUCUUCAGUAAUUUCUCACUUCUUGGUA
CUCCUGUCCUGAAAGAUAUUAAUUUCAAGAUAGAAAGAGGACAGUUGUUGGCGGUUGCUGGAUCCACUG
GAGCAGGCAAGACUUCACUUCUAAUGAUGAUUAUGGGAGAACUGGAGCCUUCAGAGGGUAAAAUUAAGC
ACAGUGGAAGAAUUUCAUUCUGUUCUCAGUUUUCCUGGAUUAUGCCUGGCACCAUUAAAGAAAAUAUCA
UCUUUGGUGUUUCCUAUGAUGAAUAUAGAUACAGAAGCGUCAUCAAAGCAUGCCAACUAGAAGAGGACA
UCUCCAAGUUUGCAGAGAAAGACAAUAUAGUUCUUGGAGAAGGUGGAAUCACACUGAGUGGAGGUCAAC
GAGCAAGAAUUUCUUUAGCAAGAGCAGUAUACAAAGAUGCUGAUUUGUAUUUAUUAGACUCUCCUUUUG
GAUACCUAGAUGUUUUAACAGAAAAAGAAAUAUUUGAAAGCUGUGUCUGUAAACUGAUGGCUAACAAAA
CUAGGAUUUUGGUCACUUCUAAAAUGGAACAUUUAAAGAAAGCUGACAAAAUAUUAAUUUUGAAUGAAG
GUAGCAGCUAUUUUUAUGGGACAUUUUCAGAACUCCAAAAUCUACAGCCAGACUUUAGCUCAAAACUCA
UGGGAUGUGAUUCUUUCGACCAAUUUAGUGCAGAAAGAAGAAAUUCAAUCCUAACUGAGACCUUACACC
GUUUCUCAUUAGAAGGAGAUGCUCCUGUCUCCUGGACAGAAACAAAAAAACAAUCUUUUAAACAGACUG
GAGAGUUUGGGGAAAAAAGGAAGAAUUCUAUUCUCAAUCCAAUCAACUCUAUACGAAAAUUUUCCAUUG
UGCAAAAGACUCCCUUACAAAUGAAUGGCAUCGAAGAGGAUUCUGAUGAGCCUUUAGAGAGAAGGCUGU
CCUUAGUACCAGAUUCUGAGCAGGGAGAGGCGAUACUGCCUCGCAUCAGCGUGAUCAGCACUGGCCCCA
CGCUUCAGGCACGAAGGAGGCAGUCUGUCCUGAACCUGAUGACACACUCAGUUAACCAAGGUCAGAACA
UUCACCGAAAGACAACAGCAUCCACACGAAAAGUGUCACUGGCCCCUCAGGCAAACUUGACUGAACUGG
AUAUAUAUUCAAGAAGGUUAUCUCAAGAAACUGGCUUGGAAAUAAGUGAAGAAAUUAACGAAGAAGACU
UAAAGGAGUGCCUUUUUGAUGAUAUGGAGAGCAUACCAGCAGUGACUACAUGGAACACAUACCUUCGAU
AUAUUACUGUCCACAAGAGCUUAAUUUUUGUGCUAAUUUGGUGCUUAGUAAUUUUUCUGGCAGAGGUGG
CUGCUUCUUUGGUUGUGCUGUGGCUCCUUGGAAACACUCCUCUUCAAGACAAAGGGAAUAGUACUCAUA
GUAGAAAUAACAGCUAUGCAGUGAUUAUCACCAGCACCAGUUCGUAUUAUGUGUUUUACAUUUACGUGG
GAGUAGCCGACACUUUGCUUGCUAUGGGAUUCUUCAGAGGUCUACCACUGGUGCAUACUCUAAUCACAG
UGUCGAAAAUUUUACACCACAAAAUGUUACAUUCUGUUCUUCAAGCACCUAUGUCAACCCUCAACACGU
UGAAAGCAGGUGGGAUUCUUAAUAGAUUCUCCAAAGAUAUAGCAAUUUUGGAUGACCUUCUGCCUCUUA
CCAUAUUUGACUUCAUCCAGUUGUUAUUAAUUGUGAUUGGAGCUAUAGCAGUUGUCGCAGUUUUACAAC
CCUACAUCUUUGUUGCAACAGUGCCAGUGAUAGUGGCUUUUAUUAUGUUGAGAGCAUAUUUCCUCCAAA
CCUCACAGCAACUCAAACAACUGGAAUCUGAAGGCAGGAGUCCAAUUUUCACUCAUCUUGUUACAAGCU
UAAAAGGACUAUGGACACUUCGUGCCUUCGGACGGCAGCCUUACUUUGAAACUCUGUUCCACAAAGCUC
UGAAUUUACAUACUGCCAACUGGUUCUUGUACCUGUCAACACUGCGCUGGUUCCAAAUGAGAAUAGAAA
UGAUUUUUGUCAUCUUCUUCAUUGCUGUUACCUUCAUUUCCAUUUUAACAACAGGAGAAGGAGAAGGAA
GAGUUGGUAUUAUCCUGACUUUAGCCAUGAAUAUCAUGAGUACAUUGCAGUGGGCUGUAAACUCCAGCA
UAGAUGUGGAUAGCUUGAUGCGAUCUGUGAGCCGAGUCUUUAAGUUCAUUGACAUGCCAACAGAAGGUA
AACCUACCAAGUCAACCAAACCAUACAAGAAUGGCCAACUCUCGAAAGUUAUGAUUAUUGAGAAUUCAC
ACGUGAAGAAAGAUGACAUCUGGCCCUCAGGGGGCCAAAUGACUGUCAAAGAUCUCACAGCAAAAUACA
CAGAAGGUGGAAAUGCCAUAUUAGAGAACAUUUCCUUCUCAAUAAGUCCUGGCCAGAGGGUGGGCCUCU
UGGGAAGAACUGGAUCAGGGAAGAGUACUUUGUUAUCAGCUUUUUUGAGACUACUGAACACUGAAGGAG
AAAUCCAGAUCGAUGGUGUGUCUUGGGAUUCAAUAACUUUGCAACAGUGGAGGAAAGCCUUUGGAGUGA
UACCACAGAAAGUAUUUAUUUUUUCUGGAACAUUUAGAAAAAACUUGGAUCCCUAUGAACAGUGGAGUG
AUCAAGAAAUAUGGAAAGUUGCAGAUGAGGUUGGGCUCAGAUCUGUGAUAGAACAGUUUCCUGGGAAGC
UUGACUUUGUCCUUGUGGAUGGGGGCUGUGUCCUAAGCCAUGGCCACAAGCAGUUGAUGUGCUUGGCUA
GAUCUGUUCUCAGUAAGGCGAAGAUCUUGCUGCUUGAUGAACCCAGUGCUCAUUUGGAUCCAGUAACAU
ACCAAAUAAUUAGAAGAACUCUAAAACAAGCAUUUGCUGAUUGCACAGUAAUUCUCUGUGAACACAGGA
UAGAAGCAAUGCUGGAAUGCCAACAAUUUUUGGUCAUAGAAGAGAACAAAGUGCGGCAGUACGAUUCCA
UCCAGAAACUGCUGAACGAGAGGAGCCUCUUCCGGCAAGCCAUCAGCCCCUCCGACAGGGUGAAGCUCU
UUCCCCACCGGAACUCAAGCAAGUGCAAGUCUAAGCCCCAGAUUGCUGCUCUGAAAGAGGAGACAGAAG
AAGAGGUGCAAGAUACAAGGCUUUAG
Happy debugging golfed code,
Scott

Re^2: (Golf) RNA Genetic Code Translator
by tadman (Prior) on Jul 06, 2001 at 22:06 UTC
    I'm not sure how you got those results, and the code you posted had some trouble running too. Apparently the __DATA__ wasn't being imported correctly.

    I changed that to a definition:
    $cftr="AUGCAGAGGUCGCCUCUGGAAA...";
    Everything ran fine after that, except that japhy just spins for a while and then outputs nothing. Otherwise, the results appear to be as expected.
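One possible explanation for the empty japhy output, not confirmed anywhere in this thread: in the script as posted, the `@g{AAA..UUU}=(...)` assignment is an ordinary run-time statement that sits below the `print` calls, so `f1` may be called while `%g` is still empty and every codon is replaced by nothing. The sketch below reproduces only that ordering effect, using hypothetical names (`%table`, `lookup`):

```perl
use strict;
use warnings;

our %table;                       # package variable, like the golfed %g

# Called before the assignment below has executed: %table is still empty.
my $early = lookup("AUG");

# Subs are compiled up front, but this assignment only happens
# when execution reaches this line.
%table = (AUG => 'M');

my $late = lookup("AUG");

sub lookup {
    my ($codon) = @_;
    return exists $table{$codon} ? $table{$codon} : '';
}

print "early: '$early'\n";        # early: ''
print "late:  '$late'\n";         # late:  'M'
```

If this reading is right, moving the table assignment above the first call (or wrapping it in a `BEGIN` block) would be one way to test it.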

    Update: With respect to scain's update, this update basically says that I didn't actually read his update, and so, this entire node is kind of pointless.
