RFC: Is the Bible encoded in DNA?

by wstryder (Novice)
on May 14, 2018 at 11:31 UTC (#1214442=perlmeditation)

I have for a time entertained the idea that if God is the creator, he would have left his signature in the DNA of the human species. If I was the creator, I would have encoded the entire Hebrew Bible in DNA, so as to let no one doubt that DNA was created by God and that the Bible is the word of God.

I finally took up the challenge and wrote a perl script to check if the first five verses of the Bible are encoded in DNA. Naturally there is an infinite number of ways to encode information in DNA, but I assumed that God would have used something quite obvious in order for us to be able to find information encoded in DNA. I'm assuming that if the Bible is encoded in DNA, the encoding used would be the same as for protein synthesis, namely that triplets of DNA base pairs would encode for one character. There are 64 possible codons so there is plenty of redundancy when they are used for encoding the 22 letters of the Hebrew alphabet (plus sofit forms for five characters).

Like so:

AAA -> Y    AAC -> XXX  AAG -> B    AAT -> XXX
ACA -> A    ACC -> M    ACG -> XXX  ACT -> XXX
AGA -> R    AGC -> R    AGG -> W    AGT -> W
ATA -> H    ATC -> XXX  ATG -> A    ATT -> H
CAA -> XXX  CAC -> XXX  CAG -> I    CAT -> XXX
CCA -> XXX  CCC -> XXX  CCG -> XXX  CCT -> V
CGA -> XXX  CGC -> XXX  CGG -> O    CGT -> XXX
CTA -> XXX  CTC -> E    CTG -> V    CTT -> H
GAA -> H    GAC -> I    GAG -> XXX  GAT -> T
GCA -> XXX  GCC -> B    GCG -> XXX  GCT -> H
GGA -> V    GGC -> H    GGG -> Y    GGT -> A
GTA -> Y    GTC -> V    GTG -> A    GTT -> I
TAA -> E    TAC -> XXX  TAG -> XXX  TAT -> O
TCA -> A    TCC -> Y    TCG -> XXX  TCT -> XXX
TGA -> R    TGC -> H    TGG -> A    TGT -> XXX
TTA -> XXX  TTC -> L    TTG -> B    TTT -> A

My dirty little perl script reads a FASTA file one character at a time and when a triplet is read, it checks to see if that codon is already defined. If it is not, the first character of the target sequence is added to a hash containing all the codons. The algorithm then moves to the next triplet in DNA, checks to see if that triplet is defined, and so on. When a triplet is already defined and the character stored does not equal the corresponding character of the target sequence, the script records the maximum length of the sequence found, goes back to the beginning of DNA and moves forward one base pair to continue the search.
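The search described above could be sketched roughly as follows. This is not the original script, only a minimal illustration that assumes the codon table is rebuilt from scratch for every start offset and that the DNA is already in memory as one string; the names `match_length` and `longest_match` are made up for this example.

```perl
use strict;
use warnings;

# Length of the longest prefix of $target that can be spelled starting at
# base offset $start, assigning codons to letters greedily as they appear.
sub match_length {
    my ($dna, $target, $start) = @_;
    my %codon_to_char;    # codon table, rebuilt for every start offset (assumption)
    my $matched = 0;
    while ($matched < length $target) {
        my $pos = $start + 3 * $matched;
        last if $pos + 3 > length $dna;
        my $codon = substr $dna, $pos, 3;
        my $char  = substr $target, $matched, 1;
        if (!exists $codon_to_char{$codon}) {
            $codon_to_char{$codon} = $char;    # first sighting: define the codon
        }
        elsif ($codon_to_char{$codon} ne $char) {
            last;                              # conflict: the run ends here
        }
        $matched++;
    }
    return $matched;
}

# Slide the start offset one base at a time and keep the best run.
sub longest_match {
    my ($dna, $target) = @_;
    my ($best_len, $best_start) = (0, 0);
    for my $start (0 .. length($dna) - 3) {
        my $len = match_length($dna, $target, $start);
        ($best_len, $best_start) = ($len, $start) if $len > $best_len;
    }
    return ($best_len, $best_start);
}

my ($len, $start) = longest_match('ACGTACGTTACGACGTAC', 'ABAB');
print "longest run: $len characters at offset $start\n";
```

Several codons may end up standing for the same letter in this sketch, which matches the redundancy visible in the table above.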

I'm not a computer science expert and I'm sure that my script is dirty and messy, but it does work. It takes 33h to search one target sequence against the 3 billion base pairs of human DNA. The FASTA files are in chunks of roughly 150 million base pairs, so several files need to be checked by hand, but this is not much of a problem. My computer crashes when I try to load more than 10 million base pairs at a time, so the script reads each FASTA file in chunks of 5 million base pairs at a time.

I could not get Hebrew characters to work properly, so I simply transliterated the first five verses of Genesis to ASCII characters. This is a dirty way of going about it, but it works.


For control sequences I used Lorem ipsum, War and Peace and a random string. For the control sequences I checked the first one million base pairs only.

The results so far:

Lorem ipsum: 42 characters found (250 million searched)
War and Peace: 35 characters found (one million searched)
Random string: 35 characters found (one million searched)

Having checked the Hebrew Bible against 500 million base pairs so far, the maximum sequence found was 45 characters. This is more than the control sequences, but only because many more base pairs were compared. To be sure that the sequence was encoded in DNA by God, I would expect to find a sequence of hundreds of characters, preferably all the first five verses of Genesis. I'm not a mathematician, so I have not calculated what the maximum sequence length would be if left to chance alone. But the control sequences do give some estimate.

I'm of course assuming that God used the Hebrew Bible, because some say Hebrew is the holy language, but I've also checked the King James English for the first verses of Matthew and John. If God is omnipotent, surely he could have encoded the Bible in DNA in any language. In the future I'll check if New Testament passages are encoded in Greek, but thus far I'm working with the assumption that the most awesome thing for God to do would have been to encode the beginning of Genesis. Will post results when I find anything.

Let me know what you think of my efforts, I know this is nuts.

My code can be downloaded at

Replies are listed 'Best First'.
Re: RFC: Is the Bible encoded in DNA?
by zentara (Archbishop) on May 14, 2018 at 12:00 UTC
      Speaking of interplanetary travel it may be worth mentioning that the quran predicted the moon landing.

      So you may find god's signature in the DNA or the bible, advanced math in the vedas, but tomorrow's lottery numbers may be hidden in the quran.

      In terms of practical value it should be a no-brainer which holy book to study.

        it should be a no-brainer which holy book to study.

        If your book suggests that seeking money thru the lottery is good, I have to inform you that your beliefs are flawed. Money is the blood of Babylon. One should not ask God for any material gains.... it's bad form.

        I'm not really a human, but I play one on earth. ..... an animated JAPH

        That way you can twist anything into prophecy of anything. I bet you would have no problem twisting a quote from a similar book, the Mein Kampf, to prophesy moon landing or the fall of the Twins or anything else.

        In terms of practical value neither Quran nor Mein Kampf ought to be found anywhere but in museums and selected libraries. For exactly the same reasons.

        Enoch was right!
        Enjoy the last years of Rome.

        Hi LanX. I have that movie on cd .... I also bought a new drill!! :-)

        I'm not really a human, but I play one on earth. ..... an animated JAPH
Re: RFC: Is the Bible encoded in DNA?
by morgon (Curate) on May 14, 2018 at 12:33 UTC
    Your enterprise makes total sense to me, apart from the fact that you assume that he used the beginning of the bible as his signature.

    This I believe is wrong.

    The signature to look out for is "yaph".

    Also it would not be unreasonable to assume that beside his signature god has left some code-examples in the divine programming language he used to create the heaven and the earth.

    To locate these search for occurrences of "use strict".

      ... search for occurrences of "use strict".

      Don't forget the semicolon at the end:  use strict;   The divine code won't compile otherwise.

      Give a man a fish:  <%-{-{-{-<

        The divine code does compile, it is compiling billions of times every second in every one of your cells. Just a thought.
      beside his signature god has left some code-examples in the divine programming language he used to create the heaven and the earth
      Obligatory XKCD reference.
        DNA is the divine programming language, at least of biological systems.
      Yaph is too short: it is not complex enough and it is not specified. The first five verses of Genesis, on the other hand, are long enough to be complex, and they are specified. That is what we are looking for in a signature: complex specified information. Yaph does not cut it.
Re: RFC: Is the Bible encoded in DNA?
by uhClem (Scribe) on May 14, 2018 at 12:15 UTC
    That's absurd. Why would you put graffiti all over your (putatively) best work? Now if you can find the human genome encoded in the Bible, *then* you'll be onto something.
      The encoding works both ways. If the first five verses of Genesis are encoded in DNA that is the same thing as if DNA is encoded in the Bible.
        *Seemingly.* Depends on Who's got the private key.
Re: RFC: Is the Bible encoded in DNA?
by bliako (Friar) on May 14, 2018 at 15:27 UTC

    I will not deride your attempt here, although personally and privately I do. But I will make a suggestion which may at least lighten the burden on your electricity bill and the Environment.

    Do not use sort keys %alephbets in a loop when you can create a static array of sorted keys at startup.
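A small sketch of that suggestion, with made-up table contents. `%alephbets` is the name quoted above; `@sorted_codons` and `table_as_string` are hypothetical. Note that this only helps as long as the set of keys stays fixed; if codons are added during the search, the sorted list would have to be refreshed.

```perl
use strict;
use warnings;

# %alephbets is the hash name mentioned above; these three entries are made up.
my %alephbets = (GGT => 'A', AAG => 'B', CTC => 'E');

# Sort once at startup instead of calling "sort keys %alephbets" on every pass.
my @sorted_codons = sort keys %alephbets;

# Build the printable table from the pre-sorted key list.
sub table_as_string {
    return join ' ', map { "$_ -> $alephbets{$_}" } @sorted_codons;
}

print table_as_string(), "\n";
```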

    I have a hunch that the Bible is encoded in the sequence of PI ...

      You have a great hunch. Why not give it a try and write a perl script to test that hypothesis and post results here?
      It's not a big deal since that is only run when a new maximum sequence is found, which is not very often.
Re: RFC: Is the Bible encoded in DNA?
by LanX (Bishop) on May 14, 2018 at 15:57 UTC
    The question is not if, but in which language... ;p

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Wikisyntax for the Monastery

    PS: with this approach you'll also find the communist manifesto + Commodore Basic

      If this comment is more prescient than silly and becomes the *basis*, nay, the genesis! for such a language in coming years, I shall be rather cross with thee.

      Heheheheheh. …Cross.

      (Update: grammar error corrected thanks to jdporter.)

      Well, you can use my script to see if you can find the communist manifesto in DNA. Good luck with that. Since you say with such confidence that with this approach you can find it, why not give it a try? On my computer it takes 42h to search through the entire human genome, so it doesn't even take that long. Give it a try.

        The gag was languages can be semi-arbitrary or approach total ambiguity and context sensitivity or even be written specifically to suit a use case. Since the Bible matches that pretty well—semi-arbitrary, ambiguous, context sensitive to one of the more backwards versions of the Iron Age, rewritten and pieced together from many sources, the prologue written in a resurrected language that slept a bit longer than 3 days, to suit a particular use—it is most certainly possible to adapt or create a solution to fit the premise given the 3 billion source points and their own manifold interactions. Plenty of gold to pan in them thar hills.

        Since humans as a package are as much non-human as human, with upwards of 100 trillion riders with their own DNA, to say nothing of mitochondria, the data set is so incomplete as to be statistically insignificant. Though, I'll help poke a hole in my rebuttal right now. We can presume Adam and Eve only carried human DNA and the Fruit was what gave us the rest. Of course the great apes carrying almost only human DNA might be a sticking point; or just a dialect or speech impediment. Their genome seems much more likely to match the New American Standard after all. :P

Re: RFC: Is the Bible encoded in DNA?
by karlgoethebier (Monsignor) on May 14, 2018 at 19:11 UTC
    "...encoded the entire Hebrew Bible in DNA..."

    And all opcodes are hidden in the progression of Giant Steps. My analysis will be published posthumously.

    «The Crux of the Biscuit is the Apostrophe»

    perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'

Re: RFC: Is the Bible encoded in DNA?
by morgon (Curate) on May 14, 2018 at 17:29 UTC
    I think if wstryder really succeeds in finding god's signature we will all be happy he did it with Perl and not some devilish snake-language or whatever other abomination exist in the programming world.

    Let's face it: Perl needs all the publicity it can get.

    I wish you luck.

      What I really need is a mathematician to help me out a bit in advance, before I get any results. My computer is now crunching at 500 million base pairs. If generated at random, there is a chance of one in 26^100 of getting a certain English sentence 100 characters in length. That chance is astronomically small; there are only 10^80 or so atoms in the entire universe. But with the way I'm searching the DNA, what is the probability of finding a sequence 100 characters in length? The math is beyond my abilities. So far I've got 42 characters for Lorem Ipsum and 45 for the Hebrew Bible. So the results so far do not in any way suggest a divine author for DNA. But what result would? What would be needed to convince people? 100 characters, 5000 characters or more?
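One way to get a feel for the chance baseline without doing the math is to run the same kind of search on randomly generated DNA. The sketch below is only an illustration under loud assumptions: bases are drawn uniformly at random (real DNA is not like that), the codon table is reset at every start offset, and the names `run_length` and `longest_run_in_random_dna`, the seed and the target string are all made up.

```perl
use strict;
use warnings;

# Greedy run length at one start offset: a codon is bound to the first letter
# it has to stand for, and the run ends when it would need a different one.
sub run_length {
    my ($dna, $target, $start) = @_;
    my (%table, $n);
    for ($n = 0; $n < length $target; $n++) {
        my $pos = $start + 3 * $n;
        last if $pos + 3 > length $dna;
        my $codon = substr $dna, $pos, 3;
        my $char  = substr $target, $n, 1;
        $table{$codon} //= $char;
        last if $table{$codon} ne $char;
    }
    return $n;
}

# Longest run of $target found in $bases bases of uniformly random DNA.
sub longest_run_in_random_dna {
    my ($target, $bases) = @_;
    my @alphabet = qw(A C G T);
    my $dna = join '', map { $alphabet[ rand 4 ] } 1 .. $bases;
    my $best = 0;
    for my $start (0 .. $bases - 3) {
        my $len = run_length($dna, $target, $start);
        $best = $len if $len > $best;
    }
    return $best;
}

srand 42;    # fixed seed so a rerun on the same perl gives the same numbers
my $target = 'LOREMIPSUMDOLORSITAMETCONSECTETURADIPISCINGELIT' x 2;
for my $bases (1_000, 10_000, 100_000) {
    printf "%7d random bases: longest run %d\n",
        $bases, longest_run_in_random_dna($target, $bases);
}
```

Repeating this with several seeds and larger inputs would show how the longest chance run changes with the number of bases searched, which is the comparison the control sequences are meant to provide.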

      To everyone saying that with this method you can find the entire works of Shakespeare or the communist manifesto embedded in DNA I say - give it a try. Use my script or better yet, write your own. It's not possible, not with the encoding I'm using.

        Nobody has spotted the obvious, that the code for reading the file is painfully slow.

        A significant improvement would be:

        while (read($fh, my $char, 1) && $eof) {
            if ($char =~ m/[ACGT]/ ) {
                $i++;
                if ($i >= $start_genome && $i < $end_genome) {
                    push (@genome, $char);
                }
                elsif ($i > $end_genome) {
                    $eof = 0;
                    # do the searching here, instead of outside the loop!
                }
            }
        }
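Going one step further than the loop above, the file could be read a line at a time and the bases kept in a single string rather than in an array with one element per base. This is a sketch, not a drop-in replacement: `read_bases` is a hypothetical helper, and unlike the loop above it treats both ends of the range as inclusive and counts bases from 1.

```perl
use strict;
use warnings;

# Return bases number $first .. $last (1-based, counting only A/C/G/T)
# from an open FASTA filehandle, as one string.
sub read_bases {
    my ($fh, $first, $last) = @_;
    my ($seen, $bases) = (0, '');
    while (my $line = <$fh>) {
        next if $line =~ /^>/;      # FASTA header line
        $line =~ tr/ACGT//cd;       # drop newlines, N and anything else
        my $n = length $line;
        next unless $n;
        my $lo = $seen + 1;         # number of the first base on this line
        my $hi = $seen + $n;        # number of the last base on this line
        $seen = $hi;
        next if $hi < $first;
        my $from = $first > $lo ? $first - $lo : 0;
        my $to   = $last  < $hi ? $last  - $lo : $n - 1;
        $bases .= substr $line, $from, $to - $from + 1;
        last if $hi >= $last;
    }
    return $bases;
}

# Demo on an in-memory "file"; a real run would open a FASTA file instead.
my $fasta = ">demo\nACGTN\nNNAC\nGT\n";
open my $fh, '<', \$fasta or die "Can't open in-memory file: $!";
print read_bases($fh, 3, 6), "\n";
close $fh;
```

A string costs about one byte per base, whereas an array of single characters costs far more per element in Perl, so this may also ease the memory problems mentioned in the original post.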
        So far I've got 42 characters for Lorem Ipsum and 45 for the hebrew Bible.
        Sorry, can't help you more right now. See Multiple comparisons problem. Any good statistics handbook should explain it in depth.
Re: RFC: Is the Bible encoded in DNA?
by bliako (Friar) on May 17, 2018 at 16:29 UTC

    After reading some of your answers to other comments, it seems to me that you are constantly changing the rules and giving what you try to do better odds to succeed. Alas there is no chance to succeed: even if there was a god, he/she/it would not be so lame as to "scratch their name" (or "do Graffiti on his best work" as some other fellow here put it) like Kilroy in the public toilets.

    At first you say 5 first verses from the Genesis is all you are looking to match but later you ask yourself how many characters is statistically significant.

    Later you said that you will also read the DNA backwards. And why not side-ways I ask? Or in leave-one-out fashion -- given how sloppy god is.

    You then say that you want to convert the bases to bits using a particular (intuitive) method. But a billion other methods exist. So soon you will try another one and another one. One of them will give you better results, 46 characters instead of the average 45. Is that god's encoding of choice?

    You may soon discover that there are endless possibilities (I believe 64P22 = 90310590525273233291833690659225600000) to encode 64 codons to 22 alphabet letters. That's a few orders of magnitude more than the million monkeys suggested by another person in here.
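The 64P22 figure can be checked with the core Math::BigInt module. The sketch below uses a made-up helper name, `permutations`; the second number printed is an extra comparison that is not in the post above, namely the count of mappings when every codon is freely given one of the 22 letters.

```perl
use strict;
use warnings;
use Math::BigInt;

# Number of ordered selections of $k distinct items out of $n: n! / (n-k)!
sub permutations {
    my ($n, $k) = @_;
    my $result = Math::BigInt->new(1);
    $result *= $_ for ($n - $k + 1) .. $n;
    return $result;
}

print "64P22 = ", permutations(64, 22), "\n";

# For comparison only: each of the 64 codons freely assigned one of 22 letters.
print "22^64 = ", Math::BigInt->new(22)->bpow(64), "\n";
```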

    For some of these encodings you are bound to get a bit further towards finding god in a place where he *definitely* is not.

    The first doctrine of the Scientific Method - which is the last thing in the mind of bigots (I mean those "lobotomised by religion", greek "θρησκόληπτος") - is to first (before results come out) set the rules of the experiment and define exactly what each possible outcome means for the conclusions.

      The first doctrine of the Scientific Method
      If you really believe that there is anything like "the" scientific method you should read some Kuhn (or Feyerabend if you want to be extreme).

      Let him have his fun. If he really succeeds in finding an encoding that makes the Torah appear in the human DNA I'd consider that to be a sensation; if he (most likely) fails, then he's learned some perl along the way.

      No harm in that.

         No harm in that.

        Except for when, one day, you read the press headlines: "God's word found in Human DNA". They will never bother to tell you that given all the free variables (encoding, number of verses, etc) and the continuous moving-the-goalposts, the odds suddenly became favourable for god (Inc.).

        I can happily co-exist with your citations. After all science is an anarchistic enterprise. !But! there must be some rules in conducting experiments and assessing their outcomes when we share knowledge, else we are all blind.


        no harm in that

Re: RFC: Is the Bible encoded in DNA?
by Anonymous Monk on May 16, 2018 at 21:09 UTC
    Perhaps you should analyze the DNA of "a million monkeys." Perhaps they will eventually produce the texts that you are looking for . . .
Re: RFC: Is the Bible encoded in DNA?
by wstryder (Novice) on May 15, 2018 at 10:15 UTC
    Who wants to take up the challenge and write a script to check if the Bible is encoded in the value of Pi? That would be interesting. I might give it a go next if I have time.

      Since PI is infinite (as far as we know), it follows that all knowledge is encoded in PI in all possible ways. It's just a matter of calculating PI to enough digits and creating a scheme to map it to a language ;)

        0.33333333... is also infinite. I'm pretty sure you wouldn't find my phone number in it though. Because pi is irrational it also has the property of being non repeating, but that's still not enough to contain whatever you are looking for. You could make an infinite, never repeating sequence without ever using the number 9 (by replacing all the 9s in the expansion of pi for example), in which case you wouldn't be able to find the year of birth of most monks.

        It's actually unknown if Pi has the property of containing all possible strings, and while the currently known expansion does make it look so (pi looks normal), no matter how many digits are known, it won't ever be even a fraction of the infinite sequence.

        Well, you can prove your point and write a perl script to see if say, the first five verses of the Bible are indeed encoded in the value of Pi. Why not give it a go? How many decimals would you have to search in order to find say 100 characters? Have you done the math?
Re: RFC: Is the Bible encoded in DNA?
by Anonymous Monk on May 16, 2018 at 01:27 UTC
    What if you found the first five verses of the Quran, instead?
      Well, in that case I for one would have to give up on Jehova and submit to Allah instead.
Re: RFC: Is the Bible encoded in DNA?
by wstryder (Novice) on May 16, 2018 at 18:47 UTC

    I have received some great tips from here and on other forums. Some have suggested I should also read the DNA backwards, which is what I'll certainly do. That's easy with perl:

    #!/usr/bin/perl
    #
    # Print a file backwards
    #
    open( FILE, "test.txt" ) or die( "Can't open file file_to_reverse: $!" );
    @lines = reverse <FILE>;
    foreach $line (@lines) {
        $line = reverse $line;
        print $line;
    }

    The claim that you can find any text in DNA, the Communist Manifesto or the entire works of Shakespeare is simply not true. When looking for a pattern, you simply can't find any text you want. You can prove it by writing a script that finds the entire works of Shakespeare in DNA. That should be easy to do in perl. You are not allowed to use a one time encryption pad, that is simply cheating. No algorithm can find the entire works of Shakespeare in DNA, no matter what encoding is used.

    Some have suggested I should convert the DNA to bits and look there. That is a great idea, and I already wrote a script that converts DNA to bits. It reads A and T as a one and C and G as a zero and then adds up 7 bits at a time and outputs ascii characters. The challenge is to then find information in those characters which I have not yet done. Here's my script so far, I'm now using strict and warnings, learning all the time.

    #!/usr/bin/perl
    #
    # Convert DNA to bits and see what comes out
    #
    use strict;
    use warnings;

    open my $fh, "<:encoding(UTF-8)", "Homo_sapiens.GRCh38.dna.chromosome.2.fa" or die "$!\n";
    # open my $fh, "<:encoding(UTF-8)", "test.txt" or die "$!\n";

    my $bit = "0";
    my $byte = "";
    my $i = 0;

    while (read($fh, my $char, 1)) {
        # ignore all other characters, especially those annoying NNNNNNNN, what are they anyway?
        if ($char =~ m/[ACGT]/ ) {
            # convert bases to bits, 6 possible ways to do this
            if ($char eq "A") { $bit = "0"; }
            if ($char eq "C") { $bit = "1"; }
            if ($char eq "G") { $bit = "0"; }
            if ($char eq "T") { $bit = "1"; }
            # add one bit at a time up to a byte
            if ($i < 7) {
                $i++;
                $byte .= $bit;
            }
            else {
                $i = 0;
                # convert the byte to a string
                my $chars = length($byte);
                my @packArray = pack("B$chars",$byte);
                my $print = "@packArray";
                # only print alphanumeric characters
                if ($print =~/[a-z]|[A-Z]/) { print "$print"; }
                $byte = "";
            }
        }
    }
    print "\n";
      The claim that you can find any text in DNA, the Communist Manifesto or the entire works of Shakespeare is simply not true.
      I believe you are right there. The downside is that I also believe that you won't find a large part of the bible either.

      And reading a file backwards is not quite as simple as simply calling reverse, because what you do in your example reads the whole file into memory and that will not work with very large files.

      And as you still seem to be in the brainstorming phase:

      You probably know that only a small part of the DNA actually encodes proteins, so you may consider restricting your search to these areas (or maybe restrict your search to the non-coding parts).

      Lots of parameters you can tweak, so you're bound to find something...

        OK thanks for letting me know that that script reads the whole file into memory. What would be a good solution to reverse each FASTA file, they are about 200MB each, without crashing my computer?
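One possible answer, sketched under assumptions: walk the file from the end in fixed-size blocks, reverse each block, and append it to an output file, so that only one block is in memory at a time. `reverse_file` and the 1 MiB default are made up for this example, and the FASTA header line and the line breaks get reversed along with everything else, so they may need stripping first.

```perl
use strict;
use warnings;

# Write the bytes of $in_path to $out_path in reverse order, one block at a time.
sub reverse_file {
    my ($in_path, $out_path, $block_size) = @_;
    $block_size ||= 1 << 20;    # 1 MiB per read unless told otherwise
    open my $in,  '<:raw', $in_path  or die "Can't open $in_path: $!";
    open my $out, '>:raw', $out_path or die "Can't open $out_path: $!";
    my $pos = (-s $in_path) || 0;    # start at the end of the file
    while ($pos > 0) {
        my $len = $pos < $block_size ? $pos : $block_size;
        $pos -= $len;
        seek $in, $pos, 0 or die "seek failed on $in_path: $!";
        my $got = read $in, my $block, $len;
        die "short read on $in_path" unless defined $got && $got == $len;
        print {$out} scalar reverse $block;
    }
    close $out or die "Can't close $out_path: $!";
    close $in;
    return;
}

# Example call (file names are placeholders):
# reverse_file('Homo_sapiens.GRCh38.dna.chromosome.2.fa', 'chromosome.2.reversed.fa');
```

If the goal is the opposite strand rather than a literal backwards read, applying `tr/ACGT/TGCA/` to each block before printing would give the reverse complement instead.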

        Only a small part of DNA codes for proteins, but why the assumption that God would have coded the Bible in those regions? He could have done that anywhere is my guess. But there is no way of knowing how the mind of God works.
