in reply to Re: Breaking the indecipherable cipher, courtesy Charles Babbage.
in thread Breaking the indecipherable cipher, courtesy Charles Babbage.

You're absolutely correct - the cipher text must be long enough and in a recognizable langauge, because my code looks for repeated substrings at some interval, and compares the letter fequency distribution of every Nth letter to the standard letter frequency distribution for a given langauge (English in my case ;). In fact it looks for multiple repeating substrings to better try to zero in on the likely key length. Once repeated substrings are found, it computes the greatest common factor of the intervals for a given repeat, and then computes the greatest common factor for those greatest comman factors, and uses that as the key length.

That doesn't always work :)

I've had mixed results - if I rename to just "chargrill" and encode the same text, the above code can't find any common factors, and then fails. It would actually take a fair amount of refactoring to account for this.

Honestly, I think my algorithm for figuring out what key letter would make an encrypted cipher string of every Nth character matches the standard letter frequency distribution is the most ... interesting. I couldn't think of a better way to do it than with the human eye, and I think my algorithm gets as close to a visual inspection of peaks and valleys of letter frequency charts as I could think of...

$,=42;for(34,0,-3,9,-11,11,-17,7,-5){$*.=pack'c'=>$,+=$_}for(reverse s +plit//=>$* ){$%++?$ %%2?push@C,$_,$":push@c,$_,$":(push@C,$_,$")&&push@c,$"}$C[$# +C]=$/;($#C >$#c)?($ c=\@C)&&($ C=\@c):($ c=\@c)&&($C=\@C);$%=$|;for(@$c){print$_^ +$$C[$%++]}
  • Comment on Re^2: Breaking the indecipherable cipher, courtesy Charles Babbage.
  • Download Code

Replies are listed 'Best First'.
Re^3: Breaking the indecipherable cipher, courtesy Charles Babbage.
by turo (Friar) on May 19, 2006 at 19:26 UTC

    your algorithm works well your only fail is to feed your vigener cipher machine with simbols others than alphabetics ones.

    Problems of keeping spaces on your ciphertext are that there are some words that are very common on your language (examples: the, and, of, I, you ...), and the same goes for other simbols like the ``''' and the ``?''. So in old cipher methods is important to remove this simbols (except on transposition ciphers like the scitale ^_^>)

    i think your algorithm failed because of those spureous simbols

    prove with this text instead:

    TheVigenerecipherknownbysomeaslechiffreindechiffrableFrenchfor theindecipherablecipherisamethodofencryptionthatusesaseriesof differentCaesarciphersbasedonthelettersofakeywordthoughthereis someargumentamongcryptographiccirclesastowhichincarnationofthis particularpolyalphabeticsubstitutioncanaccuratelybeattributedtoBlaise deVigenereThisimplementationisasimpleformofapolyalphabetic substitutionToencipheratableofalphabetscanbeusedtermedatabula rectaorVigeneretableItconsistsofthealphabetwrittenout26times indifferentrowseachalphabetshiftedcycliclytotheleftcomparedtothe previousalphabetIncidentallytheresagoodchancethatthisplaintext islongenoughtodisplaythecryptographicalweaknessofthisparticular algorithmCanyouspotit

    perl -Te 'print map { chr((ord)-((10,20,2,7)[$i++])) } split //,"turo"'

      Yeah - perfect example: The following text, taken from Simon Singh's Code Challenge:

      ... fails miserably, but if I save it as "trajan" and run the following:

      $ cat trajan | perl -pe 's/\W|\s//gm' | perl

      It works like a champ. But this treatment can also foil my "best guess" as to the key length:

      $ tail -12 | perl -pe 's/\W|\s//gm' \ | perl - | perl

      ... the "key" found is "vrgpn", and the plaintext is then shown as "tyekipelepetiehnrinmwebnsxmcaqlvcwio..."

      All of which makes me think I need to do some refactoring to look out for these kinds of problems, or at the very least add some interactivity along the lines of the posted solution to the code challenge mentioned above.

      $,=42;for(34,0,-3,9,-11,11,-17,7,-5){$*.=pack'c'=>$,+=$_}for(reverse s +plit//=>$* ){$%++?$ %%2?push@C,$_,$":push@c,$_,$":(push@C,$_,$")&&push@c,$"}$C[$# +C]=$/;($#C >$#c)?($ c=\@C)&&($ C=\@c):($ c=\@c)&&($C=\@C);$%=$|;for(@$c){print$_^ +$$C[$%++]}

        automation is right ;) ...

        i did the same as you did, but i've obtained the correct result ...

        turo@somewhere:~# tail -12 | perl -pe 's/\W|\s//gm' | pe +rl - | perl Key: vigenerepl Plaintext: thevigenerecipherknownbysomeaslechiffreindechiffrable ...

        The more bizarre thing, is that if i rename the file to chargrill and do the same, the result is wrong

        turo@somewhere:~# tail -12 chargrill | perl -pe 's/\W|\s//gm' | perl + chargrill - | perl Key: a Plaintext: voemoxmyptlczvymcvpvwehpazxghscktptqhyeztumnskmfigstpqtl ...

        Maybe the solution is to choose a keylengh (and key), get the result for the bests fits, and pass some test to see if the result is right or not (based on letter frequencies) ... or something like that


        PS: only one more question, which book did you bought? (I've got the David Khan's Codebreakers; but is too big) ... Is the Sinmon Singh's one? ... woff, woff

        perl -Te 'print map { chr((ord)-((10,20,2,7)[$i++])) } split //,"turo"'