Beefy Boxes and Bandwidth Generously Provided by pair Networks
Welcome to the Monastery

Perl uses for Cryptograms - Part 1: One-liners and Word Patterns

by goibhniu (Hermit)
on Sep 04, 2007 at 03:16 UTC ( #636818=CUFP: print w/replies, xml ) Need Help??

Simple Substitution Ciphers

Following is a simple substitution cipher found in the September-October, 2006 issue of the Cryptogram:


As mentioned in Perl uses for Cryptograms - Part 0: Introduction (low Perl content), the American Cryptogram Association calls this an "Aristocrat", and this is cryptogram A-1 from that issue. How are we to solve this cipher?

My first approach is to look for patterns in the words that suggest plaintext. "HAXX" suggests "tell" to me. "UTAK" and "UTYU" suggest "then" and "that". I usually paste the cryptogram into BION's (an ACA nom) Solve a Cipher page and start trying out the possibilities one letter at a time. If you'd like a Perl solution, fellow monk graff's Perl/Tk GUI serves a similar purpose.

Perl is better at this than I am

What really helps, though is Perl's regular expressions. If I'm struck by temporary mild aphasia and am at a loss for what word might work for a pattern, I can use Perl to find one for me. The ACA has on it's site a set of word lists. I've downloaded and used regularly Donald Knuth's list.

The word list is smiply a file with words separated by line-feeds. If I'm looking for other words that could match "HAXX", I would construct a simple on-the-fly regex to filter my word list by and type it into a Perl one-liner like this (on Windows):

perl -ne "print if m/^..(.)\1$/;" Words.knu

The parenthesis tell the regexp to remember this letter, and the '\1' tells it to put that remembered letter back in; thus capturing the double 'X' in "HAXX". This produces a disappointingly large list of 185 words. Dissappointing results like this and the fact that I've done this so often have led me to stick my one-liner in a batch file:

@ECHO OFF perl -ne "print if m/^%1$/;" Words.knu

Some of the words returned above don't actually match what I'm looking for (of course they match what I told Perl to look for). For instance,


Punctuation would be preserved in your basic Aristocrat, so "I'll" is out. The ACA marks proper nouns with asterisks ('*'). "HAXX" is not marked with an asterisk above, so "Barr" and "Dodd" are out. Also, 'H' and 'X' cannot stand for the same letter in a simple substitution cipher, so "bibb and "lull" are out

Better Patterns

What's needed is better regexp patterns. Getting rid of the proper nouns and punctuation is a simple matter of replacing '.', which means "any character" with a character class. I filter out uppercase letters and punctuation by using the character class '[a-z]' like this (still using a one liner here, but I also use my batch file):

perl -ne "print if m/^[a-z][a-z]([a-z])\1$/;" Words.knu

This filters out most of the problem words, but what about "bibb", "lull" and the like? The single most commonly used (by me) regexp trick is the "Zero-width negative lookahead assertion". Using this, I can tell the rexep pattern what a character is NOT. Here's the example of "HAXX":

perl -ne "print if m/^([a-z])(?!\1)([a-z])(?!\1)(?!\2)([a-z])\1$/;" Wo +rds.knu

The first thing I've done is put parenthesis around every character so I can refer back to them. Also, in between all the '([a-z])'s, there are the parts that have question marks in them. Those are the negative lookahead assertions.

says "give me a character between 'a' and 'z' that is not followed by the thing in the first set of parentheses". So if the first character is 'b' as in "bibb", then the second character can't be 'b'.
says "give me a character between 'a' and 'z' that is not followed by the thing in the first set of parentheses or the thing in the second set of parentheses", so the third letter can't be the second letter OR the first letter. If I've found a word starting with "bi", then the third letter can't be 'b' and also can't be 'i'.

Better word selection

This still only whittles the list down to 97 entries. At this point I usually try to find better CipherText (CT) words to search on. At first I thought that non-pattern words (words with no repeating characters) would be the solution for fewer results. Here are some various length non-pattern words and the number of hits I get from words.knu, along with the corresponding regexp*:
Nbr Chars Nbr words Regexp*
2 73
([a-z]) (?!\1)([a-z])
3 661
([a-z]) (?!\1)([a-z]) (?!\1)(?!\2)([a-z])
4 2194
([a-z]) (?!\1)([a-z]) (?!\1)(?!\2)([a-z]) (?!\1)(?!\2)(?!\3)([a-z])
5 3729
([a-z]) (?!\1)([a-z]) (?!\1)(?!\2)([a-z]) (?!\1)(?!\2)(?!\3)([a-z]) (?!\1)(?!\2)(?!\3)(?!\4)([a-z])
6 4353
([a-z]) (?!\1)([a-z]) (?!\1)(?!\2)([a-z]) (?!\1)(?!\2)(?!\3)([a-z]) (?!\1)(?!\2)(?!\3)(?!\4)([a-z]) (?!\1)(?!\2)(?!\3)(?!\4)(?!\5)([a-z])
7 4209
([a-z]) (?!\1)([a-z]) (?!\1)(?!\2)([a-z]) (?!\1)(?!\2)(?!\3)([a-z]) (?!\1)(?!\2)(?!\3)(?!\4)([a-z]) (?!\1)(?!\2)(?!\3)(?!\4)(?!\5)([a-z]) (?!\1)(?!\2)(?!\3)(?!\4)(?!\5)(?!\6)([a-z])
8 2932
([a-z]) (?!\1)([a-z]) (?!\1)(?!\2)([a-z]) (?!\1)(?!\2)(?!\3)([a-z]) (?!\1)(?!\2)(?!\3)(?!\4)([a-z]) (?!\1)(?!\2)(?!\3)(?!\4)(?!\5)([a-z]) (?!\1)(?!\2)(?!\3)(?!\4)(?!\5)(?!\6)([a-z]) (?!\1)(?!\2)(?!\3)(?!\4)(?!\5)(?!\6)(?!\7)([a-z])
9 1616
([a-z]) (?!\1)([a-z]) (?!\1)(?!\2)([a-z]) (?!\1)(?!\2)(?!\3)([a-z]) (?!\1)(?!\2)(?!\3)(?!\4)([a-z]) (?!\1)(?!\2)(?!\3)(?!\4)(?!\5)([a-z]) (?!\1)(?!\2)(?!\3)(?!\4)(?!\5)(?!\6)([a-z]) (?!\1)(?!\2)(?!\3)(?!\4)(?!\5)(?!\6)(?!\7)([a-z]) (?!\1)(?!\2)(?!\3)(?!\4)(?!\5)(?!\6)(?!\7)(?!\8)([a-z])
* I've broken up the regexp for readability and to emphasize the progression for successively longer non-pattern words. They should be in one line if used in the, um, one-liners or batch file above. If used in a large script, they could remain multi-line for readability with the /x suffix.

As you can see, the number of results isn't quite the restricted list you might hope for. I've learned that words with patterns return much shorter lists. The most promising seems to me to be "AJBADOAPLA". Reverting to the simplest pattern,

perl -ne "print if m/^(.)..\1..\1..\1$/;" Words.knu

results in:

This list is short enough that I can just use my eyeballs see that "effervesce" and "excellence" would be filtered out if I used zero-width negative lookahead assertions.

Working it out

Sticking "expe-ience" into either BION's or graff's tool yields (along with some more intelligent guesses):

Going back to BION's or graff's tools,

ultimately producing:

I hope you have fun with this. If you want more Aristocrats to try, I'd first recommend BION's Solve a Cipher page (select New Puzzle, then click Go!), then look into the ACA. I'd love to hear what kind of word patterns and one-liners you come up with to help solve simple ciphers.

update: 04-Sep-2007 slight readability changes
update: 06-Sep-2007 broke up the regexp in the table for readability (thx, bobf)

I humbly seek wisdom.

Replies are listed 'Best First'.
Re: Perl uses for Cryptograms - Part 1: One-liners and Word Patterns
by BrowserUk (Pope) on Sep 04, 2007 at 03:49 UTC

    I thought those regex look vaguely familiar? See Re: Regex help for code that greatly simplifies their generation.

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Wonderful! If you can be patient I'll present my approach to generating the regexp in a later installment.

      I humbly seek wisdom.
Re: Perl uses for Cryptograms - Part 1: One-liners and Word Patterns
by duggles (Acolyte) on Dec 06, 2007 at 23:12 UTC

    I discovered this posting from a google search and found it quite interesting. I too have been into solving cryptograms for quite a while. I'm just beginning to try to write a perl script that will solve one automatically. I've found lots of examples in other languages, but none in perl so I'm just trying to absorb as much as I can from a perl perspective and see if I can do it.

    After reading and studying your code snippets, I thought you might be interested in my approach to getting crypto help. I've been saving every cryptogram I've ever solved in a text file. I then run a very big, clumsy and poorly written perl script to create a number of text files with all the words contained in the quote file. Among the files is a patterns file. I use a format like "1=22=1=" for a word like "suppose" and "=======" (that should be 7 separate consecutive "=" marks) for any 7 letter word with no pattern. I also add the number of occurrences of each word to the pattern file so that after the file is sorted by word length, then by pattern, and finally in descending order by frequency, I can just do a simple grep for a pattern, and I get a list ordered by frequency of occurence, in normal speech (or at least as normal as you can get from a bunch of quotations... ;-). That way I can make my guesses in a fairly productive manner.

    At any rate, I appreciate what I've learned from your regex statements (regex is not my strongpoint) and thought you might be interested to know how I approach the pattern problem. After poking around on the site for a couple of days, I decided I'd join the monks.

    Thanks again!

    Life is short, but it's wide -- Chuck Pyle

      Welcome to Perlmonks.

      I like your approach very much. I've got a program I intend to post to this series (if I can clean it up and get some tuits) that approaches automating the solution. Honestly it scares me because I'm afraid I'll loose that intuitive touch and become too dependent on the computer. I really like that your approach uses the computer to improve your intuition in solving.

      I humbly seek wisdom.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: CUFP [id://636818]
Approved by bingos
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others scrutinizing the Monastery: (4)
As of 2021-06-13 09:23 GMT
Find Nodes?
    Voting Booth?
    What does the "s" stand for in "perls"? (Whence perls)

    Results (54 votes). Check out past polls.