Perl uses for Cryptograms - Part 1: One-liners and Word Patterns
by goibhniu (Hermit) on Sep 04, 2007 at 03:16 UTC
Simple Substitution Ciphers
Following is a simple substitution cipher found in the September-October, 2006 issue of the Cryptogram:
As mentioned in Perl uses for Cryptograms - Part 0: Introduction (low Perl content), the American Cryptogram Association calls this an "Aristocrat", and this is cryptogram A-1 from that issue. How are we to solve this cipher?
My first approach is to look for patterns in the words that suggest plaintext. "HAXX" suggests "tell" to me. "UTAK" and "UTYU" suggest "then" and "that". I usually paste the cryptogram into BION's (an ACA nom) Solve a Cipher page and start trying out the possibilities one letter at a time. If you'd like a Perl solution, fellow monk graff's Perl/Tk GUI serves a similar purpose.
Perl is better at this than I am
What really helps, though, is Perl's regular expressions. If I'm struck by temporary mild aphasia and am at a loss for what word might work for a pattern, I can use Perl to find one for me. The ACA has a set of word lists on its site. I downloaded Donald Knuth's list and use it regularly.
The word list is simply a file with words separated by line feeds. If I'm looking for other words that could match "HAXX", I construct a simple on-the-fly regex to filter my word list and type it into a Perl one-liner like this (on Windows):
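The one-liner itself is not reproduced in this copy of the post. As a rough sketch of the idea, assuming the word list is saved as words.knu (the file name used later in this post), it could look like perl -ne "print if /^..(.)\1$/" words.knu, with the double quotes being the Windows-friendly choice. The same pattern as a small script, with a made-up sample list standing in for the real file:

```perl
use strict;
use warnings;

# Hypothetical sample standing in for the real word list (one word per line).
my @words = qw(tell that then bibb lull Barr I'll pass);

# Two arbitrary characters, then a captured character that is repeated once:
# the shape of "HAXX".
my @hits = grep { /^..(.)\1$/ } @words;

print "$_\n" for @hits;
```

On this sample, only "that" and "then" are dropped; everything else, including "Barr", "I'll", "bibb" and "lull", still gets through.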
The parentheses tell the regexp to remember a letter, and the '\1' tells it to match that remembered letter again, thus capturing the double 'X' in "HAXX". This produces a disappointingly large list of 185 words. Disappointing results like this, and the fact that I've done this so often, have led me to stick my one-liner in a batch file:
Some of the words returned above don't actually match what I'm looking for (of course they match what I told Perl to look for). For instance,
Punctuation would be preserved in your basic Aristocrat, so "I'll" is out. The ACA marks proper nouns with asterisks ('*'). "HAXX" is not marked with an asterisk above, so "Barr" and "Dodd" are out. Also, 'H' and 'X' cannot stand for the same letter in a simple substitution cipher, so "bibb" and "lull" are out.
What's needed are better regexp patterns. Getting rid of the proper nouns and punctuation is a simple matter of replacing '.', which means "any character", with a character class. I filter out uppercase letters and punctuation by using the character class '[a-z]' like this (still using a one-liner here, but I also use my batch file):
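The original one-liner is missing here as well. A minimal sketch of the character-class version, again run against a made-up sample list instead of words.knu:

```perl
use strict;
use warnings;

# Hypothetical sample list; the real input would be the downloaded word list.
my @words = qw(tell bibb lull Barr I'll pass);

# [a-z] in place of '.' rejects uppercase letters and punctuation,
# so proper nouns and contractions no longer match.
my @hits = grep { /^[a-z][a-z]([a-z])\1$/ } @words;

print "$_\n" for @hits;
```

"Barr" and "I'll" are gone from the result, while "bibb" and "lull" remain.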
This filters out most of the problem words, but what about "bibb", "lull" and the like? The single most commonly used (by me) regexp trick is the "zero-width negative lookahead assertion". Using this, I can tell the regexp pattern what a character is NOT. Here's the example of "HAXX":
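The pattern from the original post is not preserved in this copy, so what follows is a reconstruction under assumptions: one capture group per distinct cipher letter, with a negative lookahead in front of each new letter listing the groups it must not repeat. The /x layout and the sample words are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical sample list.
my @words = qw(tell bibb lull pass noon);

my $haxx = qr/
    ^
    ([a-z])            # H
    (?!\1)([a-z])      # A: must differ from H
    (?!\1|\2)([a-z])   # X: must differ from H and A
    \3                 # second X
    $
/x;

my @hits = grep { $_ =~ $haxx } @words;

print "$_\n" for @hits;
```

Only "tell" and "pass" survive on this sample: "bibb" and "lull" fail because their third letter repeats the first, and "noon" fails because its third letter repeats the second.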
The first thing I've done is put parentheses around every character so I can refer back to them. Also, in between all the '([a-z])'s, there are the parts that have question marks in them. Those are the negative lookahead assertions.
Better word selection
This still only whittles the list down to 97 entries. At this point I usually try to find better CipherText (CT) words to search on. At first I thought that non-pattern words (words with no repeating characters) would be the solution for fewer results. Here are some non-pattern words of various lengths and the number of hits I get from words.knu, along with the corresponding regexp*:
As you can see, the number of results isn't quite the restricted list you might hope for. I've learned that words with patterns return much shorter lists. The most promising seems to me to be "AJBADOAPLA". Reverting to the simplest pattern,
This list is short enough that I can just use my eyeballs to see that "effervesce" and "excellence" would be filtered out if I used zero-width negative lookahead assertions.
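Writing the lookaheads by hand gets tedious for a ten-letter word, so here is a sketch of a helper that builds the strict pattern from the ciphertext word itself. The name strict_pattern is hypothetical and not from the original post, the loose pattern shown is only my guess at "the simplest pattern", and \g{N} backreferences assume Perl 5.10 or later.

```perl
use strict;
use warnings;

# Build a pattern for a ciphertext word: a repeated cipher letter becomes a
# backreference, and each new letter gets a lookahead that excludes every
# letter captured so far. (Hypothetical helper, for illustration only.)
sub strict_pattern {
    my ($ct) = @_;
    my %group;        # cipher letter => capture group number
    my $re = '';
    my $n  = 0;       # number of capture groups so far
    for my $c (split //, $ct) {
        if (exists $group{$c}) {
            $re .= '\g{' . $group{$c} . '}';
        }
        else {
            my $not = join '|', map { '\g{' . $_ . '}' } 1 .. $n;
            $re .= "(?!$not)" if $n;
            $re .= '([a-z])';
            $group{$c} = ++$n;
        }
    }
    return qr/^$re$/;
}

my @words  = qw(experience excellence effervesce);
my $loose  = qr/^(.)..\1..\1..\1$/;          # only the repeated 'A' is pinned
my $strict = strict_pattern('AJBADOAPLA');

my @loose_hits  = grep { $_ =~ $loose } @words;
my @strict_hits = grep { $_ =~ $strict } @words;

print "loose:  @loose_hits\n";
print "strict: @strict_hits\n";
```

The loose pattern accepts all three sample words; the strict one keeps only "experience", because "excellence" and "effervesce" each reuse a letter where the ciphertext has two different ones.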
Working it out
Sticking "expe-ience" into either BION's or graff's tool yields (along with some more intelligent guesses):
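The tools' output is not reproduced in this copy. As a rough sketch of the substitution step itself, here is how a partial key could be applied to ciphertext words in Perl. The key below is only what "expe-ience" implies for AJBADOAPLA, with 'D' left unsolved to mirror the gap in that guess; apply_key is a hypothetical helper, not part of either tool.

```perl
use strict;
use warnings;

# Partial key read off AJBADOAPLA = "expe-ience" (D deliberately omitted).
my %key = (A => 'e', J => 'x', B => 'p', O => 'i', P => 'n', L => 'c');

# Replace each cipher letter with its plaintext guess, or '-' if unknown.
sub apply_key {
    my ($ct, $map) = @_;
    (my $pt = $ct) =~ s/([A-Z])/exists $map->{$1} ? $map->{$1} : '-'/ge;
    return $pt;
}

print apply_key($_, \%key), "\n" for qw(AJBADOAPLA HAXX UTAK UTYU);
```

With this key, "HAXX" becomes "-e--" and "UTAK" becomes "--e-", which is at least consistent with the earlier hunches of "tell" and "then".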
Going back to BION's or graff's tools,
I hope you have fun with this. If you want more Aristocrats to try, I'd first recommend BION's Solve a Cipher page (select New Puzzle, then click Go!), then look into the ACA. I'd love to hear what kind of word patterns and one-liners you come up with to help solve simple ciphers.
update: 04-Sep-2007 slight readability changes
I humbly seek wisdom.