|Perl: the Markov chain saw|
Turning A Problem Upside Downby Limbic~Region (Chancellor)
|on Aug 20, 2009 at 23:10 UTC||Need Help??|
Recently, I have been writing algorithms to play games on FaceBook. This is the real game for me and I often explore hunches to see if they teach me anything. The most recent game is called Word Twister and you are presented with 7 randomly selected characters (dups allowed). The object is to come up with as many words comprised of 3 or more of those characters within a certain time-frame.
A naive approach to this might be:
A smarter approach to this might be:
There are obvious optimizations to be had such as pre-processing the dictionary and loading it in the hash form rather than creating it each time. If you don't look too closely, the algorithm scales based on the number of characters in the dictionary and the the number of characters in the given string.
I wanted to see if I could do better. My hunch told me there should be a way to
I found that the answer was yes if you set a limit on the maximum length of input you would accept. Here is a brief explanation of the math:
What this does is count the total occurrences of each letter (order isn't important) and allow you to represent it as a single number. For instance, if max = 5 and your alphabet was only A .. F:
Ok, but how is the magic %is_subset constructed? It is simply the values of the powerset. Wait a minute I said. If I am generating the powerset why do I go through the hoops of converting to a number - why don't I just look up strings. Then I remember that in strings the order of letters is important (the naive brute force above). Ok, if I don't have to generate the permutations, this might be a practical approach.
I start looking at the math and I realize the approach without modification is not practical. To support a max string length of 7, the value of Z would be 1_564_580_056_274_625_717_608. The reason the numbers get so high so quickly is because you are summing maxN for N = 0 .. 25. Even if I am only doing subtraction, these numbers are too big to be working with and I want to be able to support strings larger than 7.
I started to think about a couple of ways that still may make this practical. The first was to not consider the general case (entire alphabet) but to work with only the characters in the given string. This was doomed to failure after I ran the numbers. First, you would have to generate all combinations of the various different string lengths you wanted to support. Remember the formula for N choose K is C = K! / N!(K - N)! So for N = 7, there are 657,800 unique combinations. You have to split the dictionary into that many files and then when you get your input string you can open the dictionary that only contains those letters (a smaller alphabet). You can see this solves one problem by creating a new one because you would need 1,562,275 additional files to support an input string of length 8. Of course, these needn't be actual files (database) but the problem is still not scalable and pre-processing will take forever.
The second approach was to go with a hybrid option. First, perform frequency analysis on the database to order the letters from most to least frequent. This means that instead of Z having the highest numerical value, Q would. We could then shave off the bottom X letters from the alphabet. The idea was to have two dictionary files (1 with words containing uncommon letters and 1 with only common letters). Once the input is examined, we dispatch to the straight forward brute forward approach with the alternate dictionary if an uncommon letter is observed and the weird math approach otherwise. After checking the numbers I realized it was still untenable.
What am I doing again?
Suddenly it dawns on me that a hybrid approach is what I want but not the one I was working on. The reason the naive brute force didn't work is because after generating the powerset, generating the permutations was necessary because order of letters in hash lookups. The weird math approach solved the order problem but created a new problem (numbers to big to work with). Could I eliminate the order of letters issue without using the weird math approach. Of course, just pre-process the dictionary by sorting the letters.
This is what I call turning a problem on its (?:head|side|ear). I look at it in a completely different way to see if I learn anything. In this case, I learned several ways not to solve the problem but that's ok - I had fun and this was the real game for me anyway. I just wonder if other people do this? If so, what approaches do you take and do you have any examples to share?
Cheers - L~R