in reply to Re^3: Challenge: 8 Letters, Most Words in thread Challenge: 8 Letters, Most Words
CountZero,
I am going to write a brute force solution this weekend but your math isn't quite right. First, you don't need all 26^8th since order doesn't matter. The upper bound is less than 5.2 million. We can reduce it further if we consider the maximum number of times a given letter can repeat. We do not have to consider a worstcase scenario dictionary so I think I can get that 5.2 million down to around 3 million. We still need to compare each of those 3 million against tens of thousands of words so I will be using C to do it but I think run time will be much less than 24 hours.
Update: My math was off somewhere. The upper bound was 13_884_156 and I was only able to reduce it down to 12_461_993. Since my word list contains 40,933 words (filtered down from 81_536), that means I am up to 510_106_759_469 checks. In order to finish in under 24 hours, I will need to be able to do 5.9 million checks per second. I hope my C skills are up for the challenge (stay tuned).
Re^5: Challenge: 8 Letters, Most Words
by CountZero (Bishop) on Oct 05, 2013 at 08:27 UTC

I did some more analysing the 2of12inf.txt dictionary and for words of maximum 8 characters, this is the maximum number of the same characters that can exist in any one word:
a => 4,
b => 3,
c => 3,
d => 4,
e => 4,
f => 4,
g => 4,
h => 3,
i => 3,
j => 2,
k => 3,
l => 4,
m => 3,
n => 4,
o => 4,
p => 3,
q => 1,
r => 4,
s => 5,
t => 3,
u => 4,
v => 2,
w => 3,
x => 2,
y => 3,
z => 4,
In other words, choose all combinations of any 8 chars out of the following string: "aaaabbbcccddddeeeeffffgggghhhiiijjkkkllllmmmnnnnoooopppqrrrrssssstttuuuuvvwwwxxyyyzzzz" and run your dictionary against each of these and your search is exhaustive.Of course that is still about 2.14 E15 combinations to check, but many of these will be the same. So, perhaps someone with more mathematical skills than I can write an efficient generator for these combinations that does not have to go through them all to weed out the duplicates.
CountZero A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity."  The Tao of Programming, 4.1  Geoffrey James My blog: Imperial Deltronics
 [reply] [d/l] [select] 

DB<288> sub fac {my $x=1; $x*=$_ for 2..$_[0]; $x}
DB<289> sub binom { my ($n,$k)=@_; fac($n)/(fac($n$k)*fac($k)) }
DB<290> binom 26,8
=> 1562275
Reasoning is simple, it calculates all binary vectors of length 26 with exactly 8 1bits.
But with duplications its more complicated, e.g. 4 out of "aabcd" is not (5 over 4)=5
a bcd
abcd
aa cd
aab d
aabc
cause the first 2 solutions are identical.
Generating all combinations and filtering the unique once is normally not very clever, cause the overhead can be enormous.
DB<292> length "aaaabbbcccddddeeeeffffgggghhhiiijjkkkllllmmmnnnnoooo
+pppqrrrrssssstttuuuuvvwwwxxyyyzzzz"
=> 86
DB<293> binom 86,8
=> "53060358690"
And I'm stuck finding a formula which calculates all unique solutions, but generating is easier, just don't allow bitvectors with "gaps" between identical letters:
so
"aaaabbbc..." # pattern
"1000100...." # ok => ab...
"1100100...." # ok => aab...
"1001100.... # not ok => aab...
I will update this post with a loop generating all possibilities soon.
update
indeed L~R's number of 12461993 possibilities is correct
The following (nonoptimized) code took 3 minutes to calculate them:
Cheers Rolf
( addicted to the Perl Programming Language)
 [reply] [d/l] [select] 

LanX,
The C code I wrote to calculate it is below. It finished nearly instantly. I was trying to determine how to write the rest of the code (could it fit in memory).
It also ended up being part of my final solution since I realized it was unnecessary to keep them in memory using a high water mark algorithm.
 [reply] [d/l] 




(n + (k  1))!

k! (n  1)!
33!
 = 13_884_156
8! * 25!
I am not sure how to get from 13_884_156 to 12_461_993 strictly through calculation but it seems like it should be possible.
 [reply] [d/l] 




CountZero,
I did some more analysing the 2of12inf.txt dictionary and for words of maximum 8 characters, this is the maximum number of the same characters that can exist in any one word:
I had already done that which is where my math was coming from.
Of course that is still about 2.14 E15 combinations to check, but many of these will be the same. So, perhaps someone with more mathematical skills than I can write an efficient generator for these combinations that does not have to go through them all to weed out the duplicates.
Have you looked at my code? I already indicated that doing this reduces the number of 8 letter words that needs to be checked down to 12,461,993. The reason it grows back up to 510,106,759,469 loops/checks is because you still have to check each one of the 12.5 million against the 40K actual words. I was able to do this using C in 12 minutes.
 [reply] 

Your Cskills are legendary.
CountZero A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity."  The Tao of Programming, 4.1  Geoffrey James My blog: Imperial Deltronics
 [reply] 

