I did some more analysing the 2of12inf.txt dictionary and for words of maximum 8 characters, this is the maximum number of the same characters that can exist in any one word:
` a => 4,
b => 3,
c => 3,
d => 4,
e => 4,
f => 4,
g => 4,
h => 3,
i => 3,
j => 2,
k => 3,
l => 4,
m => 3,
n => 4,
o => 4,
p => 3,
q => 1,
r => 4,
s => 5,
t => 3,
u => 4,
v => 2,
w => 3,
x => 2,
y => 3,
z => 4,
`
In other words, choose all combinations of any 8 chars out of the following string: "`aaaabbbcccddddeeeeffffgggghhhiiijjkkkllllmmmnnnnoooopppqrrrrssssstttuuuuvvwwwxxyyyzzzz`" and run your dictionary against each of these and your search is exhaustive.Of course that is still about 2.14 E15 combinations to check, but many of these will be the same. So, perhaps someone with more mathematical skills than I can write an efficient generator for these combinations that does not have to go through them all to weed out the duplicates.
CountZero *A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity."* - The Tao of Programming, 4.1 - Geoffrey James
My blog: Imperial Deltronics
| [reply] [d/l] [select] |

` DB<288> sub fac {my $x=1; $x*=$_ for 2..$_[0]; $x}
DB<289> sub binom { my ($n,$k)=@_; fac($n)/(fac($n-$k)*fac($k)) }
DB<290> binom 26,8
=> 1562275
`
Reasoning is simple, it calculates all binary vectors of length 26 with exactly 8 `1`-bits.
But with duplications its more complicated, e.g. 4 out of "aabcd" is not (5 over 4)=5
`a bcd
abcd
aa cd
aab d
aabc
`
cause the first 2 solutions are identical.
Generating all combinations and filtering the unique once is normally not very clever, cause the overhead can be enormous.
` DB<292> length "aaaabbbcccddddeeeeffffgggghhhiiijjkkkllllmmmnnnnoooo
+pppqrrrrssssstttuuuuvvwwwxxyyyzzzz"
=> 86
DB<293> binom 86,8
=> "53060358690"
`
And I'm stuck finding a formula which calculates all unique solutions, but generating is easier, just don't allow bitvectors with "gaps" between identical letters:
so
` "aaaabbbc..." # pattern
"1000100...." # ok => ab...
"1100100...." # ok => aab...
"1001100.... # not ok => aab...
`
I will update this post with a loop generating all possibilities soon.
##### update
indeed L~R's number of 12461993 possibilities is correct
The following (non-optimized) code took 3 minutes to calculate them:
Cheers Rolf
( addicted to the Perl Programming Language)
| [reply] [d/l] [select] |

LanX,
The C code I wrote to calculate it is below. It finished nearly instantly. I was trying to determine how to write the rest of the code (could it fit in memory).
It also ended up being part of my final solution since I realized it was unnecessary to keep them in memory using a high water mark algorithm.
| [reply] [d/l] |

`(n + (k - 1))!
--------------
k! (n - 1)!
33!
-------- = 13_884_156
8! * 25!
`
I am not sure how to get from 13_884_156 to 12_461_993 strictly through calculation but it seems like it should be possible.
| [reply] [d/l] |

CountZero,
*
I did some more analysing the 2of12inf.txt dictionary and for words of maximum 8 characters, this is the maximum number of the same characters that can exist in any one word:*
I had already done that which is where my math was coming from.
*
Of course that is still about 2.14 E15 combinations to check, but many of these will be the same. So, perhaps someone with more mathematical skills than I can write an efficient generator for these combinations that does not have to go through them all to weed out the duplicates.
*
Have you looked at my code? I already indicated that doing this reduces the number of 8 letter words that needs to be checked down to 12,461,993. The reason it grows back up to 510,106,759,469 loops/checks is because you still have to check each one of the 12.5 million against the 40K actual words. I was able to do this using C in 12 minutes.
| [reply] |

Your C-skills are legendary.
CountZero *A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity."* - The Tao of Programming, 4.1 - Geoffrey James
My blog: Imperial Deltronics
| [reply] |