hi Limbic~Region
IMHO all "heuristic" methods must have flows... I'm quite experienced with these kind of problems and people tend to underestimate them (I'm not smart just trained :).
And shouldn't be to difficult to show this belongs to NP-class.
Anyway since I was able to calculate all possible 8-letter-combinations within 3 minutes with an non-optimized recursion it should be possible to find an approach to rapidly calculate each covering-number and to choose the maximum.
Let me explain:
First of all a good branch-and-bound could avoid useless branches which can't possibly beat the current maximum (my recursion just needs more criteria to bound)
(this doesn't contradict NP, NP is about run-time and not correctness)²
In order to speed up calculation, one should cache sub-solutions which can be rapidly added.
Let l be a letter-combination to be checked and L-x all derived combinations by striking x letters and n(l) the number of dictionary-words which can be formed with exactly all letters.
its easy to see that the covering
C(l)= n(l)+ sum { C($_) } L-1 - sum { C($_) } L-2
e.g. C('abc') = n('abc') + C('ab')+C('ac')+C('bc') - C(a) -C(b)-C(c)
and with the table from this post ¹
1 a
1 b
3 a b
2 a c
2 b c
4 a b d
we see C('abc') = 0+3+2+2-(1+1+0) = 5 and indeed the first 5 entries in the table can be constructed out of a,b and c!
So to calculate the covering of an 8-letter tuple we just need to look up the covering of (at most) 8 7-letter sub-tuples and 28 6-letter sub-tuples and add them.
Starting with all 1-letter tuples one can successively calculate the covering of all 2-letter tuples and so on.
-max: 1 26
-max: 2 350
-max: 3 3247
-max: 4 23312
-max: 5 137909
-max: 6 698996
-max: 7 3116882
-max: 8 12461993
sum 16442715
Holding 16 million hash-entries shouldn't be a problem. A worst case of 591_937_740 (=16442715 *(28+8)) additions neither, which are already 1000 times less operations than in your C program.
This should be fast enough for 8 letters and I doubt one can do it faster...³
Implementation left as an exercise! :)
Cheers Rolf
( addicted to the Perl Programming Language)
¹) for simplification this table only holds ordered words, extending it to count different permutations of the same letters ("cab" and "abc" => n("abc") =2 ) wouldn't change the math!
²) and normally one can always construct an edge case where the recursion never bounds.
³) please note the complexity rises exponentially for more letters, it's still NP! |