Ok, I started with 2of12full.txt, removed anything with a capital letter (I'm assuming those are abbreviations or dupes) or non-alphabetic characters (hyphenations, abbreviations, etc.), and removed all words of fewer than 2 or more than 8 characters, saving the result as 2-8-words.txt. I then ran this:
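use strict;
use warnings;
$| = 1;    # unbuffered output so the progress dots appear immediately
my (%wc1, %wc2, $c);
# Pass 1: count words by their sorted letters (the anagram signature).
open(my $in, '<', '2-8-words.txt') or die "Can't open 2-8-words.txt: $!";
while (<$in>) {
    chomp;
    $_ = join '', sort split //;
    $wc1{$_}++;
}
close $in;
# Pass 2: for each word, credit every signature that contains its letters.
open($in, '<', '2-8-words.txt') or die "Can't open 2-8-words.txt: $!";
while (<$in>) {
    chomp;
    # Sorted letters joined by .*? match any sorted signature containing them.
    my $re = join '.*?', sort split //;
    $re = qr/$re/;    # compile once per word, not once per signature
    for my $s (keys %wc1) {
        $wc2{$s}++ if $s =~ $re;
    }
    print '.' if ++$c % 1000 == 0;
}
close $in;
print "\n\n";
# Print the 50 signatures that yield the most words.
$c = 0;
for (sort { $wc2{$b} <=> $wc2{$a} } keys %wc2) {
    print "$_ : $wc2{$_}\n";
    last if ++$c == 50;
}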
Output (sorted letters : number of words in the list that can be spelled from them):
aeilprst : 239
aeilnpst : 225
aeloprst : 220
aeilnrst : 219
aceiprst : 216
aeilmnrt : 207
aelprsty : 207
acenorst : 202
aeiprsty : 201
acelprst : 201
adeinrst : 199
acehorst : 198
aegilnrt : 197
aeinorst : 197
aeinoprt : 195
aeilnort : 194
adeilort : 191
aceinrst : 190
aelmoprt : 188
adeinort : 187
aegimnrt : 185
aemprstu : 184
adeilmst : 184
aceloprt : 182
adeginrt : 182
aceilnrt : 180
aceilprt : 180
aceimrst : 180
aegilnst : 179
acehnrst : 176
abeinrst : 175
aceilmrt : 175
eimoprst : 175
aceinort : 173
adeimort : 172
aeilnrtv : 172
adelorst : 172
aehmoprt : 171
aceilprs : 171
aehloprt : 171
acdehort : 170
abelrstu : 170
adeiorst : 170
aeoprstt : 169
acehiprs : 169
abelopst : 168
ademnopr : 168
adeilmry : 168
aeiprstv : 168
aeghorst : 167
This should be 100% accurate for any word list containing only a-z, though it does take a little while to run. If anyone can think of a way to speed it up without compromising accuracy, by all means do so.
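One possible speedup, offered as an untested-at-scale sketch rather than a drop-in replacement: instead of matching every word against every signature, enumerate the distinct letter subsets of each signature and add up how many words have exactly those letters. With words capped at 8 letters that is at most 255 subsets per signature, each a single hash lookup, and the totals should come out identical to the regex version. The name count_subwords is made up for illustration; it assumes it is handed a reference to the %wc1 hash built in the first pass.

```perl
use strict;
use warnings;

# Hypothetical helper: given a hash ref of sorted-letter signature => number
# of words with exactly those letters, return a hash ref of signature =>
# number of words that can be spelled from that signature's letters.
sub count_subwords {
    my ($wc1) = @_;
    my %wc2;
    for my $key (keys %$wc1) {
        my @letters = split //, $key;
        my $n       = @letters;
        my %seen;    # repeated letters yield the same subset more than once
        for my $mask (1 .. (1 << $n) - 1) {
            # Picking letters in index order keeps the subset sorted.
            my $sub = join '', map { $letters[$_] }
                               grep { $mask & (1 << $_) } 0 .. $#letters;
            next if $seen{$sub}++;
            $wc2{$key} += $wc1->{$sub} if exists $wc1->{$sub};
        }
    }
    return \%wc2;
}
```

Calling count_subwords(\%wc1) after the first pass would stand in for the whole second pass; the sort-and-print loop at the end stays the same, just reading from the returned hash ref. The cost grows with the number of signatures rather than signatures times words, but it doubles with every extra letter allowed, so it only makes sense for short words like these.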