I tried coming at this from another direction. Instead of looping through the possible letter combinations, I started with the word list and an empty array of "stacks" of words that could fit into the same set of 8 letters. For each word in the list, I checked it against each "stack," adding it to any stacks it could fit into, and starting a new stack of its own.
The first element in each stack is a list of the letters required to make all the words in that stack. So I'm able to compare against this one "word," rather than all the words in the stack each time. I could save some memory, and maybe some CPU, by not saving the words at all, just the number that fit in the stack, but I wanted to keep track of the words so I could verify that it's working right.
For each stack (sub-array), I do a quick check to see if the new word added to this stack would obviously require more than 8 letters, so it can bail out quickly if that's the case. Otherwise it does a more complex check to see what set of letters would be required to add this word to the stack, and does so if it fits.
In the end, it picks the largest stack. I ran it against the file 2of12.txt, skipping hyphenated words, lowering the case on everything, and removing duplicates, and it took almost exactly one hour on my system to process the remaining 21664 2-8 letter words. I'm rerunning it now against 2of12inf.txt, which I expect will take at least four times as long, maybe more. In the meantime, here's my code and results. I think there are probably some pretty big inefficiencies here, but I hope the method is interesting, at least.
bannor> cat 1056884.pl
#!/usr/bin/env perl
use Modern::Perl;
open my $wf, '<', '2of12.txt' or die $!;
my @words;
while (<$wf>) {
chomp;
push @words, lc($_) if length($_) > 1 and length($_) < 9 and ! /-/
+;
}
@words = keys { map { $_ => 1 } @words };
say "Working with @{[scalar @words]} words";
my @z;
for my $w (@words) {
my @letters = split //, $w;
for my $e (@z) {
my $quick = join '', sort split //, $w . $e->[0];
$quick =~ s/(.)\1+/$1/g;
next if length($quick) > 8; # bail out early if no chance
my $fill = $e->[0];
my $save = $fill;
my $new = '';
for my $l (@letters) {
unless( $fill =~ s/$l// ){
$new .= $l;
}
}
$save .= $new;
if ( length($save) < 9 ) {
push @$e, $w;
$e->[0] = $save;
}
}
push @z, [$w, $w]; # start a new stack with this word
}
my $e = ( sort { @$b <=> @$a } @z )[0];
say "Letters: ", shift @$e;
say "Number of words: ", scalar @$e;
say " $_" for sort @$e;
bannor> time perl 1056884.pl
Working with 21664 words
Letters: soartple
Number of words: 198
perl 1056884.pl 3628.31s user 0.14s system 99% cpu 1:00:34.69 total
## Words found:
ale, alert, aloe, alp, also, alter, alto, ape, apostle, apse, apt, are
+,
arose, art, arts, as, asp, aster, ate, atop, ear, earl, east, eat, eat
+s, era,
eta, la, lap, lapse, laser, last, late, later, lea, leap, leapt, lest,
+ let,
lo, lop, lope, lore, lose, loser, lost, lot, lots, oar, oat, ole, opal
+,
opera, opt, or, oral, orate, ore, pa, pal, pale, par, pare, parole, pa
+rse,
past, paste, pastel, pastor, pat, patrol, pea, peal, pear, peat, pelt,
+ per,
pert, peso, pest, pet, petal, petrol, plat, plate, plea, pleat, plot,
+pol,
polar, pole, polestar, pore, port, pose, poser, post, postal, poster,
+pot,
prate, presto, pro, pros, prose, rap, rape, rasp, rate, re, real, reap
+, rep,
repast, rest, roast, roe, role, rope, ropes, rose, rot, rote, sale, sa
+lt,
sap, sat, sate, sea, seal, sear, seat, sepal, septa, sera, slap, slat,
+ slept,
sloe, slop, slope, slot, so, soap, soar, sol, solar, sole, sop, sore,
+sort,
sorta, sot, spa, spar, spare, spat, spate, spear, spelt, splat, spore,
+ sport,
spot, sprat, stale, staple, stapler, star, stare, step, stop, store, s
+trap,
strep, strop, tale, tape, taper, taps, tar, tare, taro, tarp, tea, tea
+l,
tear, to, toe, tole, top, tops, tor, tore, trap, traps, trope, tsar
Update: Running my script on the larger 2of12inf.txt had the following results in 3.5 hours:
Working with 40933 words
Letters: primates
Number of words: 316
Words found:
aim aims air airs am amir amirs amp amps ape apes apse apt apter are a
+rise arm
armies armpit armpits arms art arts as asp aspire aster astir at ate e
+ar ears
east eat eats em emir emirs emit emits ems era eras erst esprit eta et
+as imp
impart imparts imps irate ire is ism it item items its ma map maps mar
+ mare
mares mars mart marts mas maser mast master mat mate mates mats me mea
+t meats
merit merits mesa met mi mire mires miser mist mister miter miters mit
+es mitre
mitres pa pair pairs par pare pares pars parse parties parts pas past
+paste
pastier pastime pat pate pates pats pea pear pears peas peat peats per
+ perm
permit permits perms pert pest pet pets pi piaster piastre pie pier pi
+ers pies
pirate pirates pis pit pita pitas pits praise pram prams prate prates
+pries
priest prim primate primates prime primes prise prism raise ram ramie
+ramies
ramp ramps rams rap rape rapes rapist raps rapt rasp rate rates rats r
+e ream
reams reap reaps rem remaps remit remits rems rep repast reps res rest
+ rim
rime rimes rims rip ripe ripest rips rise rite rites same sap sari sat
+ sate
satire sea seam sear seat semi sepia septa sera set simper sip sir sir
+e sit
sitar site smart smear smit smite spa spar spare spat spate spear sper
+m spire
spirea spit spite sprat sprite stair stamp stamper star stare steam st
+em step
stir strap stream strep stria striae strip stripe tam tame tamer tamer
+s tames
tamp tamper tampers tamps tams tape taper tapers tapes tapir tapirs ta
+ps tar
tare tares tarp tarps tars tarsi tea team teams tear tears teas temp t
+empi
temps term terms ti tie tier tiers ties time timer timers times tip ti
+ps tire
tires tis traipse tram tramp tramps trams trap traps trim trims trip t
+ripe
trips tsar
Aaron B.
Available for small or large Perl jobs; see my home node.
-
Are you posting in the right place? Check out Where do I post X? to know for sure.
-
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big>
<blockquote> <br /> <dd>
<dl> <dt> <em> <font>
<h1> <h2> <h3> <h4>
<h5> <h6> <hr /> <i>
<li> <nbsp> <ol> <p>
<small> <strike> <strong>
<sub> <sup> <table>
<td> <th> <tr> <tt>
<u> <ul>
-
Snippets of code should be wrapped in
<code> tags not
<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
intervention).
-
Want more info? How to link
or How to display code and escape characters
are good places to start.