Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Re: Challenge: 8 Letters, Most Words

by aaron_baugher (Deacon)
on Oct 05, 2013 at 03:41 UTC ( #1056996=note: print w/ replies, xml ) Need Help??


in reply to Challenge: 8 Letters, Most Words

I tried coming at this from another direction. Instead of looping through the possible letter combinations, I started with the word list and an empty array of "stacks" of words that could fit into the same set of 8 letters. For each word in the list, I checked it against each "stack," adding it to any stacks it could fit into, and starting a new stack of its own.

The first element in each stack is a list of the letters required to make all the words in that stack. So I'm able to compare against this one "word," rather than all the words in the stack each time. I could save some memory, and maybe some CPU, by not saving the words at all, just the number that fit in the stack, but I wanted to keep track of the words so I could verify that it's working right.

For each stack (sub-array), I do a quick check to see if the new word added to this stack would obviously require more than 8 letters, so it can bail out quickly if that's the case. Otherwise it does a more complex check to see what set of letters would be required to add this word to the stack, and does so if it fits.

In the end, it picks the largest stack. I ran it against the file 2of12.txt, skipping hyphenated words, lowering the case on everything, and removing duplicates, and it took almost exactly one hour on my system to process the remaining 21664 2-8 letter words. I'm rerunning it now against 2of12inf.txt, which I expect will take at least four times as long, maybe more. In the meantime, here's my code and results. I think there are probably some pretty big inefficiencies here, but I hope the method is interesting, at least.

bannor> cat 1056884.pl #!/usr/bin/env perl use Modern::Perl; open my $wf, '<', '2of12.txt' or die $!; my @words; while (<$wf>) { chomp; push @words, lc($_) if length($_) > 1 and length($_) < 9 and ! /-/ +; } @words = keys { map { $_ => 1 } @words }; say "Working with @{[scalar @words]} words"; my @z; for my $w (@words) { my @letters = split //, $w; for my $e (@z) { my $quick = join '', sort split //, $w . $e->[0]; $quick =~ s/(.)\1+/$1/g; next if length($quick) > 8; # bail out early if no chance my $fill = $e->[0]; my $save = $fill; my $new = ''; for my $l (@letters) { unless( $fill =~ s/$l// ){ $new .= $l; } } $save .= $new; if ( length($save) < 9 ) { push @$e, $w; $e->[0] = $save; } } push @z, [$w, $w]; # start a new stack with this word } my $e = ( sort { @$b <=> @$a } @z )[0]; say "Letters: ", shift @$e; say "Number of words: ", scalar @$e; say " $_" for sort @$e; bannor> time perl 1056884.pl Working with 21664 words Letters: soartple Number of words: 198 perl 1056884.pl 3628.31s user 0.14s system 99% cpu 1:00:34.69 total ## Words found: ale, alert, aloe, alp, also, alter, alto, ape, apostle, apse, apt, are +, arose, art, arts, as, asp, aster, ate, atop, ear, earl, east, eat, eat +s, era, eta, la, lap, lapse, laser, last, late, later, lea, leap, leapt, lest, + let, lo, lop, lope, lore, lose, loser, lost, lot, lots, oar, oat, ole, opal +, opera, opt, or, oral, orate, ore, pa, pal, pale, par, pare, parole, pa +rse, past, paste, pastel, pastor, pat, patrol, pea, peal, pear, peat, pelt, + per, pert, peso, pest, pet, petal, petrol, plat, plate, plea, pleat, plot, +pol, polar, pole, polestar, pore, port, pose, poser, post, postal, poster, +pot, prate, presto, pro, pros, prose, rap, rape, rasp, rate, re, real, reap +, rep, repast, rest, roast, roe, role, rope, ropes, rose, rot, rote, sale, sa +lt, sap, sat, sate, sea, seal, sear, seat, sepal, septa, sera, slap, slat, + slept, sloe, slop, slope, slot, so, soap, soar, sol, solar, sole, sop, sore, +sort, sorta, sot, spa, spar, spare, spat, spate, spear, spelt, splat, spore, + sport, spot, sprat, stale, staple, stapler, star, stare, step, stop, store, s +trap, strep, strop, tale, tape, taper, taps, tar, tare, taro, tarp, tea, tea +l, tear, to, toe, tole, top, tops, tor, tore, trap, traps, trope, tsar

Update: Running my script on the larger 2of12inf.txt had the following results in 3.5 hours:

Working with 40933 words Letters: primates Number of words: 316 Words found: aim aims air airs am amir amirs amp amps ape apes apse apt apter are a +rise arm armies armpit armpits arms art arts as asp aspire aster astir at ate e +ar ears east eat eats em emir emirs emit emits ems era eras erst esprit eta et +as imp impart imparts imps irate ire is ism it item items its ma map maps mar + mare mares mars mart marts mas maser mast master mat mate mates mats me mea +t meats merit merits mesa met mi mire mires miser mist mister miter miters mit +es mitre mitres pa pair pairs par pare pares pars parse parties parts pas past +paste pastier pastime pat pate pates pats pea pear pears peas peat peats per + perm permit permits perms pert pest pet pets pi piaster piastre pie pier pi +ers pies pirate pirates pis pit pita pitas pits praise pram prams prate prates +pries priest prim primate primates prime primes prise prism raise ram ramie +ramies ramp ramps rams rap rape rapes rapist raps rapt rasp rate rates rats r +e ream reams reap reaps rem remaps remit remits rems rep repast reps res rest + rim rime rimes rims rip ripe ripest rips rise rite rites same sap sari sat + sate satire sea seam sear seat semi sepia septa sera set simper sip sir sir +e sit sitar site smart smear smit smite spa spar spare spat spate spear sper +m spire spirea spit spite sprat sprite stair stamp stamper star stare steam st +em step stir strap stream strep stria striae strip stripe tam tame tamer tamer +s tames tamp tamper tampers tamps tams tape taper tapers tapes tapir tapirs ta +ps tar tare tares tarp tarps tars tarsi tea team teams tear tears teas temp t +empi temps term terms ti tie tier tiers ties time timer timers times tip ti +ps tire tires tis traipse tram tramp tramps trams trap traps trim trims trip t +ripe trips tsar

Aaron B.
Available for small or large Perl jobs; see my home node.


Comment on Re: Challenge: 8 Letters, Most Words
Select or Download Code
Re^2: Challenge: 8 Letters, Most Words
by Limbic~Region (Chancellor) on Oct 05, 2013 at 12:45 UTC
    aaron_baugher,
    I considered a bucket approach as well - but not quite the way you did it. What I was going to do was bucket all words of the same length and then at the end roll up 2 letter words into 3 letter words, 3 letter words into 4 letter words, etc. I realized this won't guarantee the best solution because of what I have explained elsewhere in this thread (taking some from one group and some from another can lead to the best solution)

    I ran your code against 2of12inf.txt (actually, the file I had already reduced). On my machine it took 2 hours and 3 minutes. Here is the output:

    Working with 40933 words Letters: primates Number of words: 316
    The correct answer has 346 words.

    Cheers - L~R

      Yes, I had it dump the complete array to a file the last time, and the winning set "painters" only gets 292 words from my script. I had a feeling it could miss some somehow, but I can't quite figure out how yet.

      Update: I see the problem now. Let's say "painters" is the 100th word, and "at" is word #2 and "not" is word #3. "not" gets added to bucket #2 along with "at" because it can fit, but now "painters" can't go into that bucket. And since "at" has already been placed in bucket #2 and possibly bucket #1, it can't be placed in bucket #100 with "painters" or in any of the buckets in between that "painters" might be in.

      So to make my approach work, I guess once all the words have been placed in at least one bucket, I would have to go back through them again, retrying them against each bucket past the one they started in. And even then I'm not sure you'd be guaranteed to get a bucket with the best possible combination.

      I had a nagging feeling there was a problem like that, but I needed a night's sleep to see it.

      Aaron B.
      Available for small or large Perl jobs; see my home node.

        I think all you need to do is to sort the words descending by length before you run through your loop. See my variation below which is very similar to yours Re^2: Challenge: 8 Letters, Most Words.

        The reason is that a longer word will never be added to the stack of a shorter word, so you do not need to check that.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1056996]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others contemplating the Monastery: (6)
As of 2014-07-31 07:28 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:









    Results (245 votes), past polls