Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

I tried coming at this from another direction. Instead of looping through the possible letter combinations, I started with the word list and an empty array of "stacks" of words that could fit into the same set of 8 letters. For each word in the list, I checked it against each "stack," adding it to any stacks it could fit into, and starting a new stack of its own.

The first element in each stack is a list of the letters required to make all the words in that stack. So I'm able to compare against this one "word," rather than all the words in the stack each time. I could save some memory, and maybe some CPU, by not saving the words at all, just the number that fit in the stack, but I wanted to keep track of the words so I could verify that it's working right.

For each stack (sub-array), I do a quick check to see if the new word added to this stack would obviously require more than 8 letters, so it can bail out quickly if that's the case. Otherwise it does a more complex check to see what set of letters would be required to add this word to the stack, and does so if it fits.

In the end, it picks the largest stack. I ran it against the file 2of12.txt, skipping hyphenated words, lowering the case on everything, and removing duplicates, and it took almost exactly one hour on my system to process the remaining 21664 2-8 letter words. I'm rerunning it now against 2of12inf.txt, which I expect will take at least four times as long, maybe more. In the meantime, here's my code and results. I think there are probably some pretty big inefficiencies here, but I hope the method is interesting, at least.

bannor> cat 1056884.pl #!/usr/bin/env perl use Modern::Perl; open my $wf, '<', '2of12.txt' or die $!; my @words; while (<$wf>) { chomp; push @words, lc($_) if length($_) > 1 and length($_) < 9 and ! /-/ +; } @words = keys { map { $_ => 1 } @words }; say "Working with @{[scalar @words]} words"; my @z; for my $w (@words) { my @letters = split //, $w; for my $e (@z) { my $quick = join '', sort split //, $w . $e->[0]; $quick =~ s/(.)\1+/$1/g; next if length($quick) > 8; # bail out early if no chance my $fill = $e->[0]; my $save = $fill; my $new = ''; for my $l (@letters) { unless( $fill =~ s/$l// ){ $new .= $l; } } $save .= $new; if ( length($save) < 9 ) { push @$e, $w; $e->[0] = $save; } } push @z, [$w, $w]; # start a new stack with this word } my $e = ( sort { @$b <=> @$a } @z )[0]; say "Letters: ", shift @$e; say "Number of words: ", scalar @$e; say " $_" for sort @$e; bannor> time perl 1056884.pl Working with 21664 words Letters: soartple Number of words: 198 perl 1056884.pl 3628.31s user 0.14s system 99% cpu 1:00:34.69 total ## Words found: ale, alert, aloe, alp, also, alter, alto, ape, apostle, apse, apt, are +, arose, art, arts, as, asp, aster, ate, atop, ear, earl, east, eat, eat +s, era, eta, la, lap, lapse, laser, last, late, later, lea, leap, leapt, lest, + let, lo, lop, lope, lore, lose, loser, lost, lot, lots, oar, oat, ole, opal +, opera, opt, or, oral, orate, ore, pa, pal, pale, par, pare, parole, pa +rse, past, paste, pastel, pastor, pat, patrol, pea, peal, pear, peat, pelt, + per, pert, peso, pest, pet, petal, petrol, plat, plate, plea, pleat, plot, +pol, polar, pole, polestar, pore, port, pose, poser, post, postal, poster, +pot, prate, presto, pro, pros, prose, rap, rape, rasp, rate, re, real, reap +, rep, repast, rest, roast, roe, role, rope, ropes, rose, rot, rote, sale, sa +lt, sap, sat, sate, sea, seal, sear, seat, sepal, septa, sera, slap, slat, + slept, sloe, slop, slope, slot, so, soap, soar, sol, solar, sole, sop, sore, +sort, sorta, sot, spa, spar, spare, spat, spate, spear, spelt, splat, spore, + sport, spot, sprat, stale, staple, stapler, star, stare, step, stop, store, s +trap, strep, strop, tale, tape, taper, taps, tar, tare, taro, tarp, tea, tea +l, tear, to, toe, tole, top, tops, tor, tore, trap, traps, trope, tsar

Update: Running my script on the larger 2of12inf.txt had the following results in 3.5 hours:

Working with 40933 words Letters: primates Number of words: 316 Words found: aim aims air airs am amir amirs amp amps ape apes apse apt apter are a +rise arm armies armpit armpits arms art arts as asp aspire aster astir at ate e +ar ears east eat eats em emir emirs emit emits ems era eras erst esprit eta et +as imp impart imparts imps irate ire is ism it item items its ma map maps mar + mare mares mars mart marts mas maser mast master mat mate mates mats me mea +t meats merit merits mesa met mi mire mires miser mist mister miter miters mit +es mitre mitres pa pair pairs par pare pares pars parse parties parts pas past +paste pastier pastime pat pate pates pats pea pear pears peas peat peats per + perm permit permits perms pert pest pet pets pi piaster piastre pie pier pi +ers pies pirate pirates pis pit pita pitas pits praise pram prams prate prates +pries priest prim primate primates prime primes prise prism raise ram ramie +ramies ramp ramps rams rap rape rapes rapist raps rapt rasp rate rates rats r +e ream reams reap reaps rem remaps remit remits rems rep repast reps res rest + rim rime rimes rims rip ripe ripest rips rise rite rites same sap sari sat + sate satire sea seam sear seat semi sepia septa sera set simper sip sir sir +e sit sitar site smart smear smit smite spa spar spare spat spate spear sper +m spire spirea spit spite sprat sprite stair stamp stamper star stare steam st +em step stir strap stream strep stria striae strip stripe tam tame tamer tamer +s tames tamp tamper tampers tamps tams tape taper tapers tapes tapir tapirs ta +ps tar tare tares tarp tarps tars tarsi tea team teams tear tears teas temp t +empi temps term terms ti tie tier tiers ties time timer timers times tip ti +ps tire tires tis traipse tram tramp tramps trams trap traps trim trims trip t +ripe trips tsar

Aaron B.
Available for small or large Perl jobs; see my home node.


In reply to Re: Challenge: 8 Letters, Most Words by aaron_baugher
in thread Challenge: 8 Letters, Most Words by Limbic~Region

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others romping around the Monastery: (6)
As of 2024-04-24 22:31 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found