Beefy Boxes and Bandwidth Generously Provided by pair Networks
Just another Perl shrine
 
PerlMonks  

Challenge: 8 Letters, Most Words

by Limbic~Region (Chancellor)
on Oct 04, 2013 at 14:13 UTC ( #1056884=perlquestion: print w/ replies, xml ) Need Help??
Limbic~Region has asked for the wisdom of the Perl Monks concerning the following question:

All,
A friend recently shared a puzzle that I thought you might enjoy.

You and 7 friends are going to a costume party dressed as letters of the alphabet. Which 8 letters should you choose in order to be able to form the most words.

I asked a few clarifying questions:
  • Does capitalization/punctuation matter? - No
  • Can more than one person be the same letter? - Yes
  • Are words with less than 8 letters allowed? - Yes
  • Are words with more than 8 letters allowed (marquee style)? - No
  • Is there an official dictionary? - No

Your challenge, if you choose to accept it, is to solve the puzzle (provide code, link to dictionary you used, 8 letter solution as well as the number of words that can be made). If you don't have a dictionary, I tend to use the 2of12inf from the Official 12Dicts Package.

Cheers - L~R

Comment on Challenge: 8 Letters, Most Words
Re: Challenge: 8 Letters, Most Words
by tobyink (Abbot) on Oct 04, 2013 at 15:35 UTC

    OK, this uses some fairly naive heuristics to narrow down the search field. I'm sure somebody can do better, though I'd be fairly surprised if anybody came up with an answer that differed from mine by more than 2 letters.

    use v5.14; use Sort::Key::Top 'nkeytopsort'; sub sort_letters { my @letters = split '', $_[0]; join '', sort @letters; } sub can_make { my ($letters, $word) = @_; $word = join '.*', sort split '', $word; $letters =~ qr{$word}; } say "Reading words from dictionary..."; open my $dict, '<', '/usr/share/dict/words'; chomp( my @WORDS = map lc, grep length($_) <= 9, <$dict> ); say "Scoring microcombinations..."; my %SCORE; $SCORE{ sort_letters($_) }++ for @WORDS; my @GOOD = nkeytopsort { $SCORE{$_} } -40 => keys %SCORE; say "Generating candidate answers..."; my %CANDIDATES; for my $x (@GOOD) { for my $y (@GOOD) { for my $z (@GOOD) { my $letters = sort_letters substr("$x$y$z", 0, 8); $CANDIDATES{$letters}++; } } } my @VERYGOOD = nkeytopsort { $SCORE{$_} } -40 => keys %CANDIDATES; say "Finding best candidate answer..."; my %RESULTS = map { my $letters = $_; my @can_make = grep can_make($letters, $_), @WORDS; say " $letters - ", scalar(@can_make); $letters => \@can_make; } @VERYGOOD; my ($BEST) = nkeytopsort { scalar @{$RESULTS{$_}} } -1 => @VERYGOOD; say "BEST: $BEST"; say for sort @{$RESULTS{$BEST}};

    The dictionary used was the standard /usr/share/dict/words supplied with Ubuntu 13.04.

    The best set of letters it found was "aeinprst" which could make 426 words. (The list appears to have some duplicates because some words appear in /usr/share/dict/words more than once with different cases.)

    use Moops; class Cow :rw { has name => (default => 'Ermintrude') }; say Cow->new->name
      In the right typeface, if "a","n" and "p" can do handstands, you have (another) "e", "u" and "d" available, so it's really more then 426. Let a double for e, and add "h" who can also be "y" and this could get weird.

      Edit: if "p" makes the costume right, they can also be "b", "d" or "q". As far as I can tell, this doesn't violate the rules above.

      tobyink,
      This is a pretty good heuristic solution but as-is, it only gets the second best solution for 2of12inf.txt (aeilprst at 344 instead of aeinprst at 346).

      Cheers - L~R

Re: Challenge: 8 Letters, Most Words (Update2:Then again :)
by BrowserUk (Pope) on Oct 04, 2013 at 16:05 UTC

    Update2: Second guessing myself. Update: Amongst other possible errors, this rejects words that have more than 8 letters; whereas it should only reject words with more than 8 unique letters!

    I get:

    C:\test>1056884 words.txt a e i n o r s t: 458

    Just brute force (takes 3 or 4 seconds on my 179000 word dictionary):


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
    div class= @{ $stats{ $a } }; } keys %stats; my %letters; ++$letters{ $_ } for @freq
      BrowserUk,
      This certainly doesn't look like brute force to me.

      First you get the words each letter appears in (regardless of how many times it appears). Then, you consider the top 8 letters based on how many words contain the letter but only allow each letter to appear once.

      Your solution would probably be a lot faster if you also threw out any word from the dictionary that had repeated letter since I can't see how you would ever consider them.

      Cheers - L~R

      Hi BrowserUK,

      while I'm looking at the solutions presented here, I tried to understand your solution knowing that you have often a different and interesting view of looking at the problem. I did a litte test case dict:

      ffffffff fffffffa afffffff bfffffff fffffffb ffffbfff

      With this dict the letters to choose to get the most matching words is 'bfffffff'. You algorithm prints something different. What have I missed?

      Best regards
      McA

        you have often a different and interesting view of looking at the problem.

        How many dictionary words contain 7 repeats of the same letter? :)


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Challenge: 8 Letters, Most Words
by LanX (Canon) on Oct 04, 2013 at 17:38 UTC
    (Tsts this Limbic~Region sabotaging our work-life balance again! ;)

    Some theory:

    For a minute I thought this can be trivially solved by counting the normalization of all words (<=8) from the dictionary in a hash... e.g.

    DB<211> join "",sort split //,"electron" => "ceelnort" DB<212> join "",sort split //,"elector" => "ceelort"

    As next step successively the count from all smaller words had to be added to covering words, e.g striking the "n" from "ceelnort" leads to "ceelort", so $count{ceelnort}+=$count{ceelort}

    But than I realized that the best covering word from the dictionary is not necessarily the best solution.

    take this counterexample for 3 out of 4 letters, the number showing the coverage-count

    1 a 1 b 3 a b 2 a c 2 b c 4 a b d

    so the word (a,b,d) is the maximum with a count 4, but the set (a,b,c) would cover 5 words!!!

    (yes this also works with repeated letters)

    IMHO this problem belongs to the family of Maximum coverage problem and Set_cover_problem, so finding a guarantied best solution shouldn't be trivial w/o brute force.

    OTOH adapting the sort order of letters might already lead to very good solutions...

    Cheers Rolf

    ( addicted to the Perl Programming Language)

    update

    Maybe you can use the above count hash to solve the dual problem:

    "which of the n-8 letters cover the minimum of words" (n including repetition)

    E.g. "d" is in only one out of 6 words with 4 letters => the remaining 3 letters cover 5 words.

    "c" is only in 2 remaining words => (a,b) cover a maximum of 3 words and so on.

    Not sure if this leads to the guarantied best solution, sounds to easy... =)

      LanX,
      I couldn't convince myself that anything other than an exhaustive brute force solution would guarantee the correct answer. Now that I have done that, I will think about alternatives.

      Cheers - L~R

Re: Challenge: 8 Letters, Most Words
by Limbic~Region (Chancellor) on Oct 04, 2013 at 18:13 UTC
    All,
    I have an idea for an exhaustive solution. I haven't convinced myself it will work yet but it will require multiple phases and at least one of the phases will need to be written in C. I will update when I can play even if it turns out not to be viable.

    Update: Ok, I finally have something running that appears to work though I won't know for some time if it is correct. I ended up using 2 pieces of code. A short perl script and a C program that I will share despite the fact it is probably horrible.

    Perl script. Used to filter the dictionary as well as determine the maximum number of times each letter is repeated in a word.

    The C program. It uses a high water mark algorithm. To ensure it was working, I had it output every time the high water mark was reached or exceeded rather than waiting all the way until the end.

    I am sure there is a much better way of doing this. I will need to achieve over 5 million loops per second to achieve my goal of being completed within 24 hours. I kicked it off at 10:28PM EST. I will update again when it is completed.

    Update 2: It only took 12 minutes to run which worries me, but here is the output (winner at the bottom):

    Cheers - L~R

Re: Challenge: 8 Letters, Most Words (top 50)
by BrowserUk (Pope) on Oct 04, 2013 at 19:04 UTC

    Using a slightly different approach, this is the top 50 from my dictionary:

    aeilprst: 552 aeinprst: 528 aeilnrst: 520 adeilrst: 516 aeloprst: 515 aeilmrst: 499 aeilnpst: 497 adeoprst: 490 adeiprst: 487 aenoprst: 484 aeginrst: 484 adeinrst: 481 aeimnrst: 480 aceoprst: 479 aeimprst: 473 aehoprst: 471 adeilprs: 468 aceiprst: 465 abeilrst: 464 aefilrst: 462 acenorst: 462 aeinorst: 458 adeimrst: 454 acelorst: 452 aceinrst: 451 aceilrst: 451 aelprsty: 446 acelprst: 446 adeilpst: 446 abelorst: 443 adelorst: 443 adeilmrs: 441 aegilrst: 439 adeilnrs: 437 aehlorst: 437 aemnorst: 437 abdeilrs: 436 aeilmnst: 435 adeiorst: 435 aelmprst: 433 aeoprstu: 433 aegoprst: 433 einoprst: 432 aegilnst: 432 acehorst: 431 aehiprst: 431 aegnorst: 430 aeilmprs: 430 aeiklrst: 426 aeilnprs: 426

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
    Using a slightly different approach, this is the top 50 from my dictionary:
      BrowserUk,
      Can you provide a link to your dictionary or, better yet, run your program against the dictionary I provided in the root of this thread? I am planning on writing a truly brute force solution and would like to have something to base it off of.

      Cheers - L~R

        Hm. L~R: I've sat on these results for an hour now (it takes about 2 hours to run) whilst I looked for errors. I haven't seen anything obvious.

        But, you should probably take them with a pinch of salt. I see the possibility for something wrong in these results, but it is too late, and takes too long to verify them tonight. I'll attempt verification tomorrow.

        However, the results are interesting enough that I felt you might like to see them. The following two sets of results are from my (179k) dictionary and the (81k) 2of12int.txt. These results do allow for set of 8 letters than include duplicates, though none are in the top 50.

        What is intriguing (and worrying) is that word counts are identical; but the charsets are not!

        my dictionary 2of12int.txt aeilprst: 552 aeilprst: 552 aeinprst: 528 aeinprst: 528 aeilnrst: 520 aeilnrst: 520 adeilrst: 516 adeilrst: 516 aeloprst: 515 aeloprst: 515 aeilmrst: 499 aeilmrst: 499 aeilnpst: 497 aeilnpst: 497 adeoprst: 490 adeoprst: 490 adeiprst: 487 adeiprst: 487 aenoprst: 484 aenoprst: 484 aeginrst: 484 aeginrst: 484 adeinrst: 481 adeinrst: 481 aeimnrst: 480 aeimnrst: 480 aceoprst: 479 aceoprst: 479 aeimprst: 473 aeimprst: 473 aehoprst: 471 aehoprst: 471 adeilprs: 468 adeilprs: 468 aceiprst: 465 aceiprst: 465 abeilrst: 464 abeilrst: 464 aefilrst: 462 acenorst: 462 acenorst: 462 aefilrst: 462 aeinorst: 458 aeinorst: 458 adeimrst: 454 adeimrst: 454 acelorst: 452 acelorst: 452 aceinrst: 451 aceinrst: 451 aceilrst: 451 aceilrst: 451 aelprsty: 446 aelprsty: 446 acelprst: 446 acelprst: 446 adeilpst: 446 adeilpst: 446 abelorst: 443 abelorst: 443 adelorst: 443 adelorst: 443 adeilmrs: 441 adeilmrs: 441 aegilrst: 439 aegilrst: 439 adeilnrs: 437 adeilnrs: 437 aehlorst: 437 aemnorst: 437 aemnorst: 437 aehlorst: 437 abdeilrs: 436 abdeilrs: 436 aeilmnst: 435 aeilmnst: 435 adeiorst: 435 adeiorst: 435 aelmprst: 433 aeoprstu: 433 aeoprstu: 433 aelmprst: 433 aegoprst: 433 aegoprst: 433 einoprst: 432 aegilnst: 432 aegilnst: 432 einoprst: 432 acehorst: 431 acehorst: 431 aehiprst: 431 aehiprst: 431 aegnorst: 430 aeilmprs: 430 aeilmprs: 430 aegnorst: 430 aeiklrst: 426 aeilnprs: 426 aeilnprs: 426 aeiklrst: 426

        I am convinced that the top result from both datasets is "the right answer"; despite the uncomfortable feeling from the concordance of the numbers and the discordance of the charsets. Tomorrow...


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Challenge: 8 Letters, Most Words
by CountZero (Bishop) on Oct 04, 2013 at 20:06 UTC
    A brute force approach, using the 2of12inf.txt dictionary.

    use Modern::Perl; open my $DICT, '<', './2of12inf.txt' or die "Could not open dictionary +: $!"; my @dictionary; my %words8; my %results; while (<$DICT>) { chomp; next unless $_; chop if /%$/; next if length > 8; my $sorted = join '', sort split ''; push @dictionary, $sorted; $words8{$sorted}++ if length == 8; } my $maxword = 0; my $solution = ''; for my $word8 ( keys %words8 ) { my $regex = join '?', split '', $word8; for my $word (@dictionary) { $results{$word8}++ if $word =~ /($regex?)/ and $1 eq $word; } if ( $results{$word8} > $maxword ) { $maxword = $results{$word8}; $solution = $word8; say "$solution can make $maxword words"; } } say "Finished";
    Output:
    aabdellu can make 55 words abellrsu can make 118 words aeikprrs can make 131 words ahopstuw can make 136 words ... acehprst can make 288 words adeinrst can make 313 words adeoprst can make 316 words aeilprst can make 344 words aeinprst can make 346 words Finished
    Corrected the "off-by-one" error.

    Re-corrected the off-by-one error which was due to an empty line in the small test dictionaries I used.

    CountZero

    A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics

      Hi CountZero,

      I'm pretty sure you have the wrong driver. Look at this dictionary:

      ffffffff fffffffa afffffff bfffffff fffffffb ffffbfff a aa aaa aaaa aaaaa aaaaaa

      The solution is clearly something with 6 x 'a'. Your program states something different.

      Best regards
      McA

        You are right, because I check against all 8 character words and 'aaaaaa' only has six and therefore it is not considered. To be absolutely certain one would have to check against all combinations of 8 characters, i.e. checking your whole dictionary against each of 208 827 064 576 combinations.

        So my solution answers the question with one extra condition: that the 8 characters together form a word too.

        And I seem to have an "off by one error" somewhere.

        CountZero

        A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

        My blog: Imperial Deltronics
      CountZero,
      What was the "off-by-one" error? Using the same dictionary file, I get aeinprst = 346 which I have manually verified. This isn't brute force (as you have indicated elsewhere) - it is heuristic but appears to come up with the correct result. While I like it, I needed to know for certain which is why I wrote an exhaustive solution which amazingly finished in 12 minutes.

      Cheers - L~R

        Well, the off-by-one error I assumed was in my code was actually in the small test dictionaries I ran, because of a spurious empty line at the end.

        It is quite funny that my brute force code finds the right solution even without going through all possible combinations. I guess I was just lucky.

        CountZero

        A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

        My blog: Imperial Deltronics
Re: Challenge: 8 Letters, Most Words
by TJPride (Pilgrim) on Oct 04, 2013 at 20:28 UTC
    Ok, I started with 2of12full.txt, removed anything with a capital letter (I'm assuming those are abbreviations or dupes) or non-alphabet characters (hyphenations, abbreviations, etc.), and removed all words of less than 2 characters or more than 8. I then ran this:
    use strict; use warnings; $| = 1; my (%wc1, %wc2, $c); open (IN, '2-8-words.txt') || die; while (<IN>) { chomp; $_ = join '', sort split //; $wc1{$_}++; } open (IN, '2-8-words.txt') || die; while (<IN>) { chomp; $_ = join '.*?', sort split //; for my $s (keys %wc1) { $wc2{$s}++ if $s =~ /$_/; } print '.' if ++$c % 1000 == 0; } print "\n\n"; $c = 0; for (sort { $wc2{$b} <=> $wc2{$a} } keys %wc2) { print "$_ : $wc2{$_}\n"; last if ++$c == 50; }
    Output:
    aeilprst : 239 aeilnpst : 225 aeloprst : 220 aeilnrst : 219 aceiprst : 216 aeilmnrt : 207 aelprsty : 207 acenorst : 202 aeiprsty : 201 acelprst : 201 adeinrst : 199 acehorst : 198 aegilnrt : 197 aeinorst : 197 aeinoprt : 195 aeilnort : 194 adeilort : 191 aceinrst : 190 aelmoprt : 188 adeinort : 187 aegimnrt : 185 aemprstu : 184 adeilmst : 184 aceloprt : 182 adeginrt : 182 aceilnrt : 180 aceilprt : 180 aceimrst : 180 aegilnst : 179 acehnrst : 176 abeinrst : 175 aceilmrt : 175 eimoprst : 175 aceinort : 173 adeimort : 172 aeilnrtv : 172 adelorst : 172 aehmoprt : 171 aceilprs : 171 aehloprt : 171 acdehort : 170 abelrstu : 170 adeiorst : 170 aeoprstt : 169 acehiprs : 169 abelopst : 168 ademnopr : 168 adeilmry : 168 aeiprstv : 168 aeghorst : 167
    This should be 100% accurate with any given word list (of a-z only), though it does take a little while to run, and if anyone can think of a way to speed it up without compromising accuracy, by all means do so.

      Hi,

      you print a list, but what is the 8 character sequence which solves the problem? Is it the first one?

      McA

        Well yes, of course.
      TJPride,
      You filtered things out that shouldn't have been (capitalization/punctuation don't matter for instance). Here is the code that I am using to filter my list:
      #!/usr/bin/perl use strict; use warnings; my %seen; open(my $fh, '<', '2of12inf.txt') or die "Unable to open '2of12inf.txt +' for reading: $!\n"; while (<$fh>) { chomp; $_ = lc($_); tr/a-z//cd; next if ! $_ || length($_) > 8 || $seen{$_}++; print "$_\n"; }

      This should be 100% accurate with any given word list

      I will be using the same dictionary with my brute force solution so I will let you know though I won't be filtering out all the words you have.

      Cheers - L~R

        The '2of12inf.txt' already has all these filtered out, but some words have a '%' added (I do not know why, but you will have to delete the '%').

        CountZero

        A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

        My blog: Imperial Deltronics
        Capitalization may not matter, but dupes do, and there are often multiple versions of the same word. I suppose I could have just lowercased everything and then deduped. The word list is largely irrelevant, though - the important thing is the algorithm.

        What do you mean by brute force, exactly? There are far too many possible letter combinations to just check each one.

Re: Challenge: 8 Letters, Most Words
by aaron_baugher (Deacon) on Oct 05, 2013 at 03:41 UTC

    I tried coming at this from another direction. Instead of looping through the possible letter combinations, I started with the word list and an empty array of "stacks" of words that could fit into the same set of 8 letters. For each word in the list, I checked it against each "stack," adding it to any stacks it could fit into, and starting a new stack of its own.

    The first element in each stack is a list of the letters required to make all the words in that stack. So I'm able to compare against this one "word," rather than all the words in the stack each time. I could save some memory, and maybe some CPU, by not saving the words at all, just the number that fit in the stack, but I wanted to keep track of the words so I could verify that it's working right.

    For each stack (sub-array), I do a quick check to see if the new word added to this stack would obviously require more than 8 letters, so it can bail out quickly if that's the case. Otherwise it does a more complex check to see what set of letters would be required to add this word to the stack, and does so if it fits.

    In the end, it picks the largest stack. I ran it against the file 2of12.txt, skipping hyphenated words, lowering the case on everything, and removing duplicates, and it took almost exactly one hour on my system to process the remaining 21664 2-8 letter words. I'm rerunning it now against 2of12inf.txt, which I expect will take at least four times as long, maybe more. In the meantime, here's my code and results. I think there are probably some pretty big inefficiencies here, but I hope the method is interesting, at least.

    bannor> cat 1056884.pl #!/usr/bin/env perl use Modern::Perl; open my $wf, '<', '2of12.txt' or die $!; my @words; while (<$wf>) { chomp; push @words, lc($_) if length($_) > 1 and length($_) < 9 and ! /-/ +; } @words = keys { map { $_ => 1 } @words }; say "Working with @{[scalar @words]} words"; my @z; for my $w (@words) { my @letters = split //, $w; for my $e (@z) { my $quick = join '', sort split //, $w . $e->[0]; $quick =~ s/(.)\1+/$1/g; next if length($quick) > 8; # bail out early if no chance my $fill = $e->[0]; my $save = $fill; my $new = ''; for my $l (@letters) { unless( $fill =~ s/$l// ){ $new .= $l; } } $save .= $new; if ( length($save) < 9 ) { push @$e, $w; $e->[0] = $save; } } push @z, [$w, $w]; # start a new stack with this word } my $e = ( sort { @$b <=> @$a } @z )[0]; say "Letters: ", shift @$e; say "Number of words: ", scalar @$e; say " $_" for sort @$e; bannor> time perl 1056884.pl Working with 21664 words Letters: soartple Number of words: 198 perl 1056884.pl 3628.31s user 0.14s system 99% cpu 1:00:34.69 total ## Words found: ale, alert, aloe, alp, also, alter, alto, ape, apostle, apse, apt, are +, arose, art, arts, as, asp, aster, ate, atop, ear, earl, east, eat, eat +s, era, eta, la, lap, lapse, laser, last, late, later, lea, leap, leapt, lest, + let, lo, lop, lope, lore, lose, loser, lost, lot, lots, oar, oat, ole, opal +, opera, opt, or, oral, orate, ore, pa, pal, pale, par, pare, parole, pa +rse, past, paste, pastel, pastor, pat, patrol, pea, peal, pear, peat, pelt, + per, pert, peso, pest, pet, petal, petrol, plat, plate, plea, pleat, plot, +pol, polar, pole, polestar, pore, port, pose, poser, post, postal, poster, +pot, prate, presto, pro, pros, prose, rap, rape, rasp, rate, re, real, reap +, rep, repast, rest, roast, roe, role, rope, ropes, rose, rot, rote, sale, sa +lt, sap, sat, sate, sea, seal, sear, seat, sepal, septa, sera, slap, slat, + slept, sloe, slop, slope, slot, so, soap, soar, sol, solar, sole, sop, sore, +sort, sorta, sot, spa, spar, spare, spat, spate, spear, spelt, splat, spore, + sport, spot, sprat, stale, staple, stapler, star, stare, step, stop, store, s +trap, strep, strop, tale, tape, taper, taps, tar, tare, taro, tarp, tea, tea +l, tear, to, toe, tole, top, tops, tor, tore, trap, traps, trope, tsar

    Update: Running my script on the larger 2of12inf.txt had the following results in 3.5 hours:

    Working with 40933 words Letters: primates Number of words: 316 Words found: aim aims air airs am amir amirs amp amps ape apes apse apt apter are a +rise arm armies armpit armpits arms art arts as asp aspire aster astir at ate e +ar ears east eat eats em emir emirs emit emits ems era eras erst esprit eta et +as imp impart imparts imps irate ire is ism it item items its ma map maps mar + mare mares mars mart marts mas maser mast master mat mate mates mats me mea +t meats merit merits mesa met mi mire mires miser mist mister miter miters mit +es mitre mitres pa pair pairs par pare pares pars parse parties parts pas past +paste pastier pastime pat pate pates pats pea pear pears peas peat peats per + perm permit permits perms pert pest pet pets pi piaster piastre pie pier pi +ers pies pirate pirates pis pit pita pitas pits praise pram prams prate prates +pries priest prim primate primates prime primes prise prism raise ram ramie +ramies ramp ramps rams rap rape rapes rapist raps rapt rasp rate rates rats r +e ream reams reap reaps rem remaps remit remits rems rep repast reps res rest + rim rime rimes rims rip ripe ripest rips rise rite rites same sap sari sat + sate satire sea seam sear seat semi sepia septa sera set simper sip sir sir +e sit sitar site smart smear smit smite spa spar spare spat spate spear sper +m spire spirea spit spite sprat sprite stair stamp stamper star stare steam st +em step stir strap stream strep stria striae strip stripe tam tame tamer tamer +s tames tamp tamper tampers tamps tams tape taper tapers tapes tapir tapirs ta +ps tar tare tares tarp tarps tars tarsi tea team teams tear tears teas temp t +empi temps term terms ti tie tier tiers ties time timer timers times tip ti +ps tire tires tis traipse tram tramp tramps trams trap traps trim trims trip t +ripe trips tsar

    Aaron B.
    Available for small or large Perl jobs; see my home node.

      aaron_baugher,
      I considered a bucket approach as well - but not quite the way you did it. What I was going to do was bucket all words of the same length and then at the end roll up 2 letter words into 3 letter words, 3 letter words into 4 letter words, etc. I realized this won't guarantee the best solution because of what I have explained elsewhere in this thread (taking some from one group and some from another can lead to the best solution)

      I ran your code against 2of12inf.txt (actually, the file I had already reduced). On my machine it took 2 hours and 3 minutes. Here is the output:

      Working with 40933 words Letters: primates Number of words: 316
      The correct answer has 346 words.

      Cheers - L~R

        Yes, I had it dump the complete array to a file the last time, and the winning set "painters" only gets 292 words from my script. I had a feeling it could miss some somehow, but I can't quite figure out how yet.

        Update: I see the problem now. Let's say "painters" is the 100th word, and "at" is word #2 and "not" is word #3. "not" gets added to bucket #2 along with "at" because it can fit, but now "painters" can't go into that bucket. And since "at" has already been placed in bucket #2 and possibly bucket #1, it can't be placed in bucket #100 with "painters" or in any of the buckets in between that "painters" might be in.

        So to make my approach work, I guess once all the words have been placed in at least one bucket, I would have to go back through them again, retrying them against each bucket past the one they started in. And even then I'm not sure you'd be guaranteed to get a bucket with the best possible combination.

        I had a nagging feeling there was a problem like that, but I needed a night's sleep to see it.

        Aaron B.
        Available for small or large Perl jobs; see my home node.

Re: Challenge: 8 Letters, Most Words
by hdb (Parson) on Oct 05, 2013 at 05:09 UTC

    Another brute force method that puts all words in 2of12inf.txt in 8 letter classes and counts their members. Runs 40mins on my machine.

    use strict; use warnings; use Data::Dumper; sub fits { my( $class, $subclass ) = @_; return 0 if length( $class ) < length( $subclass ); $class =~ s/$_// or return 0 for split //, $subclass; return 1; } open my $dict, "<", "2of12inf.txt" or die "Cannot open dictionary!\n"; my $words = do { local $/; <$dict> }; close $dict; my @words = sort { length($b) <=> length($a) } sort $words =~ /\b(\w{1 +,8})\b/g; my %classes; my $cnt = 0; for my $w (@words) { my $ws = join '', sort split //, $w; my $fitted = 0; for my $class (keys %classes) { if( fits( $class, $ws ) ) { $classes{$class}{$w}++; $fitted++; } } $classes{$ws}{$w}++ unless $fitted; } my @sorted = sort { scalar keys %{$classes{$b}} <=> scalar keys %{$cla +sses{$a}} } keys %classes; for( @sorted ) { print "$_: ", scalar keys %{$classes{$_}}, "\n"; }

    and here are my top ten:

    aeinprst: 346 aeilprst: 344 adeiprst: 332 aeimprst: 328 adeilrst: 319 adeoprst: 316 aeilnpst: 314 adeinrst: 313 aeloprst: 313 aeilnrst: 312

      Cleaned up the code a bit:

      use strict; use warnings; sub fits { my( $class, $subclass ) = @_; $class =~ s/$_// or return 0 for split //, $subclass; return 1; } open my $dict, "<", "2of12inf.txt" or die "Cannot open dictionary!\n"; my $words = do { local $/; <$dict> }; close $dict; my @words = sort { length($b) <=> length($a) } sort $words =~ /\b(\w{1 +,8})\b/g; my %classes; for my $w (@words) { my $fitted = 0; fits( $_, $w ) and $fitted = $classes{$_}{$w} = 1 for keys %classes; $classes{join '', sort split //, $w}{$w} = 1 unless $fitted; } my @sorted = sort { scalar keys %{$classes{$b}} <=> scalar keys %{$cla +sses{$a}} } keys %classes; print "$_: ", scalar keys %{$classes{$_}}, "\n" for @sorted;

      So this is how it works:

      1. Reads the dictionary and sorts the words descending according to length, ie the 8-letter words first, then the 7-letter words etc. It also sorts alphabetically but that is not required.
      2. It then goes through the list of words and builds classes or sets of words indexed by their common letters. More specifically, it checks all classes created so far, whether there is one having all the letters needed for the next word.
      3. If yes, it adds it to all existing classes where it fits. It is important to add them not to one class but to all fitting classes.
      4. If not, it creates a new one.
      5. For this, it is important that long words are processed before short words.
      6. Whether it fits or not, is done by removing each letter of the new word from the letters of the class. If this is possible for all letters of the new word, then it fits and will be added to the class.
      7. After processing all words, it sorts the classes by number of members descending and prints them in that order.
      It did run in 34 minutes. I'll be grateful for any hints on performance optimization.

      Update: As I realized when reading Re^5: Challenge: 8 Letters, Most Words this solution will not be able to put two four letter words into one class irrespective of the letters. So in some situations it will not find the best solution.

Re: Challenge: 8 Letters, Most Words
by Tux (Monsignor) on Oct 05, 2013 at 08:51 UTC

    I get 670 with aeilnrst using /usr/share/dict/words and 346 with aeinprst using 2of12inf, but I'd like to expand on this. All presented solutions take as a base that the letters should be able to create at least one 8-letter word. If I however assume that a 7-letter word with a random extra letter is able to create more words with shorter words, the number of possibilities explodes.

    And here is my code:


    Enjoy, Have FUN! H.Merijn

      "All presented solutions take as a base that the letters should be able to create at least one 8-letter word."

      Mine doesn't.

      use Moops; class Cow :rw { has name => (default => 'Ermintrude') }; say Cow->new->name
Re: Challenge: 8 Letters, Most Words
by kcott (Abbot) on Oct 05, 2013 at 14:47 UTC

    G'day Limbic~Region,

    Here's my take on a solution. It uses the 2of12inf.txt dictionary. It takes about a minute to run on my OS. [as always, YMMV] It doesn't use any particularly noticeable amounts of memory.

    #!/usr/bin/env perl use 5.010; use strict; use warnings; use autodie; die "Usage: $0 dictionary max_letters min_letters\n" unless @ARGV >= 2 +; my ($DICT, $MAX, $MIN) = @ARGV; $MIN //= 1; my %dict_extract; open my $fh, '<', $DICT; while (<$fh>) { s/%?\r?\n?$//; next if length > $MAX; ++$dict_extract{+length}{join '' => sort split ''}; } close $fh; my $best_letters = ''; my $most_words = 0; for my $letters (keys %{$dict_extract{$MAX}}) { my $count = get_count($letters, {}); if ($most_words < $count) { $most_words = $count; $best_letters = $letters; } } say "Best: $best_letters - Count: $most_words"; sub get_count { my ($letters, $counted) = @_; return 0 if $counted->{$letters}++; my $len = length $letters; my $count = $dict_extract{$len}{$letters} // 0; if ($len > $MIN) { $count += get_count($_, $counted) for map { my @sub_string = split '', $letters; splice @sub_string, $_, 1; join '' => @sub_string; } 0 .. $len - 1; } return $count; }

    Output:

    Best: aeinprst - Count: 346

    I also wrote a complementary script to check the results (this only takes a second or two to run):

    #!/usr/bin/env perl use 5.010; use strict; use warnings; use autodie; die "Usage: $0 dictionary solution max_letters\n" unless @ARGV == 3; my ($DICT, $SOLUTION, $MAX) = @ARGV; my $count = 0; open my $fh, '<', $DICT; while (<$fh>) { s/%?\r?\n?$//; next if length > $MAX; if (matching_word($_)) { ++$count; say; } } close $fh; say "Total: $count"; sub matching_word { state $solution_letters = [ split '' => $SOLUTION ]; my ($word) = @_; foreach my $letter (@$solution_letters) { my $pos = index $word, $letter; if ($pos != -1) { substr($word, $pos, 1) = ''; return 1 if length $word == 0; } } return 0; }

    Output:

    air airs an ... tripes trips tsar Total: 346

    -- Ken

Re: Challenge: 8 Letters, Most Words
by davido (Archbishop) on Oct 07, 2013 at 20:41 UTC

    I believe that this will always produce a correct result. So here is a solution that gets it right in under 1 minute 40 seconds, using SQLite:

    Update: Here's a newer version that has better reporting:

    The result I get with $ time ./eightletters.pl is:

    ....... (some output deleted for brevity) ...... Totals ------ Processed 40933 words. (pntirase) spells 346 words. real 1m38.981s user 1m37.770s sys 0m0.116s

    The logic works as follows:

    • Grab all words of eight characters or less from the dictionary, as that's all we care about.
    • Determine the letter frequency from the trimmed dictionary.
    • Split the list of words into those with eight characters, and those with fewer.
    • Create buckets for each unique combination of letters that represents eight-letter words. In cases where the same combination repeats, just increment the counter for that bucket.
    • Insert all of the buckets into a database. Each row is a bucket. Each column is a count of how many times a letter is used in that bucket. The final column is a counter for how many times the bucket is used.
    • Now run through all the words of length less than eight. Select from the database every bucket that a given word can fit into, and increment the counters for each of those buckets.
    • Find the maximum count; ie, the counter for the bucket where the most words fit.
    • Find the row this occurs in, and grab the letters that the bucket represents.

    In the database, the letter columns are in reverse-frequency order, or frequency-ascending order. That way, 'q' is the first letter column, and so on, finishing at 'e'. The queries also use this reverse-frequency order, so that the "AND"s within the "WHERE" clause can narrow down the list of records to include as early as possible. ...at least that's my theory. ;)

    The code is embarrassingly dirty, and there are several significant opportunities to further trim cycles. But I wanted to put the ideas out there in case someone else wanted to consider the methodology.


    Dave

Re: Challenge: 8 Letters, Most Words
by oiskuu (Pilgrim) on Oct 08, 2013 at 15:28 UTC

    Nice challenge. I went for a particular brute force approach that requires 64bit perl with threads.
    Runs for ~140 mins on my quad-core. Top results:

    aeinprst: 346 aeilprst: 344 adeiprst: 332 aeimprst: 328

    Filtering 2of12inf.txt gives 40933 words, 35893 of which incongruent.
    Exhaustive search must cover 13884156 letter combinations, like e.g.:

    use Algorithm::Combinatorics qw(:all); my $iter = combinations_with_repetition(\@{['a'..'z']}, 8);

    Anyway, this is what I wrote:

      Ok, I tried to filter the word set before going into loops. This appears to reduce the search space by a factor of 30.
      Run time is down to ~5 minutes.

      Word counts for 2of12inf.txt, along with word "power": ============================ WORDS COMBINATIONS len:number k:number ---------------------------- 2: 62 6: 736281 3: 642 5: 142506 4: 2546 4: 23751 5: 5122 3: 3276 6: 8303 2: 351 7: 11571 1: 26 8: 12687 0: 1 ============================ Expected hits: 12687*1 + 11571*26 + 8303*351 + 5122*3276 + 2546*23751 ++ 642*142506 + 62*736281 = 217615878 Counted hits (with filtering): = 217615878

      Update2: added coverage check

Re: Challenge: 8 Letters, Most Words (2of12int.txt in 20^H^H 10 secs; Pure Perl)
by BrowserUk (Pope) on Oct 09, 2013 at 06:25 UTC

    Exhaustive search should work for any dictionary. (The 10 secs doesn't include the sort, but it could use a high watermark.):

    #! perl -slw use strict; use Time::HiRes qw[ time ]; sub uniq{ my %x; @x{@_} = (); keys %x } my $start = time; my @lookup = map{ my $bits = pack 'C', $_; [ grep vec( $bits, $_, 1 ), 0 .. 7 ] } 0 .. 255; my %dict; while( <> ) { chomp; next if length > 8; my $r = \%dict; $r = $r->{ $_ } //= {} for sort split '', $_; push @{ $r->{_} }, $_; } my %stats; sub X { my( $first, $soFar, $tree ) = @_; if( @$soFar == 8 ) { my @words = uniq map { my $r = \%dict; $r = $r->{ $_ } for @{ $soFar }[ @{ $lookup[ $_ ] } ]; exists $r->{_} ? @{ $r->{_} } : (); } 0 .. 255; $stats{ join '', @$soFar } = \@words; return; } for( $first .. 'z' ) { next unless exists $tree->{ $_ }; X( $_, [ @$soFar, $_ ], $tree->{ $_ } ); } return; } X( 'a', [], \%dict ); print time - $start; <STDIN>; printf "[%d] %s : @{[ @{$stats{ $_ }} ]}\n", scalar @{$stats{$_}}, $_ +for sort{ @{ $stats{ $b } } <=> @{ $stats{ $a } } } keys %stats; __END__ [ 7:09:30.87] C:\test>1056884-calc 2of12inf.txt 10.4807748794556 [346] aeinprst : nape tars pets ape panties taper inept nastier print +airs spinet sniper pane rants pint peat spear terns tears trains reps earn tines paints napes ate strep rest repaint ti inapt ri +nse ripest astir priest neat part nets pans retina apes tarps reaps nip rain tern res sip sprint instep sear inters in prints +ires erst parent snip re astern satire praise stein pirates pirate pertains tan tar pie par tis nite ant apter pa ants rites ranis + rite eta as pastier rents nit ties pastern rate sapient strip pants pits apt pent snipe niters ran pate tsar stain pints panie +r spirea rapines tapers repast irate pars span pairs tens raps ripen stair tarn tans retsina tin rips pies inert eats reins ear +esprit aster rapes pin tarsi sera neaps rani step rein resin inter piastre stria painters sprain neap tripes tires raise tier en st +ern parents pas ares sari pert satin sine nitres ripens tape prates striae retain arts pet ani sat tare pertain insert reap pit pra +te pea trap antsier pantie sterna tries rasp east pen spit sit arise naps tire tea pares pat snare stir spar pine rap tares is pa +n rant earns pant art it painter sate pear tips pens rep pitas tarp penis aspen antis teas nips tap aspire piaster traps past pier er +a ears asp sane star seat entrap rip snit siren sin nae pins spare tapes stare niter spate sitar rent prise taps anti trip trips ti +ns spire ens tarns spent traipse antes tripe rise tapir tip pest peas nest sent its sir pries retains anise rat pates pare rats tr +ain saner pita eat nitre spat parse tapirs tiers ten risen nears sap spite set stripe sprite paniers sire strap tie sea septa pis + site psi eras repaints an pain paste ripe ire net entraps nits pair apse pains rates rapt per saint near paint rains spin strain + etas snap pines rapist rape are rapine sepia pantries spine peats tear at parties pi ante sprat retinas inset spa arisen ins parts + nites tine piers panes pats nap pears air [344] aeilprst : liars tars platies pets ape taper lairs airs lire lit +er ails peat spear plies palest tears lets lest reps ate strep rest lips leis slate ti ripest astir lie priest seal part apes t +arps piles reaps trial plate res sip paler litres sear retail ires erst rails salt plates re satire praise pirates pirate lea + liras tar pie retails pleas par pleats tis apter pa realist pelt rites peril alters rite relit stale eta earls as litre pa +stier lies trails ties rate strip pits apt petal pate list tsar spirea tapers petals repast irate pars plea pairs pleat stap +ler raps stair saltier lit pales tali alert rips spilt pies eats laser ear peal slit esprit aster rapes tarsi sera ales slip +alit spelt step alter piastre silt stria tripes tires raise liar tier leapt tiler teals pas ares sari sale pert tape prates +pal striae riles sail alp plats arts pet plait sat tare reap pit late pails steal prate least pea tiles trap tries slier rasp +sepal pastel east spit sit arise alps tire tea pares pat rial stir spar laps later rap tares is la rials ilea leas staler alert +s art it last sate ale aisle pear tips leaps rep pitas tarp trail lept tile stile teas tap aspire piaster teal traps past tai +l pier split era ears asp star seat pile rip spare rile tapes stare spate sitar pelts triples las prise islet lisper lepta tap +s plaster staple trip pail spiel lair pilaster trips slat pearl spire lip traipse tripe rise tapir lei pale tip slept pest peas +tilers its sir lite pries rat pearls serial rail pates pare rats lapse tails pita eat slap spat pals parse plaits tapirs lisp + tiers lap pliers salter tales sap spite set real ail stripe sprite sire strap splat tie sea septa pis site psi eras plat pa +ste ripe ire let pair apse leap rates rapt per triple etas perils rapist rape liters lira are trials spiral sepia peats tear + at parties pi earl peals sprat spa isle parts tale piers pats pears air ...

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      It is quite impressive. I have not yet understood how it works. It seems not to work on the following dictionary:

      exactest one two three four five six seven eight nine ten

      where one solution is eehnortw covering four words but your script says [1] aceesttx : exactest.

      Update: After further study it looks to me, that you are checking all 8 letter classes derived from the 8 letter words in the dictionary and see how many words are captured. Which is probably giving you the correct solution for many real world dictionaries. This way, you do avoid the combinatorial explosion that makes this challenge difficult...

      Still quite impressive!

        It does recurse through all 8-letter combos, but short-circuits at each level if the combination so far cannot be derived from the dictionary supplied.

        next unless exists $tree->{ $_ };

        If the letter combination for your 'eehnortw' existed in the dictionary, it would be found very quickly:

        [17:25:23.81] C:\test>type hdb.dict exactest one two three four five six seven eight nine ten torewhen [17:25:47.91] C:\test>1056884-calc hdb.dict 0.00380706787109375 [5] eehnortw : torewhen three one two ten [1] aceesttx : exactest

        Comment out the short-circuit and it will consider all 8-letter possibilities .. and run more slowly.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        There are 25 letter sets that will cover 4 words of your sample dictionary:

        [19:35:10.42] C:\test>1056884-calc hdb.dict 2aaaaenot 3aaaenotw 4aeinnotw 373.12408208847 [4] einnootw : one two nine ten [4] efinnotv : five one nine ten [4] einnotvw : one two nine ten [4] eenostvw : one two seven ten [4] einnotuw : one two nine ten [4] einnostw : one two nine ten [4] eghinnot : one eight nine ten [4] ceinnotw : one two nine ten [4] eehnortw : three one two ten [4] efinnotw : one two nine ten [4] efinotvw : five one two ten [4] eiinnotw : one two nine ten [4] eeinnotw : one two nine ten [4] eghinotw : one two eight ten [4] ehinnotw : one two nine ten [4] einnortw : one two nine ten [4] einnotwx : one two nine ten [4] einostwx : six one two ten [4] einnnotw : one two nine ten [4] eginnotw : one two nine ten [4] efnortuw : one two four ten [4] einnotww : one two nine ten [4] aeinnotw : one two nine ten [4] einnottw : one two nine ten [4] einnostx : six one nine ten [3] eenostwx : one two ten ...

        375 seconds isn't as impressive, but better than 24 hours :)

        This makes use of another optimisation that only benefits when the dictionary is small like yours:

        #! perl -slw use strict; use Time::HiRes qw[ time ]; $|++; sub uniq{ my %x; @x{@_} = (); keys %x } my $start = time; my @lookup = map{ my $bits = pack 'C', $_; [ grep vec( $bits, $_, 1 ), 0 .. 7 ] } 0 .. 255; my %dict; my %alphabet; while( <> ) { chomp; next if length > 8; my $r = \%dict; undef( $alphabet{ $_ } ), $r = $r->{ $_ } //= {} for sort split '' +, $_; push @{ $r->{_} }, $_; } my @alphabet = sort keys %alphabet; my $best = [ 0, '' ]; my %stats; sub X { my( $first, $soFar, $tree ) = @_; if( @$soFar == 8 ) { my @words = uniq map { my $r = \%dict; $r = $r->{ $_ } for @{ $soFar }[ @{ $lookup[ $_ ] } ]; exists $r->{_} ? @{ $r->{_} } : (); } 0 .. 255; return unless @words > 1; print @{ $best = [ scalar @words, join '', @$soFar ] } if @wor +ds > $best->[0]; $stats{ join '', @$soFar } = \@words; return; } for( grep $_ ge $first, @alphabet ) { # next unless exists $tree->{ $_ }; X( $_, [ @$soFar, $_ ], $tree->{ $_ } ); } return; } X( 'a', [], \%dict ); print time - $start; <STDIN>; printf "[%d] %s : @{[ @{$stats{ $_ }} ]}\n", scalar @{$stats{$_}}, $_ +for sort{ @{ $stats{ $b } } <=> @{ $stats{ $a } } } keys %stats;

        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
        @$soFar, $_
Re: Challenge: 8 Letters, Most Words
by duelafn (Priest) on Oct 11, 2013 at 12:40 UTC

    Simple greedy algorithm takes only a 20-45 seconds to find some of the top contenders. Does better on my large wordlist than the smaller 2of12inf.txt. Didn't bother to extend to examine more branches or perform exhaustive search on last 2 or 3 letters since it did so well.

    $ wc -l /usr/share/dict/words 234937 /usr/share/dict/words $ ./1056884.pl 548: a e i l n r s t $ wc -l /tmp/2of12inf.txt 81536 /tmp/2of12inf.txt $ ./1056884.pl /tmp/2of12inf.txt 337: a e i n p r s t

    The Code:

    Good Day,
        Dean

Re: Challenge: 8 Letters, Most Words
by McA (Deacon) on Oct 13, 2013 at 19:03 UTC

    Hi L~R, hi all other interested monks.

    It's done. My program presented at Re^10: Challenge: 8 Letters, Most Words ended. My estimation was not too bad, as you can see:

    real 11330m25.517s user 11319m0.142s sys 1m29.073s

    This is worth of 7 days, 20 hours, 50 minutes, 25.517 seconds.

    IMHO THIS is a really (stupid) brute force solution. ;-)

    So, now to the result: I found two winners:

    aeinprst aeilprst

    With both the program was able to build 337 words out of the dictionary.

    Here you can find the dictionary I worked on and the detailed result: http://pastebin.com/jU5cbzb8

    Probably I will investigate the differencet to the solution of L~R. I'm sure it would be faster if he would run his script/program on my dictionary. :-)

    Best regards
    McA

    P.S.: If anybody is saying to you the well known mantra: Code first, optimize later! show him this node and answer: Time, resources and runtime are requirements, even when secondary ones. ;-)

Re: Challenge: 8 Letters, Most Words
by repellent (Priest) on Oct 31, 2013 at 18:06 UTC
    Haven't been on Monastery grounds in a while, so I'm a little late to this party.
    $ time t.pl 2of12inf.txt Done processing dict (40933 words). Candidate counts, by number of let +ters: $VAR1 = [ 0, 53, 516, 1894, 4068, 7076, 10360, 11926 ]; Done computation. Result (most words at bottom): aabdellu : 1 emprrtuy : 1 [... output truncated ...] aeginrst : 296 aelmprst : 297 acelprst : 297 aceiprst : 303 adeimrst : 305 adeoprst : 307 aeimnrst : 307 aeilnpst : 308 adeinrst : 311 aeilnrst : 311 adeilrst : 319 aeimprst : 327 adeiprst : 331 aeinprst : 336 aeilprst : 343 real 0m4.860s user 0m3.665s sys 0m0.160s

    My solution:

    My strategy is to aggregate up the word counts from candidates with the fewest letters to the most letters. (This approach is not thorough but is fast, see Update #2). The trick to make it fast is to realize that there is a one letter difference between candidate tiers, and to cycle through 26 letters for subset matching.

    My winner is aeilprst with it being able to make at least 343 words (using all 8 letters, or a subset of that). This may not be the best answer because of the simplistic aggregation, but it points to the likely champs.

    Looking at the other replies makes me question myself. What am I doing wrong?

    Other results have winners that create only hundreds of words. Seems strange, given that 40k+ words were processed. Is that for words using all 8 letters only (and not a subset of the letters)?

    Update: I see what's wrong now. The aggregation has to keep track of contributing counts of specific candidates. Back to drawing board.

    Update #2: I have updated the script and results, which is more in line with what others have gotten. The total word counts do come up short because of the aggregation strategy. But the tradeoff in speed is significant.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1056884]
Approved by MidLifeXis
Front-paged by MidLifeXis
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others avoiding work at the Monastery: (8)
As of 2014-07-23 00:01 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:









    Results (130 votes), past polls