Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

Re: Challenge: 8 Letters, Most Words

by CountZero (Bishop)
on Oct 04, 2013 at 20:06 UTC ( #1056926=note: print w/ replies, xml ) Need Help??


in reply to Challenge: 8 Letters, Most Words

A brute force approach, using the 2of12inf.txt dictionary.

use Modern::Perl; open my $DICT, '<', './2of12inf.txt' or die "Could not open dictionary +: $!"; my @dictionary; my %words8; my %results; while (<$DICT>) { chomp; next unless $_; chop if /%$/; next if length > 8; my $sorted = join '', sort split ''; push @dictionary, $sorted; $words8{$sorted}++ if length == 8; } my $maxword = 0; my $solution = ''; for my $word8 ( keys %words8 ) { my $regex = join '?', split '', $word8; for my $word (@dictionary) { $results{$word8}++ if $word =~ /($regex?)/ and $1 eq $word; } if ( $results{$word8} > $maxword ) { $maxword = $results{$word8}; $solution = $word8; say "$solution can make $maxword words"; } } say "Finished";
Output:
aabdellu can make 55 words abellrsu can make 118 words aeikprrs can make 131 words ahopstuw can make 136 words ... acehprst can make 288 words adeinrst can make 313 words adeoprst can make 316 words aeilprst can make 344 words aeinprst can make 346 words Finished
Corrected the "off-by-one" error.

Re-corrected the off-by-one error which was due to an empty line in the small test dictionaries I used.

CountZero

A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

My blog: Imperial Deltronics


Comment on Re: Challenge: 8 Letters, Most Words
Select or Download Code
Re^2: Challenge: 8 Letters, Most Words
by McA (Deacon) on Oct 04, 2013 at 20:34 UTC

    Hi CountZero,

    I'm pretty sure you have the wrong driver. Look at this dictionary:

    ffffffff fffffffa afffffff bfffffff fffffffb ffffbfff a aa aaa aaaa aaaaa aaaaaa

    The solution is clearly something with 6 x 'a'. Your program states something different.

    Best regards
    McA

      You are right, because I check against all 8 character words and 'aaaaaa' only has six and therefore it is not considered. To be absolutely certain one would have to check against all combinations of 8 characters, i.e. checking your whole dictionary against each of 208 827 064 576 combinations.

      So my solution answers the question with one extra condition: that the 8 characters together form a word too.

      And I seem to have an "off by one error" somewhere.

      CountZero

      A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      My blog: Imperial Deltronics

        A ++ for admitting that there is missing something. That shows IMO real greatness.

        Best regards
        McA

        CountZero,
        I am going to write a brute force solution this weekend but your math isn't quite right. First, you don't need all 26^8th since order doesn't matter. The upper bound is less than 5.2 million. We can reduce it further if we consider the maximum number of times a given letter can repeat. We do not have to consider a worst-case scenario dictionary so I think I can get that 5.2 million down to around 3 million. We still need to compare each of those 3 million against tens of thousands of words so I will be using C to do it but I think run time will be much less than 24 hours.

        Update: My math was off somewhere. The upper bound was 13_884_156 and I was only able to reduce it down to 12_461_993. Since my word list contains 40,933 words (filtered down from 81_536), that means I am up to 510_106_759_469 checks. In order to finish in under 24 hours, I will need to be able to do 5.9 million checks per second. I hope my C skills are up for the challenge (stay tuned).

        Cheers - L~R

        I am getting somehow similar results. The difference is, I guess, that I initialize the count with the number of occurrences of the combination before adding the shorter words. The script runs for 3min26s, the output ends with
        adeimrst: 310 aeilnrst: 312 adeinrst: 313 aeloprst: 313 aeilnpst: 314 adeoprst: 316 adeilrst: 319 aeimprst: 328 adeiprst: 332 aeilprst: 344 aeinprst: 346

        The script:

        Update: Checking the result:

        $ grep -E '^[aeinprst]{1,8}$' 2of12inf.txt | grep -vE '(.).*\1' | wc +-l 346

        Update 2: Testing duplicate characters:

        $ grep -E '(.).*\1.*:' 1056884.out | tail -n1 aeiprsst: 279 $ grep -E '^[aeiprsst]{1,8}$' 2of12inf.txt | grep -Ev '([aeiprt]).*\1| +s.*s.*s' | wc -l 279
        لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re^2: Challenge: 8 Letters, Most Words
by Limbic~Region (Chancellor) on Oct 05, 2013 at 03:16 UTC
    CountZero,
    What was the "off-by-one" error? Using the same dictionary file, I get aeinprst = 346 which I have manually verified. This isn't brute force (as you have indicated elsewhere) - it is heuristic but appears to come up with the correct result. While I like it, I needed to know for certain which is why I wrote an exhaustive solution which amazingly finished in 12 minutes.

    Cheers - L~R

      Well, the off-by-one error I assumed was in my code was actually in the small test dictionaries I ran, because of a spurious empty line at the end.

      It is quite funny that my brute force code finds the right solution even without going through all possible combinations. I guess I was just lucky.

      CountZero

      A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      My blog: Imperial Deltronics

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1056926]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (3)
As of 2014-07-14 02:59 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    When choosing user names for websites, I prefer to use:








    Results (254 votes), past polls