Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl Monk, Perl Meditation
 
PerlMonks  

Re^2: Challenge: 8 Letters, Most Words

by Limbic~Region (Chancellor)
on Oct 04, 2013 at 20:55 UTC ( #1056934=note: print w/ replies, xml ) Need Help??


in reply to Re: Challenge: 8 Letters, Most Words
in thread Challenge: 8 Letters, Most Words

TJPride,
You filtered things out that shouldn't have been (capitalization/punctuation don't matter for instance). Here is the code that I am using to filter my list:

#!/usr/bin/perl use strict; use warnings; my %seen; open(my $fh, '<', '2of12inf.txt') or die "Unable to open '2of12inf.txt +' for reading: $!\n"; while (<$fh>) { chomp; $_ = lc($_); tr/a-z//cd; next if ! $_ || length($_) > 8 || $seen{$_}++; print "$_\n"; }

This should be 100% accurate with any given word list

I will be using the same dictionary with my brute force solution so I will let you know though I won't be filtering out all the words you have.

Cheers - L~R


Comment on Re^2: Challenge: 8 Letters, Most Words
Download Code
Re^3: Challenge: 8 Letters, Most Words
by CountZero (Bishop) on Oct 04, 2013 at 21:08 UTC
    The '2of12inf.txt' already has all these filtered out, but some words have a '%' added (I do not know why, but you will have to delete the '%').

    CountZero

    A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
Re^3: Challenge: 8 Letters, Most Words
by TJPride (Pilgrim) on Oct 04, 2013 at 22:09 UTC
    Capitalization may not matter, but dupes do, and there are often multiple versions of the same word. I suppose I could have just lowercased everything and then deduped. The word list is largely irrelevant, though - the important thing is the algorithm.

    What do you mean by brute force, exactly? There are far too many possible letter combinations to just check each one.

      TJPride,
      What do you mean by brute force, exactly? There are far too many possible letter combinations to just check each one.

      As I pointed out elsewhere in this thread, you don't need to check all 26^8 possibilities since order of letters doesn't matter. That leaves 13_884_156 possible 8 letter combinations. Since I am using a specific dictionary and no word allows a single letter to repeat more than 5 times, I can further reduce that down to 12_461_993. Since my word list contains 40,933 words, that means I am up to 510_106_759_469 checks. In order to finish in under 24 hours, I will need to be able to do 5.9 million checks per second. I hope my C skills are up for the challenge (stay tuned).

      Cheers - L~R

        I am not sure whether my solution is correct, but if it is, it is much faster, because it only checks the possible combinations, not all of them.
        لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
        > That leaves 13_884_156 possible 8 letter combinations. Since I am using a specific dictionary and no word allows a single letter to repeat more than 5 times,

        I think pre-investigating the dictionary should dramatically limit this number.

        Just find the real maximum repetition for each letter.

        Cheers Rolf

        ( addicted to the Perl Programming Language)

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1056934]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others perusing the Monastery: (10)
As of 2014-11-26 03:28 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My preferred Perl binaries come from:














    Results (162 votes), past polls