Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked
 
PerlMonks  

Re^5: Challenge: 8 Letters, Most Words

by CountZero (Bishop)
on Oct 05, 2013 at 08:27 UTC ( [id://1057014]=note: print w/replies, xml ) Need Help??


in reply to Re^4: Challenge: 8 Letters, Most Words
in thread Challenge: 8 Letters, Most Words

I did some more analysing the 2of12inf.txt dictionary and for words of maximum 8 characters, this is the maximum number of the same characters that can exist in any one word:
a => 4, b => 3, c => 3, d => 4, e => 4, f => 4, g => 4, h => 3, i => 3, j => 2, k => 3, l => 4, m => 3, n => 4, o => 4, p => 3, q => 1, r => 4, s => 5, t => 3, u => 4, v => 2, w => 3, x => 2, y => 3, z => 4,
In other words, choose all combinations of any 8 chars out of the following string: "aaaabbbcccddddeeeeffffgggghhhiiijjkkkllllmmmnnnnoooopppqrrrrssssstttuuuuvvwwwxxyyyzzzz" and run your dictionary against each of these and your search is exhaustive.

Of course that is still about 2.14 E15 combinations to check, but many of these will be the same. So, perhaps someone with more mathematical skills than I can write an efficient generator for these combinations that does not have to go through them all to weed out the duplicates.

CountZero

A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

My blog: Imperial Deltronics

Replies are listed 'Best First'.
Re^6: Challenge: 8 Letters, Most Words
by LanX (Saint) on Oct 05, 2013 at 13:19 UTC
    > In other words, choose all combinations of any 8 chars out of the following string: "aaaabbbcccddddeeeeffffgggghhhiiijjkkkllllmmmnnnnoooopppqrrrrssssstttuuuuvvwwwxxyyyzzzz"

    The point is to handle duplications, if all letters are unique the answer is simply (26 over 8)

    DB<288> sub fac {my $x=1; $x*=$_ for 2..$_[0]; $x} DB<289> sub binom { my ($n,$k)=@_; fac($n)/(fac($n-$k)*fac($k)) } DB<290> binom 26,8 => 1562275

    Reasoning is simple, it calculates all binary vectors of length 26 with exactly 8 1-bits.

    But with duplications its more complicated, e.g. 4 out of "aabcd" is not (5 over 4)=5

    a bcd abcd aa cd aab d aabc

    cause the first 2 solutions are identical.

    Generating all combinations and filtering the unique once is normally not very clever, cause the overhead can be enormous.

    DB<292> length "aaaabbbcccddddeeeeffffgggghhhiiijjkkkllllmmmnnnnoooo +pppqrrrrssssstttuuuuvvwwwxxyyyzzzz" => 86 DB<293> binom 86,8 => "53060358690"

    And I'm stuck finding a formula which calculates all unique solutions, but generating is easier, just don't allow bitvectors with "gaps" between identical letters:

    so

    "aaaabbbc..." # pattern "1000100...." # ok => ab... "1100100...." # ok => aab... "1001100.... # not ok => aab...

    I will update this post with a loop generating all possibilities soon.

    update

    indeed L~R's number of 12461993 possibilities is correct

    The following (non-optimized) code took 3 minutes to calculate them:

    Cheers Rolf

    ( addicted to the Perl Programming Language)

      LanX,
      And I'm stuck finding a formula which calculates all unique solutions

      If a letter was allowed to repeat up to the maximum 8 times, the formula is:

      (n + (k - 1))! -------------- k! (n - 1)! 33! -------- = 13_884_156 8! * 25!

      I am not sure how to get from 13_884_156 to 12_461_993 strictly through calculation but it seems like it should be possible.

      Cheers - L~R

        Hei L~R,

        I'm sure there is a formula, but I doubt it's a simple formula. (rather something with 26 subterms)

        I could dig into it for some time, but firing our script which just counts all combinations seems fine for me! :)

        So you are still obsessed with this problem? :)

        Cheers Rolf

        ( addicted to the Perl Programming Language)

      LanX,
      The C code I wrote to calculate it is below. It finished nearly instantly. I was trying to determine how to write the rest of the code (could it fit in memory). It also ended up being part of my final solution since I realized it was unnecessary to keep them in memory using a high water mark algorithm.

      Cheers - L~R

        isn't the goal to find a clever Perl solution for the whole problem which does it within less than an hour? :)

        I have two good ideas but no time to hack them. :(

        Cheers Rolf

        ( addicted to the Perl Programming Language)

Re^6: Challenge: 8 Letters, Most Words
by Limbic~Region (Chancellor) on Oct 05, 2013 at 12:57 UTC
    CountZero,
    I did some more analysing the 2of12inf.txt dictionary and for words of maximum 8 characters, this is the maximum number of the same characters that can exist in any one word:

    I had already done that which is where my math was coming from.

    Of course that is still about 2.14 E15 combinations to check, but many of these will be the same. So, perhaps someone with more mathematical skills than I can write an efficient generator for these combinations that does not have to go through them all to weed out the duplicates.

    Have you looked at my code? I already indicated that doing this reduces the number of 8 letter words that needs to be checked down to 12,461,993. The reason it grows back up to 510,106,759,469 loops/checks is because you still have to check each one of the 12.5 million against the 40K actual words. I was able to do this using C in 12 minutes.

    Cheers - L~R

      Your C-skills are legendary.

      CountZero

      A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      My blog: Imperial Deltronics

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1057014]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others musing on the Monastery: (7)
As of 2024-04-18 13:59 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found