Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight
 
PerlMonks  

Wordfeud racks

by Anonymous Monk
on Jul 30, 2013 at 11:51 UTC ( #1046998=perlquestion: print w/ replies, xml ) Need Help??
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,

you all know of wordgames like Wordfeud and Scrabble. At the very beginning of such a game there is a bag full of tiles. In English Wordfeud for example there are 10 A:s, 2 B:s etc.

Now - how to list all possible racks in the initial stage? A rack consists of 7 tiles, and the order of the tiles at hand are of course of no importance.

And, additionally compute the probability of each rack.

Something like this
AAAAAAA 0.00x probability AAAAAAB 0.000x probability ... etc ...

To list the possible racks I have already looked at some CPAN modules, but most modules seems to handle only cases where there is only one element of each kind in the set from which to draw. The one I found that could handle this Math::Combinatorics was so ridicilously slow it would probably take days or even weeks to generate the list!

So now I turn to thee, monks, for spiritual Perl advice.
/L

Comment on Wordfeud racks
Download Code
Re: Wordfeud racks
by daxim (Chaplain) on Jul 30, 2013 at 11:58 UTC
    Use Algorithm::Combinatorics, that's written in XS.

    Edit: you can make your problem easier by mapping your multiset/bag into a set first, so that the simpler algorithms can work: A,A,A,A,A,A,A,A,A,A,B,B,… ⤖ A1,A2,A3,A4,A5,A6,A7,A8,A9,A10,B1,B2,…

      Thanks for your suggestion, But I can't see how to make A::C list only unique racks.

      Example
      If my bag of tiles contained only 3 tiles A A B
      And I wanted to list all possible racks.
      I would end up with only 2 racks:

      AA
      AB

      A::C will give me all combinations of the "physical" tiles:

      A1A2
      A1B
      A2B

      Can you please elaborate a little on how to use the mapping to solve this problem?

      /L
        A::C will give me all combinations of the "physical" tiles
        But that's exactly what you need to calculate the probability!

        To be able to distinguish duplicates: unmap the tiles again, then sort and join each rack, finally List::MoreUtils::uniq.

      Interestingly, it does not run much faster than a simple algorithm I wrote from scratch. For the rack size of 5, both run around 5m30sec on my box.

      Update: The code is buggy. See Re^3: Wordfeud racks for a fix.

      #!/usr/bin/perl use warnings; use strict; use 5.010; # defined-or, say my @tiles = (('_') x 2, # 2 blank tiles (scoring 0 points) # 1 point ('E') x 12, qw(A I) x 9, ('O') x 8, qw(N R T) x 6, qw(L S U) x 4, # 2 points ('D') x 4, ('G') x 3, # 3 points qw(B C M P) x 2, # 4 points qw(F H V W Y) x 2, # 5 points ('K'), # 8 points qw(J X), # 10 points qw(Q Z)); my $rack_size = 5; my %hash; my @rack = 0 .. $rack_size - 1; while ($rack[-1] != @tiles) { my $string = join q(), map $tiles[$_], @rack; $hash{$string}++; my $first_to_move = 0; $first_to_move++ while $rack[$first_to_move] + 1 == ($rack[$first_ +to_move + 1] // -1); $rack[$first_to_move]++; for my $i (0 .. $first_to_move - 1) { $rack[$i] = $i; } } my $combinations = keys %hash; my $sum = 0; $sum += $_ for values %hash; print map "$_ => " . ($hash{$_} / $sum) . "\n", sort { $hash{$a} <=> $ +hash{$b} } keys %hash; print "Combinations: $combinations, Count: $sum.\n";

      With the module, the important part of the code simplifies to:

      use Algorithm::Combinatorics qw(combinations); my %hash; my $iterator = combinations(\@tiles, $rack_size); while (my $rack = $iterator->next) { $hash{ join q(), @$rack }++; }

      Update: I did some benchmarks. It seems the module would get faster than my code on racks greater than 5, but it will still run for days.

      Update 2: Fixed a bug in the code.
      لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
        performance tip, you could pre-size the hash, so there isn't need to grow it
        I ran the code for rack size 7. It took 12 hours 6 mins 18 secs on a 4 core Xeon box with 8 GB memory, OS linux. I am not going to copy the whole output here, just a few first and last lines:
        MMVFWXQ => 6.24704795748769e-11 __BPMKQ => 6.24704795748769e-11 BCBMYHV => 6.24704795748769e-11 ... EEAIOND => 3.56231662727778e-05 EEAIONU => 3.56231662727778e-05 EEAIONL => 3.56231662727778e-05 Combinations: 8275164, Count: 16007560800.

        I probably have an error in the code :-)

        Update: I see it now. The error was my effort to "DRY". The @tiles array must be defined as follows to make it work:

        my @tiles = (('_') x 2, # 2 blank tiles (scoring 0 points) # 1 point ('E') x 12, ('A') x 9, ('I') x 9, ('O') x 8, ('N') x 6, ('R') x 6, ('T') x 6, ('L') x 4, ('S') x 4, ('U') x 4, # 2 points ('D') x 4, ('G') x 3, # 3 points ('B') x 2, ('C') x 2, ('M') x 2, ('P') x 2, # 4 points ('F') x 2, ('H') x 2, ('V') x 2, ('W') x 2, ('Y') x 2, # 5 points ('K'), # 8 points qw(J X), # 10 points qw(Q Z));

        Stay tuned, the next result is comming in 12 hours.

        لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Wordfeud racks (Algorithm::Permute)
by Anonymous Monk on Jul 30, 2013 at 12:02 UTC
      No, that is about permutations, where order matters. What I want is all unique combinations...
      /L
Re: Wordfeud racks
by BillKSmith (Chaplain) on Jul 30, 2013 at 13:31 UTC
    Have you estimated how many racks are possible? How long do you expect it takes to compute them all? Your extrapalation to days or weeks may be very reasonable.
    Bill

      3365856, I think. Sorry, less than that.

Re: Wordfeud racks
by RichardK (Priest) on Jul 30, 2013 at 15:31 UTC

    How many tiles are in the bag?

    for example, if it's only 100 then you're going to have to create 80,678,106,432,000 unique combinations - which might take a while :)

        Sorry, I meant permutations, so there a N!/(N-K)! ways to select K tiles from a bag of N.

        Or for 7 of 100 tiles = 100*99*98*97*96*95*94 permutations.

        I think it will be easier to calculate the probabilities of the different combinations rather than count them, but it's still going to be a fair amount of work.

Re: Wordfeud racks
by sundialsvc4 (Abbot) on Jul 30, 2013 at 17:11 UTC

    “All possible racks,” in such a game, is an infeasible number of possibilities to actually enumerate ... it would take weeks at best, if not months or centuries.   But you don’t need to “list them, then count them,” in order to compute the probability of any given draw.

    If you first presume that “a unique draw” would be one in which the letters are always arranged in alphabetic order, so that AABA and ABAA are considered to be the same, then any particular draw could be equally expressed as a vector of 26 elements, such as (in this case) [3, 1, 0, 0 ...].   There are two rules to determine if a vector is valid:   (a) the total of all elements in the vector must equal the total size of the draw, and (b) the value of the element in any given position must not exceed the defined total number of pieces for that position (letter).

    It is practical to construct a computer program to map out all of these combinations, since it is possible to eliminate billions of choices by recognizing as soon as possible when either of these two constraints have been exceeded.   (As soon as you know that, cut-short the search.)   Notice that there is nothing at all “random” about the algorithm.   The probability of any given vector is simply 1/n, where n is the number of unique vectors that are found to exist.   A similar technique can be used to solve the Eight Queens problem.

    It could be that what you’re actually interested in is:   “given a valid vector v, how many arrangements of letters correspond to that vector?”   This is a calculation that involves factorials (n!).   The answer is [ F(v) ::= ( v[1]! * v[2]! * ... * v[n]! ) ] for any given vector v of size n, no matter which particular letters might be involved.   Yes, having perfected the algorithm to give you the list of all possible vectors, with the help of bigint, you could also solve that problem.   The denominator is [ SUM( f(v) ) for all v ], and the numerator is simply f(v).

      “All possible racks,” in such a game, is an infeasible number of possibilities to actually enumerate ... it would take weeks at best, if not months or centuries. But you don’t need to “list them, then count them,” in order to compute the probability of any given draw.

      I don't think so. Even assuming the worse-case scenario that the bag has at least 7 copies of each letter, the number of possible ordered racks is at most 26**7, or about 8 billion (we absolutely do not care whether the A that you get is A1, or A2, or A3..., it is just an A). This is admitedly a big number, but you certainly don't need weeks (let alone centuries) to list them all on a computer by the standards of today. And the actual number of racks is in fact much smaller for two reasons: some letters have only possibly two or three copies and that reduces quite drastically the number of actual possibilities, and many racks are actually equivalent (permutations of another one). I do not have enough information on the game to compute everything, but I would be surprised if the number of actually different racks would exceed a couple of millions. So at most a few seconds to enumerate them all on a modern computer.

Re: Wordfeud racks
by Anonymous Monk on Jul 31, 2013 at 00:06 UTC
    There are only 15 different types of racks to consider: 7 different letters (1111111), 6 different letters (211111), 5 different letters (31111 or 22111), etc, all the way down to a rack with all identical letters.

    Assuming the bag contains at least 7 tiles for each letter (and we have 26 different tiles, that is, no blanks), it's easy to calculate how many different racks you can make for each type. For instance, type 111111 give you 26*25*24*23*22*21*20/7*6*5*4*3*2*1 == 657800 different racks. Six different letters on the board gives you the most combinations: 26*25*24*23*22*21*6/(5*4*3*2*1) == 8288280. Etc.

    I leave it up to you to calculate the other numbers, but it quickly peters down. The number of different racks to consider is in the order of tens of millions, (less than 20 million I guess).

    Enumerating all possible racks, given an sufficient number of tiles is doable. And given that list and the actual distribution, you can calculate the actual odds (some will have probability, as I doubt Wordfeud has 7 Zs in the bag).

    Of course, you may want to be smart, and just derive the right number, using some combinatoric arguments, but for many people here in this forum, doing it brute-force is going to be easier.

Re: Wordfeud racks
by karlgoethebier (Curate) on Jul 31, 2013 at 09:39 UTC
Re: Wordfeud racks
by Anonymous Monk on Aug 02, 2013 at 21:43 UTC
    Thanks to all who read my question and made an effort to actually understand the problem and provided a well-considered answer. I learnt something from choroba:s code, even though it took me quite a while to understand what was going on esp. in the "$first_to_move"-section...

    I eventually ended up in a completly different direction writing the following code. (and borrowing the nCr function from here 68077 )

    Just listing the 3199724 racks takes about 6 seconds on my old laptop running Win7 32 bit. When also computing the probabilities it toke 2 minutes and 46 seconds to complete.

    /L

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1046998]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others taking refuge in the Monastery: (11)
As of 2014-09-19 20:35 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    How do you remember the number of days in each month?











    Results (145 votes), past polls