Beefy Boxes and Bandwidth Generously Provided by pair Networks Frank
We don't bite newbies here... much
 
PerlMonks  

Fisher-Yates theory... does this prove that it is invalid?

by MarkM (Curate)
on Jul 25, 2003 at 01:00 UTC ( #277764=note: print w/ replies, xml ) Need Help??


in reply to Re: Fisher-Yates theory
in thread Fisher-Yates theory

UPDATE: As pointed out by others, I made an error when translating the code. See their summaries for good explanations. Cheers, and thanks everyone. (tail between legs)

In a previous article, I was challenged for doubting the effectiveness of the Fisher-Yates shuffle as described in perlfaq.

Below, I have written code that exhausts all possible random sequences that could be used during a particular Fisher-Yates shuffle. Statistically, this should be valid, as before the shuffle begins, there is an equal chance that the random sequence generated could be 0 0 0 0 0 as 0 1 2 3 4 as 4 4 4 4 4. By exhaustively executing the Fisher-Yates shuffle, and calculating the total number of occurrences that each result set is produced, we can determine whether the Fisher-Yates shuffle has the side effect of weighting the results, or whether the shuffle is truly random, in that it should be approximately well spread out.

my $depth = 5; my %results; sub recurse { if (@_ >= $depth) { my @deck = (1 .. $depth); shuffle(\@deck, [@_]); $results{join('', @deck)}++; } else { recurse(@_, $_) for 0 .. ($depth-1); } } sub shuffle { my($deck, $rand) = @_; my $i = @$deck; while ($i--) { my $j = shift @$rand; @$deck[$i,$j] = @$deck[$j,$i]; } } recurse; for (sort {$results{$b} <=> $results{$a}} keys %results) { printf "%10d %s\n", $results{$_}, $_; }

With the above code, I was able to determine that with a deck size of 5, and an initial set of 1 2 3 4 5, there is three times the probability that the resulting set will be 3 1 2 5 4 than the probability that the resulting set will be 2 3 4 5 1. To me, this indicates that this theory is flawed.

If anybody needs to prove to themselves that the test is exhaustive, print out "@$rand" in the shuffle subroutine.

Please analyze the code carefully, pull out your school books, and see if I have made a mistake.

Cheers,
mark


Comment on Fisher-Yates theory... does this prove that it is invalid?
Download Code
Re: Fisher-Yates theory... does this prove that it is invalid?
by jsprat (Curate) on Jul 25, 2003 at 01:51 UTC
    Hi MarkM,

    That algorithm shows the possible results of a biased shuffle, not a Fisher-Yates shuffle. The random sequence generated would not be 00000 to 44444, it would be 0000 to 4321 (a five digit shuffle requires 4 iterations - the faq goes 5, but the last never swaps - with each iteration shuffling one less item).

    The while loop in shuffle needs one less iteration, and a minor adjustment to recurse would look like this:

    #!/usr/bin/perl use strict; use warnings; my $depth = 4; my %results; sub recurse { if (@_ == $depth) { shift; #discard $num my @deck = (1 .. $depth); shuffle(\@deck, [@_]); $results{join('', @deck)}++; } else { my $num = shift || $depth - 1; # one less element each iteration recurse($num, @_, $_) for 0 .. $num--; } } sub shuffle { my($deck, $rand) = @_; my $i = @$deck; # uncomment the following line # print "@$rand\n"; # pre-decrement $i instead of post - the last would be a no-op in +this case while (--$i) { my $j = shift @$rand; @$deck[$i,$j] = @$deck[$j,$i]; } } recurse; for (sort {$results{$b} <=> $results{$a}} keys %results) { printf "%10d %s\n", $results{$_}, $_; }

    Here are the results of the modifications, using 4 elements instead of 5 (only 24 possible permutations instead of 120 - makes the node much more readable ;):

    Each possible permutation is shown exactly one time, for a possibility of being selected 1 out of 24 times (assuming a perfect rng).

    Makes sense???

    Update: I followed BrowserUK's link below and in that thread there is a statement that elegantly describes the problem with a biased shuffle (When the Best Solution Isn't), by blakem: "It maps 8 paths to 6 end states". In this case, it's 3125 (5**5) paths to 120 (5!) end states - assuming 5 elements to be shuffled.

Re: Fisher-Yates theory... does this prove that it is invalid?
by adrianh (Chancellor) on Jul 25, 2003 at 01:58 UTC
    Please analyze the code carefully, pull out your school books, and see if I have made a mistake.

    Yes you have. You're not emulating a Fisher-Yates shuffle :-)

    Consider the original code from perlfaq:

    sub fisher_yates_shuffle { my $deck = shift; # $deck is a reference to an array my $i = @$deck; while ($i--) { print $i, "\n"; my $j = int rand ($i+1); @$deck[$i,$j] = @$deck[$j,$i]; } }

    Note how $i is decremented on each iteration. Consider how that alters the sequence of possible indices.

    Once you take that into account you get the textbook behaviour.

    sub fixed_fisher_yates_shuffle { my ($deck, $rand) = @_; my $i = @$deck; while ($i--) { my $j = shift @$rand; @$deck[$i,$j] = @$deck[$j,$i]; } } use Set::CrossProduct; my $i = Set::CrossProduct->new([ [0..4], [0..3], [0..2], [0..1], [0] ] +); my %count; while (my $a = $i->get) { print "@$a : "; my @foo = (1,2,3,4,5); fixed_fisher_yates_shuffle(\@foo, $a); print "@foo\n"; $count{"@foo"}++; }; foreach my $key (sort keys %count) { print $key, " = ", $count{$key}, "\n"; };
Re: Fisher-Yates theory... does this prove that it is invalid?
by BrowserUk (Pope) on Jul 25, 2003 at 02:04 UTC

    The problem is, your shuffle routine is not an implementation of a Fisher-Yates shuffle.

    This line

    my $j = shift @$rand;

    is no way equivalent to this line from the FAQ implementation

    my $j = int rand ($i+1);

    The latter picks a swap partner for the current value of $i, randomly between 0 and $i-1. I can't quite wrap my brain around what your code is doing here, but it isn't even vaguely equivalent.

    Therefore you are not testing a Fisher-Yates shuffle, but some shuffle algorithm of your own invention, which you succeed in proving isn't as good as the Fisher-Yates.

    You might find this post Re: When the Best Solution Isn't that does a statistical analysis of several shuffle routines, a Fisher-Yates amongst them, including frequency and standard deviation interesting.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://277764]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others musing on the Monastery: (6)
As of 2014-04-21 11:24 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    April first is:







    Results (493 votes), past polls