Hello.
I'm writing a paper that involves randomizing an array. I've been searching the net for the Fisher-Yates shuffle, but most links only give me the FY code. Can anyone enlighten me about the theory behind the FY shuffle? Thanks...
Re: Fisher-Yates theory
by MarkM (Curate) on Jul 24, 2003 at 05:39 UTC
|
In addition to what sauoq wrote:
I'm not convinced that the Fisher-Yates shuffle actually has very much theory with regard to random order involved at all.
On the surface, each array entry appears to be given a single opportunity to be swapped with another array entry. In practice, array entries nearer the beginning have additional opportunities to be swapped (as a result of later swaps), meaning that less shuffling happens at the end of the array than at the beginning.
One could argue at great length whether additional shuffling in the array improves the randomness, or whether it produces patterns in the longer term; but it isn't difficult to argue that, regardless of which conclusion is valid, entries nearer the beginning have additional chances of being shuffled while entries nearer the end have fewer, so *one* of these ends will be improved more than the other, and therefore the algorithm is not optimal.
UPDATE: See the node "Fisher-Yates theory... does this prove that it is invalid?" for my 'proof' that Fisher-Yates weights the results.
| [reply] |
|
I'm not convinced that the Fisher-Yates shuffle actually has very much theory with regard to random order involved at all.
Well, you are wrong. The first publication of the Fisher-Yates shuffle dates from 1938. It's example 12 in the book Statistical Tables, whose authors are R. A. Fisher and F. Yates.
It was also discussed by R. Durstenfeld in 1964: Communications of the ACM, volume 7, p. 420.
And of course Donald Knuth mentions it, in Volume 2 of The Art of Computer Programming. In the third edition, it's Algorithm P, section 3.4.2, on page 145.
Abigail
| [reply] |
|
++
If there was ever a node that deserved to make the Best Nodes, that's it...
I bookmarked it so I can read that article from ACM's digital library.
UPDATE: I read the paragraph in the ACM digital library, for algorithm 235. How did you ever find that citation? It is the algorithm, but he doesn't credit Fisher-Yates...
--
Mike
| [reply] |
UPDATE: As pointed out by others, I made an error when translating the code. See their summaries for good explanations. Cheers, and thanks everyone. (tail between legs)
In a previous article, I was challenged for doubting the effectiveness of the Fisher-Yates shuffle as described in perlfaq.
Below, I have written code that exhausts all possible random sequences that could be used during a particular Fisher-Yates shuffle. Statistically, this should be valid: before the shuffle begins, the random sequence 0 0 0 0 0 is exactly as likely to be generated as 0 1 2 3 4 or 4 4 4 4 4. By exhaustively executing the Fisher-Yates shuffle and counting the total number of occurrences of each result set, we can determine whether the Fisher-Yates shuffle has the side effect of weighting the results, or whether the results are truly random, in which case they should be evenly spread out.
my $depth = 5;
my %results;

sub recurse
{
    if (@_ >= $depth) {
        my @deck = (1 .. $depth);
        shuffle(\@deck, [@_]);
        $results{join('', @deck)}++;
    } else {
        recurse(@_, $_) for 0 .. ($depth - 1);
    }
}

sub shuffle
{
    my ($deck, $rand) = @_;
    my $i = @$deck;
    while ($i--) {
        my $j = shift @$rand;
        @$deck[$i, $j] = @$deck[$j, $i];
    }
}

recurse;
for (sort { $results{$b} <=> $results{$a} } keys %results) {
    printf "%10d %s\n", $results{$_}, $_;
}
With the above code, I was able to determine that, with a deck size of 5 and an initial set of 1 2 3 4 5, the resulting set 3 1 2 5 4 is three times as probable as the resulting set 2 3 4 5 1. To me, this indicates that the theory is flawed.
If anybody needs to prove to themselves that the test is exhaustive, print out "@$rand" in the shuffle subroutine.
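For comparison, here is a sketch of my own (the names are mine, not from the post above) in which each draw is restricted to the range 0 .. $i, which is the range the perlfaq implementation actually uses via int rand($i+1). Exhausting those sequences instead yields every permutation exactly once:

```perl
# Sketch: exhaustively enumerate the random sequences a Fisher-Yates
# shuffle can consume when the draw for position $i is restricted to
# 0 .. $i, and count each resulting permutation.
my $depth = 4;     # 4! = 24 outcomes, small enough to print
my %counts;

sub shuffle_fixed {
    my ($deck, $rand) = @_;
    my $i = @$deck;
    while ($i--) {
        my $j = shift @$rand;              # here 0 <= $j <= $i always
        @$deck[$i, $j] = @$deck[$j, $i];
    }
}

sub recurse_fixed {
    my @rand = @_;
    my $i = $depth - @rand - 1;            # position filled by the next draw
    if ($i < 0) {
        my @deck = (1 .. $depth);
        shuffle_fixed(\@deck, [@rand]);
        $counts{ join '', @deck }++;
    } else {
        recurse_fixed(@rand, $_) for 0 .. $i;
    }
}

recurse_fixed();
printf "%d %s\n", $counts{$_}, $_ for sort keys %counts;
# prints 24 lines, each permutation with count 1
```

With the restricted ranges there are 4*3*2*1 = 24 choice sequences, one per permutation, so no result is weighted.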
Please analyze the code carefully, pull out your school books, and see if I have made a mistake.
Cheers,
mark | [reply] [d/l] |
Again with the "you are wrong." Why the acerbic and argumentative tone? Up to this point, this appears to be a friendly conversation about their views on the theory, which was informative and interesting.
If someone says they're not convinced, you can't disprove that. One may heap on additional evidence to attempt to convince them, but they may continue to be unconvinced for rational or irrational reasons.
This site often discusses matters of faith. There's no right or wrong about being convinced. "You are wrong" signals an absolutism, in both its grammar and its connotation. I prefer a community which demonstrates respect for others' faith and others' opinions.
Fisher-Yates may have been proven effective by way of analyzing the results for uniform potential entropy. It may have been proven by logical analysis of the probabilities on each successive swap. Sharing that evidence is helpful, but I ask politely for everyone to build a constructive community, not one which promotes controversy and arrogance.
-- [ e d @ h a l l e y . c c ]
| [reply] |
On the surface, each array entry appears to be given a single opportunity to be swapped with another array entry. In practice, array entries nearer the beginning have additional opportunities to be swapped (as a result of later swaps), meaning that less shuffling happens at the end of the array than at the beginning.
That is what intuition says, but in this case intuition falls short of reality. A Fisher-Yates shuffle avoids (not introduces) bias by making the pool smaller on each iteration. There are n! possible permutations of a set of n items. After the first iteration, an F-Y shuffle gives n possible outcomes, each equally likely. The second iteration yields (n - 1) for each of the n possible outcomes, leaving us with n*(n-1) possibilities, again equally likely. Follow that to its conclusion and you get n(n-1)(n-2)...1 possibilities, each equally likely.
For an example, take a 3-item set. There are 3! (= 6) possible permutations of this set if it is shuffled. On the first iteration of the loop, there are three possibilities: a-(1 2 3), b-(2 1 3), and c-(3 2 1). The second iteration only swaps the 2nd and 3rd elements, so for a you have an equal possibility of (1 2 3) and (1 3 2); for b, (2 1 3) and (2 3 1); for c, (3 2 1) and (3 1 2). None of the possibilities is duplicated, and each has a 1/6 chance of being selected.
Iter 1      Iter 2
(1 2 3) -> (1 2 3)
        -> (1 3 2)
(2 1 3) -> (2 1 3)
        -> (2 3 1)
(3 2 1) -> (3 2 1)
        -> (3 1 2)
Six possibilities, each equally likely.
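The tree above can be checked mechanically. A small sketch of mine, assuming the forward-walking variant described in this post (step k swaps position k with a random position from k to the end):

```perl
# Enumerate every choice sequence of the forward Fisher-Yates variant
# on (1 2 3): step 0 picks j0 from 0..2, step 1 picks j1 from 1..2,
# and count each resulting permutation.
my %seen;
for my $j0 (0 .. 2) {
    for my $j1 (1 .. 2) {
        my @a = (1, 2, 3);
        @a[0, $j0] = @a[$j0, 0];       # first swap
        @a[1, $j1] = @a[$j1, 1];       # second swap
        $seen{"@a"}++;
    }
}
print scalar(keys %seen), " permutations, counts: ",
      join(',', values %seen), "\n";   # 6 permutations, counts: 1,1,1,1,1,1
```

Six choice sequences, six distinct permutations, each produced exactly once.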
Another way to look at it is this: the first element has a 2/3 chance of getting swapped on the first pass and a 1/2 chance on the second, giving it a 2/3 * 1/2 = 1/3 chance of ending up in any given slot. Update: Finally, Re: Re: Re: random elements from array - new twist shows a statistical analysis of a Fisher-Yates shuffle. Whew, I'm done. I hope this wasn't homework - or if it was, Anonymous Monk learned something ;-) | [reply] [d/l] |
|
One could argue at great length whether additional shuffling in the array improves the randomness, or whether it produces patterns in the longer term,
If your RNG were perfect, additional shuffles would neither improve nor degrade the randomness.
but it isn't difficult to argue that, regardless of which conclusion is valid, entries nearer the beginning have additional chances of being shuffled while entries nearer the end have fewer, so *one* of these ends will be improved more than the other, and therefore the algorithm is not optimal.
Unless neither conclusion is valid (see above), in which case a single shuffle is no better or worse than multiple shuffles and the algorithm is optimal.
In practice, however, our RNGs aren't perfect and you may have a point. I suppose you could resolve it by keeping track of the original indexes of the elements and then apply your map and do a second FY shuffle starting with the previous last and moving backwards to the previous first... still linear in time and space... but I doubt it is worth it. :-)
-sauoq
"My two cents aren't worth a dime.";
| [reply] |
|
The biggest problem with shuffling algorithms (not just Fisher-Yates) is the imperfectness of the RNG. The sequence of numbers it returns is determined by the seed, and only by the seed. And if only 32 bits of the seed are being used, there will be at most 2**32 different sequences. Which means that for a deck of more than 2 cards, Fisher-Yates cannot be 'fair', as N! won't be a divisor of any power of 2. Furthermore, 13! > 2**32, so even if you shuffle a deck of only 13 cards, some permutations will *never* be selected, no matter how many times you perform the shuffle.
This was pointed out by R. Salfi: COMPSTAT 1974, Vienna: 1974, pp 28-35.
See also the documentation of the Shuffle module on CPAN.
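The two numeric claims above are easy to check; a quick sketch of mine:

```perl
# Check: 13! exceeds 2**32, and n! (for n > 2) always has 3 as a
# factor, so it can never divide a power of two, which has only 2s.
my $fact = 1;
$fact *= $_ for 1 .. 13;
printf "13!   = %d\n", $fact;          # 6227020800
printf "2**32 = %d\n", 2**32;          # 4294967296
print "13! > 2**32\n"   if $fact > 2**32;
print "3 divides 13!\n" if $fact % 3 == 0;
```

So a 32-bit seed cannot even reach every ordering of 13 cards, let alone weight all N! orderings equally.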
Abigail
| [reply] |
|
One could argue at great length whether additional shuffling in the array improves the randomness, or whether it produces patterns in the longer term,
If your RNG were perfect, additional shuffles would neither improve nor degrade the randomness.
It seems the "additional shuffling" is required to prevent permutations in which there is a correlation between the items that are swapped.
Consider the shuffling of an array of three items without the so-called "additional shuffling", starting from the left:
Initial: abc
1 move: abc bac(1/3) cba(1/3)
2 move: abc(1/6) acb(1/6)
Updated: Per Abigail's criticism of a mistyped table and obscurity. To clarify, I take issue with:
On the surface, each array entry appears to be given a single opportunity to be swapped with another array entry. In practice, array entries nearer the beginning have additional opportunities to be swapped (as a result of later swaps), meaning that less shuffling happens at the end of the array than at the beginning.
One could argue at great length whether additional shuffling in the array improves the randomness, or whether it produces patterns in the longer term; but it isn't difficult to argue that, regardless of which conclusion is valid, entries nearer the beginning have additional chances of being shuffled while entries nearer the end have fewer, so *one* of these ends will be improved more than the other, and therefore the algorithm is not optimal.
This seems to say that the possible movement of an already-moved item is a weakness of the algorithm. But an array cannot be randomly shuffled in place without allowing elements to be moved more than once.
The idea that the errors of an RNG will more adversely affect the items to which the RNG is applied more often is a subtler argument, and I reject it out of hand: if the RNG is incorrect, the results will not be random, and we are merely quibbling about how obvious the error is.
This distinction may be very important in practice, and there are practical solutions for getting a pseudo-randomness that is okay, for varying definitions of okay.
| [reply] [d/l] |
IMHO the definition of a "good" shuffle algorithm is that, starting from a sequence, every permutation of that sequence is equally likely to be the outcome of the shuffle.
Many "home grown" algorithms violate this criterion and hence are to be considered, well, bad shuffle algorithms. Fisher & Yates satisfies this criterion, however (the proof is left as "homework" for the original poster).
You can have a look at Algorithm::Numerical::Shuffle for references as well as a short discussion.
Just my 2 cents, -gjb-
| [reply] |
Re: Fisher-Yates theory
by sauoq (Abbot) on Jul 24, 2003 at 05:09 UTC
|
sub fisher_yates_shuffle {
    my $deck = shift;    # $deck is a reference to an array
    my $i = @$deck;
    while ($i--) {
        my $j = int rand($i + 1);
        @$deck[$i, $j] = @$deck[$j, $i];
    }
}
Update: Uh. Gee... you did ask for the theory, not the code. Sorry. What is there to tell you? It's O(N) for obvious reasons, and it works on the data in place, so memory usage is minimal. It's a very straightforward algorithm.
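A small self-contained usage sketch of my own, reproducing the subroutine above with a minimal driver:

```perl
# Shuffle a small deck in place via the Fisher-Yates algorithm.
sub fisher_yates_shuffle {
    my $deck = shift;                  # $deck is an array reference
    my $i = @$deck;
    while ($i--) {
        my $j = int rand($i + 1);      # 0 <= $j <= $i
        @$deck[$i, $j] = @$deck[$j, $i];
    }
}

my @cards = (1 .. 10);
fisher_yates_shuffle(\@cards);
print "@cards\n";                      # same ten values, random order
```

Note that the shuffle mutates the array through the reference; no copy of the deck is ever made.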
-sauoq
"My two cents aren't worth a dime.";
| [reply] [d/l] |
|
| [reply] |
Re: Fisher-Yates theory
by Skeeve (Parson) on Jul 24, 2003 at 05:58 UTC
|
| [reply] |
|
I consider algorithm-related questions that can be applied in Perl to be all right on this forum, if not in SOPW.
artist
| [reply] |
|
| [reply] |
Re: Fisher-Yates theory
by bart (Canon) on Aug 06, 2003 at 12:47 UTC
|
The Fisher-Yates shuffle is an algorithm I came up with myself, independently of any previous work... So I'll give you my view on it, may I?
As an example, take a deck of cards, 52 of them. Pick any card. You'll agree that the card chosen is completely random, no? No preference towards the old order? OK. Now the most important step: take it out of the deck. That's your first card of the new, shuffled deck.
Now you have a deck of 51 cards left. Pick one, any one... That's your second card, completely random. Now you have a deck of 50 cards left...
Repeat, taking one random card out of the deck each time, and adding it to the new deck, until you're left with an empty deck, with no cards left. Your new deck now contains all cards and is shuffled completely randomly.
As an optimisation for the implementation, you can see that the total number of cards in both decks is always the same. So instead of physically creating two decks, we take a paper bookmark and insert it between the old and the new deck. We move the chosen card out of the old deck and into the new deck, simply by swapping the chosen card and the former first card of the old deck, and then moving the bookmark one position up so that it's now the last card of the new deck.
Since the order of the new deck is completely independent of the order in the old deck, doing it this way, changing the order of the old deck while you're at it, doesn't influence the statistical properties of the end result, at all. | [reply] |
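The two-deck description above can be rendered literally in a few lines; a sketch of mine, using splice for the "take it out of the deck" step:

```perl
# Draw a random card from the old deck and append it to the new deck,
# exactly as described: the old deck shrinks, the new deck grows.
my @old = (1 .. 52);
my @new;
while (@old) {
    my $k = int rand @old;             # pick any remaining card
    push @new, splice(@old, $k, 1);    # take it out of the old deck
}
print scalar(@new), "\n";              # 52
```

The in-place "bookmark" optimisation is exactly the Fisher-Yates swap: instead of a second array, the new deck is overlaid onto one end of the same array.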