No such thing as a small change PerlMonks

by Zaxo (Archbishop)
 on Mar 20, 2005 at 22:39 UTC ( #441068=note: print w/replies, xml ) Need Help??

The naive_shuffle() you object to is Fischer-Yates, which is pretty well studied. I think you misapprehend the effect of the algorithm.

While it samples a permutation of the elements, it does not do so by selecting from a list of all permutations, and execution paths have nothing to do with it.

After Compline,
Zaxo

Replies are listed 'Best First'.
by Anonymous Monk on Mar 21, 2005 at 01:43 UTC
Actually, the original "naive_shuffle" is not a Fisher-Yates shuffle, it is a "naive shuffle" implementation. The OP's analysis is correct.

His final algorithm, however, is a correct Fisher-Yates implementation.

perlfaq4: How do I shuffle an array randomly? not only gives a "canonical" implementation of the Fisher-Yates algorithm in Perl, but also refers to the List::Util module's shuffle function, which is an implementation of Fisher-Yates in C. The algorithm is also implemented in Abigail's Algorithm::Numerical::Shuffle, the doco of which gives some citations into the literature (Knuth, Fisher&Yates, etc.).

The Fisher-Yates has also been discussed many times on clpm. (You can do a Google Groups search. My ego compels me to link to this posting by yrs trly, which isn't about F-Y, but uses it.)

Yes, I was looking at the code for List::Util earlier today. In addition to the C implementatin of the Fisher-Yates shuffle, it includes the following "backup" in Perl:

```sub shuffle (@) {
my @a=\(@_);
my \$n;
my \$i=@_;
map {
\$n = rand(\$i--);
(\${\$a[\$n]}, \$a[\$n] = \$a[\$i])[0];
} @_;
}
Pretty gnarleous, IMO. Kind of like a hybrid between F-Y and Tanktalus's shuffler.

The C implementation of List::Util::shuffle is 10-20x faster than the Perl implementation. For the practical programmer: end of story. Still, to satisfy my monkly preoccupation with how many angels can lambada on the head of a pin, and more importantly, in order to reduce this dead horse to a thin protein film, I benchmarked the three shufflers: ta = Tanktalus's shuffler; lu = Perl implementation of List::Util::shuffle; rp = random_perm:

```N = 1000
Rate   rp   lu   ta
rp 230/s   -- -10% -22%
lu 257/s  12%   -- -13%
ta 296/s  29%  15%   --

N = 10_000
Rate   ta   lu   rp
ta 14.3/s   -- -27% -34%
lu 19.6/s  37%   -- -10%
rp 21.7/s  52%  11%   --

N = 100_000
Rate    ta    lu    rp
ta 0.144/s    --  -92%  -93%
lu  1.75/s 1118%    --  -13%
rp  2.01/s 1302%   15%    --

the lowliest monk

by bart (Canon) on Mar 21, 2005 at 09:16 UTC
No, you're wrong. It looks like Fisher-Yates, but there's a slight difference, in that all his array items can move again on every loop. His "correct algorithm" is actually Fisher-Yates, where per loop, one item gets moved into its final position.
Sorry Bart, I thouht you were replying to me...
No, clearly his remarks were intended for me, not you.
Bart,

That's what I said ;-)

I said the "original ... is not a Fisher-Yates..." and "The final is a correct Fisher-Yates..." - which is exactly what you are asserting now.

by tlm (Prior) on Mar 20, 2005 at 23:37 UTC

Thank you for the information. Perhaps I should have made clearer in that my encounter with this algorithm, which motivated the whole write up, was in research-oriented/scientific code in which it was being used to sample permutations of an array uniformly at random. My contention is that this is a misuse of this algorithm, because it does not sample the space of permutations uniformly.

But in light of what you write about the algorithm's standing and pedigree, the title of my meditation is an awful one. Maybe the whole node should be retracted for the sake of not confusing others. Let me know what you think.

Update: I found a version of Fisher-Yates online (linked from here):

```#include <stdlib.h>
void shuffle(int *array, size_t n) {
if (n > 1) {
size_t i;
for (i = 0; i < n - 1; i++) {                     /**/
size_t j = i + rand()/(RAND_MAX / (n - i) + 1); /**/
int t = array[j];
array[j] = array[i];
array[i] = t;
}
}
}
This is equivalent to the algorithm used by my random_perm, not the one used by naive_shuffle. To get the latter algorithm, the lines indicated with /**/ above would have to be changed to:
```  for (i = 0; i < n; i++) {
size_t j = rand()/(RAND_MAX / n);
the lowliest monk

No, don't withdraw it. I think the thing to do is try and figure out the difference between a "fair shuffle" and a uniformly distributed selection over permutations.

Are the two the same if there are identical cards in the deck? What is the problem that motivated this?

Sometimes confusion, like greed, is good.

After Compline,
Zaxo

Create A New User
Node Status?
node history
Node Type: note [id://441068]
help
Chatterbox?
 [SwaJime]: Ran into an interesting piece of code today ... [SwaJime]: select name,status, status2 from __ where status & status < 13 and status2 & 16 <> 16 [SwaJime]: I'm thinking that looks messed up. Sorry for OT ... but I know y'all are smart :-) [SwaJime]: I expected items to not be selected if status was 0 ... but they were selected [SwaJime]: any thoughts?

How do I use this? | Other CB clients
Other Users?
Others cooling their heels in the Monastery: (7)
As of 2017-08-22 17:59 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
Who is your favorite scientist and why?

Results (338 votes). Check out past polls.

Notices?