Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Re: A bad shuffle

by Zaxo (Archbishop)
on Mar 20, 2005 at 22:39 UTC ( [id://441068]=note: print w/replies, xml ) Need Help??


in reply to A bad shuffle

The naive_shuffle() you object to is Fischer-Yates, which is pretty well studied. I think you misapprehend the effect of the algorithm.

While it samples a permutation of the elements, it does not do so by selecting from a list of all permutations, and execution paths have nothing to do with it.

After Compline,
Zaxo

Replies are listed 'Best First'.
Re^2: A bad shuffle
by Anonymous Monk on Mar 21, 2005 at 01:43 UTC
    Actually, the original "naive_shuffle" is not a Fisher-Yates shuffle, it is a "naive shuffle" implementation. The OP's analysis is correct.

    His final algorithm, however, is a correct Fisher-Yates implementation.

      perlfaq4: How do I shuffle an array randomly? not only gives a "canonical" implementation of the Fisher-Yates algorithm in Perl, but also refers to the List::Util module's shuffle function, which is an implementation of Fisher-Yates in C. The algorithm is also implemented in Abigail's Algorithm::Numerical::Shuffle, the doco of which gives some citations into the literature (Knuth, Fisher&Yates, etc.).

      The Fisher-Yates has also been discussed many times on clpm. (You can do a Google Groups search. My ego compels me to link to this posting by yrs trly, which isn't about F-Y, but uses it.)

        Yes, I was looking at the code for List::Util earlier today. In addition to the C implementatin of the Fisher-Yates shuffle, it includes the following "backup" in Perl:

        sub shuffle (@) { my @a=\(@_); my $n; my $i=@_; map { $n = rand($i--); (${$a[$n]}, $a[$n] = $a[$i])[0]; } @_; }
        Pretty gnarleous, IMO. Kind of like a hybrid between F-Y and Tanktalus's shuffler.

        The C implementation of List::Util::shuffle is 10-20x faster than the Perl implementation. For the practical programmer: end of story. Still, to satisfy my monkly preoccupation with how many angels can lambada on the head of a pin, and more importantly, in order to reduce this dead horse to a thin protein film, I benchmarked the three shufflers: ta = Tanktalus's shuffler; lu = Perl implementation of List::Util::shuffle; rp = random_perm:

        N = 1000 Rate rp lu ta rp 230/s -- -10% -22% lu 257/s 12% -- -13% ta 296/s 29% 15% -- N = 10_000 Rate ta lu rp ta 14.3/s -- -27% -34% lu 19.6/s 37% -- -10% rp 21.7/s 52% 11% -- N = 100_000 Rate ta lu rp ta 0.144/s -- -92% -93% lu 1.75/s 1118% -- -13% rp 2.01/s 1302% 15% --

        the lowliest monk

Re^2: A bad shuffle
by bart (Canon) on Mar 21, 2005 at 09:16 UTC
    No, you're wrong. It looks like Fisher-Yates, but there's a slight difference, in that all his array items can move again on every loop. His "correct algorithm" is actually Fisher-Yates, where per loop, one item gets moved into its final position.
      Sorry Bart, I thouht you were replying to me...
        No, clearly his remarks were intended for me, not you.
      Bart,

      That's what I said ;-)

      Please re-read the node you are replying to.

      I said the "original ... is not a Fisher-Yates..." and "The final is a correct Fisher-Yates..." - which is exactly what you are asserting now.

Re^2: A bad shuffle
by tlm (Prior) on Mar 20, 2005 at 23:37 UTC

    Thank you for the information. Perhaps I should have made clearer in that my encounter with this algorithm, which motivated the whole write up, was in research-oriented/scientific code in which it was being used to sample permutations of an array uniformly at random. My contention is that this is a misuse of this algorithm, because it does not sample the space of permutations uniformly.

    But in light of what you write about the algorithm's standing and pedigree, the title of my meditation is an awful one. Maybe the whole node should be retracted for the sake of not confusing others. Let me know what you think.


    Update: I found a version of Fisher-Yates online (linked from here):

    #include <stdlib.h> void shuffle(int *array, size_t n) { if (n > 1) { size_t i; for (i = 0; i < n - 1; i++) { /**/ size_t j = i + rand()/(RAND_MAX / (n - i) + 1); /**/ int t = array[j]; array[j] = array[i]; array[i] = t; } } }
    This is equivalent to the algorithm used by my random_perm, not the one used by naive_shuffle. To get the latter algorithm, the lines indicated with /**/ above would have to be changed to:
    for (i = 0; i < n; i++) { size_t j = rand()/(RAND_MAX / n);
    the lowliest monk

      No, don't withdraw it. I think the thing to do is try and figure out the difference between a "fair shuffle" and a uniformly distributed selection over permutations.

      Are the two the same if there are identical cards in the deck? What is the problem that motivated this?

      Sometimes confusion, like greed, is good.

      After Compline,
      Zaxo

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://441068]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chanting in the Monastery: (3)
As of 2024-04-24 06:09 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found