Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask

Re: A bad shuffle

by Tanktalus (Canon)
on Mar 20, 2005 at 21:36 UTC ( #441062=note: print w/replies, xml ) Need Help??

in reply to A bad shuffle

Your stats knowledge/understanding seems to be way above mine ... but wouldn't this be just as good:

sub shuffle { my @in = @_; my @out; push @out, splice(@in, rand @in, 1) while @in; @out; }
? I always get kinda concerned about swapping, I don't know why, when a simple random ordering would suffice. But, I didn't really do that well in stats in university, and that was altogether too long ago to remember anyway. :-) Comparing speeds ...
use strict; use Benchmark qw(cmpthese); my @cards = 1..52; cmpthese(-1, { shuffle => sub { shuffle(@cards) }, random_perm => sub { random_perm(@cards) }, }); sub random_perm { my $n = my @p = @_; for ( my $i = $#p; $i > 0; --$i ) { my $j = int rand( $i + 1 ); @p[ $i, $j ] = @p[ $j, $i ]; } return @p; } sub shuffle { my @in = @_; my @out; push @out, splice(@in, rand @in, 1) while @in; @out; }
which gives:
Rate random_perm shuffle random_perm 6000/s -- -30% shuffle 8606/s 43% --
which says that if random_perm is the optimal pseudo-random shuffler, then my code must be suboptimal in that it's skipping something in making it random. For the life of me, I can't think of what.

Update: Changed the benchmark to actually test a real shuffle (woops) - thanks to anonymonk below. Anonymonk points out that this doesn't scale to 52000, which is quite valid for those who need to sort "large" amounts. For those who are only sorting a deck of cards (a very popular subset of shuffling), this may be sufficient ;-)

Replies are listed 'Best First'.
Re^2: A bad shuffle
by tlm (Prior) on Mar 20, 2005 at 23:04 UTC

    Your algorithm looks perfectly fine to me. random_perm is optimal in the limited sense that it achieves a random permutation in place with the miniminum number of swaps. (You can't do this problem in place without swapping.) It is also nice in that its implementation varies little from language to language, and in that respect it is pretty general. In contrast, to implement the algorithm like the one you posted in C is not straightforward; Perl does a lot behind the scenes to allow one to write splice(@in, rand @in, 1) while @in.

    That said, I confess that I am quite surprised by the benchmarks. Then again, this is not by any means the first time that intuitions from C programming as to the efficiencies of various operations turn out to be waaaay off when the problem is translated to Perl. Your post is a good reminder to benchmark, benchmark, benchmark. And just to act on this reminder, I decided to benchmark against your original a version of shuffle that omits the last call to splice:

    sub shuffle_2 { my @in = @_; my @out; push @out, splice(@in, rand @in, 1) while @in > 1; push @out, @in; @out; }
    And once again, my intuition is way off:
    Rate shuffle_2 shuffle shuffle_2 5856/s -- -5% shuffle 6140/s 5% --

    BTW, I should clarify that I don't recall the real name of the algorithm used by random_perm, if it actually has a name.

    Update: As pointed out by several others, the algorithm used by random_perm goes by the name of Fisher-Yates. I've also found it referred to as "Knuth's shuffle."

    the lowliest monk

Re^2: A bad shuffle
by Anonymous Monk on Mar 21, 2005 at 10:04 UTC
    Considering that shuffle and random_perm are called without arguments, and are hence shuffling empty lists, your benchmark doesn't show anything interesting. If you change the @_ in both subs to @cards, I get the following results:
                  Rate random_perm     shuffle
    random_perm 6796/s          --        -32%
    shuffle     9955/s         46%          --
    It may appear that shuffle is faster. But look what happens when we shuffle 52000 cards instead of 52:
                s/iter     shuffle random_perm
    shuffle       1.02          --        -85%
    random_perm  0.158        550%          --
    It's well known that a shuffle based on splice doesn't scale, due to its quadratic behaviour.

    Never use a splice based shuffle. For every millisecond you will gain if you shuffle a small list you'll pay a second when shuffling a large list.

Re^2: A bad shuffle
by Anonymous Monk on Mar 22, 2005 at 09:55 UTC
    For those who are only sorting a deck of cards (a very popular subset of shuffling), this may be sufficient ;-)
    Well, about all you gain is 50 microseconds. The 'shuffle' sub isn't significant smaller or easier to understand than 'random_perm'. They are both four lines of code, not counting lines consisting of just closing braces.

    There isn't a real argument to not choose a solution that scales. If 50 microseconds are important to you, you should have used List::Util::scalar in the first place.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://441062]
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others contemplating the Monastery: (4)
As of 2017-06-25 00:19 GMT
Find Nodes?
    Voting Booth?
    How many monitors do you use while coding?

    Results (563 votes). Check out past polls.