Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked
 
PerlMonks  

Benchmarking perfect shuffles

by ambrus (Abbot)
on Feb 28, 2006 at 15:51 UTC ( #533396=perlmeditation: print w/ replies, xml ) Need Help??

In this meditation, I'm benchmarking a couple of subroutines to uniformly shuffle an array.

I know others have done this too, for exapmle see Re^2: Randomising the order of an array. However, I've learnt much from doing this myself.

First, let's see what shuffling functions I was using. (I originally posted these at Re^2: Array Shuffle. I've removed the bless method, as it's very slow (probably because it has to numerify two strings in every comparision).)

I used six different algorithms. One of these just calls the shuffle function from List::Util, which is usually implemented as an XS functions, so it can be expected that this would be the fastest.

Two of the algorithms assign a random number to each element in the array, and sort the array using that number as a key. One of these uses a schwartzian transform, the other one uses a single array to store the keys.

Note that these sorting algorithms only give a truly uniform random permutation if the rand() function of perl gives enough random bits. This is because of the birthday-paradox, see chapter 5.3 of the Cormen – Leiserson – Rivest – Stein book for more details. I was using linux, so perl uses the drand48 function from glibc, which gives 48 good random bits, so this doesn't affect me except for large arrays. However, on at least some windows systems, the rand perl function probably calls the C rand function only once, and that gives only 15 random bits which is too few for even small arrays.

The other three algorithms used the classical swapping algorithm: take each element in the array starting from the first, and swap it with one of the elements after it (including itself). I've made three variants of this. One is the classical swap, one uses map to collect the swapped-back data, and one uses shift to remove elements from the front of the array instead. (Using shift was not my idea, someone on the cb recommended it earlier, but I can't remember who he was, sorry.)

On this algorithm, see both the above mentioned chapter in the Cormen book, or chapter 3.4.2 of Knuth's TAOCP.

Here's the code I was using.

use warnings; use strict; use Benchmark ":all"; use IO::Handle; my %shuffle; use List::Util; $shuffle{"xs"} = \&List::Util::shuffle; $shuffle{"swap"} = sub { my @p = @_; my $k; $k = $_ + int(rand(@p - $_)), @p[$_, $k] = @p[$k, $_] for 0 .. @p - 1; @p; }; $shuffle{"map"} = sub { my @p = @_; map { splice @p, $_ + rand(@p - $_), 1, $p[$_] } 0 .. @p - 1; }; $shuffle{"shift"} = sub { my @p = @_; my $t; map { $t = splice @p, rand(@p), 1, $p[0]; shift @p; $t } 0 .. +@p - 1; }; $shuffle{"weight"} = sub { my @w = map { rand } @_; @_[sort { $w[$a] <=> $w[$b] } 0 .. @_ - 1]; }; $shuffle{"schwartzian"} = sub { map { $$_[1] } sort { $$a[0] <=> $$b[0] } map { [rand, $_] } @ +_; }; if (0) { $shuffle{"bless"} = sub { map { $$_[0] } sort { ref $a <=> ref $b } map { bless [$_], ra +nd } @_; }; } if (0) { my $N = 0 + ($ARGV[0] || 20); my $T = 0 + ($ARGV[1] || 1000); for my $n (keys(%shuffle)) { my %f; print $n; my $a = $shuffle{$n}; for (1 .. $T) { my @a = 1 .. $N; $f{join(" ", &$a(@a))}++; } for (sort keys(%f)) { printf "%5d: %s\n", $f{$_}, $_; } print 0+keys(%f), "\n\n"; } } my @sd; my %shtest = map { my($n, $a) = ($_, $shuffle{$_}); $n, sub { my @r = &$a(@sd); } } keys(%shuffle); for my $N (map { 1 + 2 * 3**$_ } 0 .. 8) { print "testing on $N elements\n"; @sd = my @sd0 = map { rand } 1 .. $N; cmpthese(-5, \%shtest); join(";", @sd) eq join(";", @sd0) or die "error: some of the shuffle functions have changed the arr +ay"; flush STDOUT; } print "done\n"; __END__

And here's the output, using v5.8.8 on i686-linux.

testing on 3 elements Rate schwartzian swap weight shift ma +p xs schwartzian 131923/s -- -1% -15% -20% -23 +% -88% swap 133815/s 1% -- -13% -19% -22 +% -88% weight 154592/s 17% 16% -- -6% -10 +% -86% shift 165196/s 25% 23% 7% -- -4 +% -86% map 172130/s 30% 29% 11% 4% - +- -85% xs 1140464/s 764% 752% 638% 590% 563 +% -- testing on 7 elements Rate schwartzian weight swap shift map + xs schwartzian 61200/s -- -15% -20% -27% -29% + -90% weight 72048/s 18% -- -5% -13% -17% + -88% swap 76058/s 24% 6% -- -9% -12% + -87% shift 83283/s 36% 16% 10% -- -4% + -86% map 86718/s 42% 20% 14% 4% -- + -85% xs 592396/s 868% 722% 679% 611% 583% + -- testing on 19 elements Rate schwartzian weight swap shift map + xs schwartzian 21339/s -- -15% -32% -33% -43% + -91% weight 25214/s 18% -- -20% -21% -32% + -90% swap 31452/s 47% 25% -- -2% -16% + -87% shift 31957/s 50% 27% 2% -- -14% + -87% map 37239/s 75% 48% 18% 17% -- + -85% xs 243980/s 1043% 868% 676% 663% 555% + -- testing on 55 elements Rate schwartzian weight swap shift map + xs schwartzian 6790/s -- -17% -41% -42% -49% + -92% weight 8154/s 20% -- -29% -30% -39% + -90% swap 11492/s 69% 41% -- -1% -14% + -86% shift 11650/s 72% 43% 1% -- -13% + -86% map 13400/s 97% 64% 17% 15% -- + -84% xs 83719/s 1133% 927% 628% 619% 525% + -- testing on 163 elements Rate schwartzian weight shift swap map + xs schwartzian 1902/s -- -19% -52% -52% -58% + -93% weight 2356/s 24% -- -41% -41% -48% + -92% shift 3982/s 109% 69% -- -0% -13% + -86% swap 3998/s 110% 70% 0% -- -12% + -86% map 4559/s 140% 93% 14% 14% -- + -84% xs 28869/s 1418% 1125% 625% 622% 533% + -- testing on 487 elements Rate schwartzian weight shift swap map + xs schwartzian 515/s -- -20% -60% -61% -65% + -95% weight 644/s 25% -- -50% -51% -56% + -93% shift 1292/s 151% 101% -- -2% -12% + -86% swap 1321/s 157% 105% 2% -- -10% + -86% map 1467/s 185% 128% 14% 11% -- + -84% xs 9413/s 1728% 1363% 629% 612% 542% + -- testing on 1459 elements Rate schwartzian weight shift swap map + xs schwartzian 121/s -- -28% -69% -70% -71% + -95% weight 168/s 39% -- -57% -58% -59% + -94% shift 396/s 226% 135% -- -1% -4% + -85% swap 401/s 230% 138% 1% -- -3% + -85% map 414/s 241% 146% 4% 3% -- + -84% xs 2656/s 2088% 1477% 570% 563% 542% + -- testing on 4375 elements Rate schwartzian weight map shift swap + xs schwartzian 20.8/s -- -33% -71% -75% -76% + -94% weight 31.1/s 50% -- -57% -63% -64% + -92% map 72.6/s 249% 133% -- -14% -16% + -81% shift 84.7/s 307% 172% 17% -- -2% + -77% swap 86.6/s 317% 178% 19% 2% -- + -77% xs 376/s 1709% 1108% 418% 344% 334% + -- testing on 13123 elements Rate schwartzian weight map shift swap + xs schwartzian 4.46/s -- -28% -72% -73% -76% + -95% weight 6.18/s 39% -- -61% -63% -67% + -92% map 15.7/s 252% 154% -- -6% -16% + -81% shift 16.8/s 277% 172% 7% -- -10% + -80% swap 18.7/s 319% 203% 19% 11% -- + -77% xs 82.0/s 1739% 1227% 422% 388% 339% + -- done

Ok, so what can we learn from all this output?

As we have expected, the XS function was much faster than any other method.

The next thing we can see is that the sorting solutions (schwartzian and weight) were slower than any of the combinatorical ones, and that this disadvantage is even more significant with larger arrays. This is sort of logical, after all, sorting takes O(n·log n) time but the other algorithms take only O(n).

We can also see that the schwartzian method is slower than the one using only one array. This was surprising to me when I first met it, but then tye and Aristotle explained to me in the replies of Re: Benchmark, -s versus schwartzian why this has to be like this. (Here, I don't implement the hash lookup approach shown in that thread, only the array lookup one.)

I also wonder if it's possible to speed up this method by using the variant of the schwartzian transform where you map the data to strings that you can sort with one of the built-in sorting functions, see Re^3: Benchmark, -s versus schwartzian (vs hash) or •Re^2: Benchmark, -s versus schwartzian or fast, flexible, stable sort. (Apparently that's called GRT. Can anyone tell what that stands for? (Update: bart says it's the Guttman-Rosler transform, see also this slide.)) Maybe I'll try that later.

Now let's examine the faster, combinatorical solutions (swap, map, shift).

All three solutions have about the same speed. The (fortran-like) swapping solution is the fastest for larger arrays (from say 4K elements), but the other two are faster for small arrays. This is probably because it doesn't have to do memory managment. Finally, the map solution is consistently faster than the shift solution, but only by a small amount.

Comment on Benchmarking perfect shuffles
Select or Download Code
Re: Benchmarking perfect shuffles
by jdhedden (Deacon) on Feb 28, 2006 at 18:50 UTC
    Just for the heck of it, I decided to test the suffle() function found in Math::Random::MT::Auto. It, too, is an XS implementation like List::Util's shuffle(). I was happily surprised to see that for anything more than a few dozen items in the array, MRMA's shuffle() was as fast or a little faster than LU's. (I find this even more interesting given that LU is using the core rand functionality which is 5x faster than MRMA's irand function.)

    However, if you want real speed, you should take advantage of MRMA's capability to shuffle an array in situ (something that LU's shuffle can't do). You do this by feeding it an array ref, and it shuffles the contents of the array itself (and not a copy of it).

    Even for the smallest of lists, MRMA is faster. For large lists, the comparison is overwhelming.

    Remember: There's always one more bug.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlmeditation [id://533396]
Approved by friedo
Front-paged by wfsp
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others meditating upon the Monastery: (5)
As of 2014-08-29 03:59 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The best computer themed movie is:











    Results (275 votes), past polls