comment on

In this meditation, I'm benchmarking a couple of subroutines to uniformly shuffle an array.

I know others have done this too, for exapmle see Re^2: Randomising the order of an array. However, I've learnt much from doing this myself.

First, let's see what shuffling functions I was using. (I originally posted these at Re^2: Array Shuffle. I've removed the bless method, as it's very slow (probably because it has to numerify two strings in every comparision).)

I used six different algorithms. One of these just calls the shuffle function from List::Util, which is usually implemented as an XS functions, so it can be expected that this would be the fastest.

Two of the algorithms assign a random number to each element in the array, and sort the array using that number as a key. One of these uses a schwartzian transform, the other one uses a single array to store the keys.

Note that these sorting algorithms only give a truly uniform random permutation if the rand() function of perl gives enough random bits. This is because of the birthday-paradox, see chapter 5.3 of the Cormen – Leiserson – Rivest – Stein book for more details. I was using linux, so perl uses the drand48 function from glibc, which gives 48 good random bits, so this doesn't affect me except for large arrays. However, on at least some windows systems, the rand perl function probably calls the C rand function only once, and that gives only 15 random bits which is too few for even small arrays.

The other three algorithms used the classical swapping algorithm: take each element in the array starting from the first, and swap it with one of the elements after it (including itself). I've made three variants of this. One is the classical swap, one uses map to collect the swapped-back data, and one uses shift to remove elements from the front of the array instead. (Using shift was not my idea, someone on the cb recommended it earlier, but I can't remember who he was, sorry.)

On this algorithm, see both the above mentioned chapter in the Cormen book, or chapter 3.4.2 of Knuth's TAOCP.

Here's the code I was using.

use warnings;
use strict;

use Benchmark ":all";
use IO::Handle;

my %shuffle;

use List::Util;
$shuffle{"xs"} = \&List::Util::shuffle;

$shuffle{"swap"} = sub {
        my @p = @_;
        my $k;
        $k = $_ + int(rand(@p - $_)),      
                @p[$_, $k] = @p[$k, $_] for
                0 .. @p - 1;
        @p;
};

$shuffle{"map"} = sub {
        my @p = @_;
        map { splice @p, $_ + rand(@p - $_), 1, $p[$_] } 0 .. @p - 1;
};

$shuffle{"shift"} = sub {
        my @p = @_;
        my $t;
        map { $t = splice @p, rand(@p), 1, $p[0]; shift @p; $t } 0 .. 
+@p - 1;
};

$shuffle{"weight"} = sub {
        my @w = map { rand } @_; 
        @_[sort { $w[$a] <=> $w[$b] } 0 .. @_ - 1];
};

$shuffle{"schwartzian"} = sub {
        map { $$_[1] } sort { $$a[0] <=> $$b[0] } map { [rand, $_] } @
+_;
};

if (0) { $shuffle{"bless"} = sub {
        map { $$_[0] } sort { ref $a <=> ref $b } map { bless [$_], ra
+nd } @_;      
}; }

if (0) {
    my $N = 0 + ($ARGV[0] || 20);
    my $T = 0 + ($ARGV[1] || 1000);
    for my $n (keys(%shuffle)) {
        my %f;
        print $n;
        my $a = $shuffle{$n};
        for (1 .. $T) {
        my @a = 1 .. $N;
        $f{join(" ", &$a(@a))}++;
        }
        for (sort keys(%f)) {
        printf "%5d: %s\n", $f{$_}, $_;
        }
        print 0+keys(%f), "\n\n";
    }
}

my @sd;
my %shtest = map { my($n, $a) = ($_, $shuffle{$_}); $n, sub {
        my @r = &$a(@sd);
    } } keys(%shuffle);
for my $N (map { 1 + 2 * 3**$_ } 0 .. 8) {
    print "testing on $N elements\n";
    @sd = my @sd0 = map { rand } 1 .. $N;
    cmpthese(-5, \%shtest);
    join(";", @sd) eq join(";", @sd0) or
        die "error: some of the shuffle functions have changed the arr
+ay";
    flush STDOUT;
}
print "done\n";

__END__
[download]

And here's the output, using v5.8.8 on i686-linux.

testing on 3 elements
                 Rate schwartzian      swap   weight    shift       ma
+p       xs
schwartzian  131923/s          --       -1%     -15%     -20%      -23
+%     -88%
swap         133815/s          1%        --     -13%     -19%      -22
+%     -88%
weight       154592/s         17%       16%       --      -6%      -10
+%     -86%
shift        165196/s         25%       23%       7%       --       -4
+%     -86%
map          172130/s         30%       29%      11%       4%        -
+-     -85%
xs          1140464/s        764%      752%     638%     590%      563
+%       --
testing on 7 elements
                Rate schwartzian   weight      swap    shift       map
+        xs
schwartzian  61200/s          --     -15%      -20%     -27%      -29%
+      -90%
weight       72048/s         18%       --       -5%     -13%      -17%
+      -88%
swap         76058/s         24%       6%        --      -9%      -12%
+      -87%
shift        83283/s         36%      16%       10%       --       -4%
+      -86%
map          86718/s         42%      20%       14%       4%        --
+      -85%
xs          592396/s        868%     722%      679%     611%      583%
+        --
testing on 19 elements
                Rate schwartzian   weight      swap    shift       map
+        xs
schwartzian  21339/s          --     -15%      -32%     -33%      -43%
+      -91%
weight       25214/s         18%       --      -20%     -21%      -32%
+      -90%
swap         31452/s         47%      25%        --      -2%      -16%
+      -87%
shift        31957/s         50%      27%        2%       --      -14%
+      -87%
map          37239/s         75%      48%       18%      17%        --
+      -85%
xs          243980/s       1043%     868%      676%     663%      555%
+        --
testing on 55 elements
               Rate schwartzian   weight      swap     shift       map
+        xs
schwartzian  6790/s          --     -17%      -41%      -42%      -49%
+      -92%
weight       8154/s         20%       --      -29%      -30%      -39%
+      -90%
swap        11492/s         69%      41%        --       -1%      -14%
+      -86%
shift       11650/s         72%      43%        1%        --      -13%
+      -86%
map         13400/s         97%      64%       17%       15%        --
+      -84%
xs          83719/s       1133%     927%      628%      619%      525%
+        --
testing on 163 elements
               Rate schwartzian   weight     shift      swap       map
+        xs
schwartzian  1902/s          --     -19%      -52%      -52%      -58%
+      -93%
weight       2356/s         24%       --      -41%      -41%      -48%
+      -92%
shift        3982/s        109%      69%        --       -0%      -13%
+      -86%
swap         3998/s        110%      70%        0%        --      -12%
+      -86%
map          4559/s        140%      93%       14%       14%        --
+      -84%
xs          28869/s       1418%    1125%      625%      622%      533%
+        --
testing on 487 elements
              Rate schwartzian    weight     shift      swap       map
+        xs
schwartzian  515/s          --      -20%      -60%      -61%      -65%
+      -95%
weight       644/s         25%        --      -50%      -51%      -56%
+      -93%
shift       1292/s        151%      101%        --       -2%      -12%
+      -86%
swap        1321/s        157%      105%        2%        --      -10%
+      -86%
map         1467/s        185%      128%       14%       11%        --
+      -84%
xs          9413/s       1728%     1363%      629%      612%      542%
+        --
testing on 1459 elements
              Rate schwartzian    weight     shift      swap       map
+        xs
schwartzian  121/s          --      -28%      -69%      -70%      -71%
+      -95%
weight       168/s         39%        --      -57%      -58%      -59%
+      -94%
shift        396/s        226%      135%        --       -1%       -4%
+      -85%
swap         401/s        230%      138%        1%        --       -3%
+      -85%
map          414/s        241%      146%        4%        3%        --
+      -84%
xs          2656/s       2088%     1477%      570%      563%      542%
+        --
testing on 4375 elements
              Rate schwartzian    weight       map     shift      swap
+        xs
schwartzian 20.8/s          --      -33%      -71%      -75%      -76%
+      -94%
weight      31.1/s         50%        --      -57%      -63%      -64%
+      -92%
map         72.6/s        249%      133%        --      -14%      -16%
+      -81%
shift       84.7/s        307%      172%       17%        --       -2%
+      -77%
swap        86.6/s        317%      178%       19%        2%        --
+      -77%
xs           376/s       1709%     1108%      418%      344%      334%
+        --
testing on 13123 elements
              Rate schwartzian    weight       map     shift      swap
+        xs
schwartzian 4.46/s          --      -28%      -72%      -73%      -76%
+      -95%
weight      6.18/s         39%        --      -61%      -63%      -67%
+      -92%
map         15.7/s        252%      154%        --       -6%      -16%
+      -81%
shift       16.8/s        277%      172%        7%        --      -10%
+      -80%
swap        18.7/s        319%      203%       19%       11%        --
+      -77%
xs          82.0/s       1739%     1227%      422%      388%      339%
+        --
done
[download]

Ok, so what can we learn from all this output?

As we have expected, the XS function was much faster than any other method.

The next thing we can see is that the sorting solutions (schwartzian and weight) were slower than any of the combinatorical ones, and that this disadvantage is even more significant with larger arrays. This is sort of logical, after all, sorting takes O(n·log n) time but the other algorithms take only O(n).

We can also see that the schwartzian method is slower than the one using only one array. This was surprising to me when I first met it, but then tye and Aristotle explained to me in the replies of Re: Benchmark, -s versus schwartzian why this has to be like this. (Here, I don't implement the hash lookup approach shown in that thread, only the array lookup one.)

I also wonder if it's possible to speed up this method by using the variant of the schwartzian transform where you map the data to strings that you can sort with one of the built-in sorting functions, see Re^3: Benchmark, -s versus schwartzian (vs hash) or �Re^2: Benchmark, -s versus schwartzian or fast, flexible, stable sort. (Apparently that's called GRT. Can anyone tell what that stands for? (Update: bart says it's the Guttman-Rosler transform, see also this slide.)) Maybe I'll try that later.

Now let's examine the faster, combinatorical solutions (swap, map, shift).

All three solutions have about the same speed. The (fortran-like) swapping solution is the fastest for larger arrays (from say 4K elements), but the other two are faster for small arrays. This is probably because it doesn't have to do memory managment. Finally, the map solution is consistently faster than the shift solution, but only by a small amount.

In reply to Benchmarking perfect shuffles by ambrus

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


laziness, impatience, and hubris
	PerlMonks