Fisher-Yates Shuffle

by Adam (Vicar)
on Aug 30, 2000 at 04:20 UTC

I was looking at the Fisher-Yates Shuffle and I noticed that there is a branch in the while loop. This got me thinking, "Is it a big deal to swap two locations in an array when the locations are the same?"

Well, it occurred to me that there would be some overhead... but how much? Is that overhead more than or less than the overhead of a branch? Honestly, I don't know. But I ran a benchmark, and it came out close... Here is the code, followed by the results:

use strict;
use Benchmark;
use vars '@array';

# The Fisher-Yates Shuffle
#
# my $arrayref = shift;
# my $index = @$arrayref;
# while ( $index-- )
# {
#     my $newloc = int rand(1+$index);
#     next if $index == $newloc;
#     @$arrayref[$index, $newloc] = @$arrayref[$newloc, $index];
# }

@array = (1..100_000);   # An array to be shuffled.

# Rather than pass in a ref to the array, just shuffle the array in place.
timethese( 100, {
    'BRANCH' => sub {
        my $i = @array;
        while ( $i-- ) {
            my $j = int rand(1+$i);
            next if $i == $j;   ### This is the line being tested.
            @array[$i, $j] = @array[$j, $i];
        }
    },
    'WITHOUT' => sub {
        my $i = @array;
        while ( $i-- ) {
            my $j = int rand(1+$i);
            @array[$i, $j] = @array[$j, $i];
        }
    },
} );

# Benchmark: timing 100 iterations of BRANCH, WITHOUT...
#     BRANCH:  73 wallclock secs ( 73.43 usr +  0.00 sys =  73.43 CPU) @ 1.36/s (n=100)
#    WITHOUT:  67 wallclock secs ( 67.02 usr +  0.00 sys =  67.02 CPU) @ 1.49/s (n=100)
# Benchmark: timing 1000 iterations of BRANCH, WITHOUT...
#     BRANCH: 727 wallclock secs (726.67 usr +  0.00 sys = 726.67 CPU) @ 1.38/s (n=1000)
#    WITHOUT: 668 wallclock secs (667.78 usr +  0.02 sys = 667.80 CPU) @ 1.50/s (n=1000)
It looks to me like the branch is not needed.

Replies are listed 'Best First'.
RE (tilly) 1: Fisher-Yates Shuffle
by tilly (Archbishop) on Aug 30, 2000 at 04:57 UTC
    Cool!

    But note: The larger the array, the more the branch will hurt you. On average the number of times you will get to break out before the swap when shuffling n elements is log(n)+O(1).

    So while it is a loss for an array with 100,000 elements, that is a worst case for having the branch.
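
    A quick way to check that estimate empirically (the following sketch is mine, not part of the original post) is to count how often the two indexes collide during a single shuffle and compare the count to log(n):

    use strict;
    use warnings;

    my $n    = 100_000;
    my @a    = ( 1 .. $n );
    my $hits = 0;

    my $i = @a;
    while ( $i-- ) {
        my $j = int rand( 1 + $i );
        ++$hits if $i == $j;                  # where the branch would have fired
        @a[ $i, $j ] = @a[ $j, $i ];
    }

    printf "self-swaps: %d   log(n): %.2f\n", $hits, log $n;

    Over repeated runs the count hovers around a dozen out of 100,000 passes, i.e. log(n) plus a small constant, which is why the guarded version gains almost nothing here.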

      Right. But big-O notation is concerned with worst case.

      But I think your log(n) time is wrong. Consider the fact that each pass through the loop is dealing with a smaller array. (Speaking of which, I ought to have used a pre-decrement instead of a post-decrement... the last pass is a waste of time.) So the value of the branch decreases as the size of the array increases, right? Think about it this way: the odds of branching on the first pass through an array of size N are 1/N. On the next pass the remaining array size is N-1, so the odds there are 1/(N-1), and so on. The extreme case, where every single pass branches, therefore has probability 1/N * 1/(N-1) * 1/(N-2) * ... = 1/(N!). The worst-case scenario of branching often gets pretty rare pretty quickly. That branch gets pretty useless pretty fast, no?

      So what's the penalty of having the branch? Well, the compiler can do one of two things with the assembly code. One, it guesses that the branch is common, and arranges the assembly to jump to a different memory block when the two indexes aren't equal. So you pay the price of doing that test and jumping pretty often... not good. Or the compiler guesses that the branch is rare... and has to test for equivalence on every pass, and you pay the price of that. (This is more likely what Perl did, considering how close the benchmarks were.) Note that the test itself is simple. It's just a pass through an AND gate and maybe a NOT gate. (There is a third, even uglier, possibility... that Perl actually uses a cached-guess strategy, where it guesses that if it branched last time then it will branch again, and vice versa. This is a good strategy when the odds are closer to 50-50, but not in this case.)

      But if you don't introduce the branch at all, you pay a different penalty... which is why I posted this Meditation in the first place: now you have removed the test for equivalence on every pass, and removed two lines of assembly code, but you are also trying to swap an array element with itself. If there were no penalty for that then the branch would never make sense, but I doubt this very much. So the question is... how severe is the penalty? Is it worse than checking whether two 32-bit ints are equal? Probably. But not by much. And since it's more likely that you'll have to do the swap anyway... you might as well, right?
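
      One way to isolate just those two costs (a sketch of my own, not part of the original benchmark; the sub names SELF_SWAP and GUARDED are made up) is to time the degenerate case where the two indexes are always equal, i.e. the best possible case for the branch and the worst possible case for the unconditional swap:

      use strict;
      use warnings;
      use Benchmark qw(cmpthese);

      my @a = ( 1 .. 1_000 );

      cmpthese( -3, {
          # unconditional swap, even though both indexes are the same
          SELF_SWAP => sub {
              for my $i ( 0 .. $#a ) {
                  @a[ $i, $i ] = @a[ $i, $i ];
              }
          },
          # test first; the test always succeeds here, so the swap is always skipped
          GUARDED => sub {
              for my $i ( 0 .. $#a ) {
                  next if $i == $i;
                  @a[ $i, $i ] = @a[ $i, $i ];
              }
          },
      } );

      In a real shuffle the indexes collide only about log(n) times out of n passes, so whatever saving the guard shows here has to be weighed against paying for the test on every one of the n passes.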

        You make one pass through the array. At each step you will make one swap. The idea is that at the first step you choose what element belongs at the end of the array. At the second step you choose the next to last element of the array out of what remained. And so on until you get to choosing the first element out of..gee..one element. :-)

        Now how many times will you swap an element with itself? On average the last element gets swapped with itself 1/n of the time, the next one 1/(n-1), and so on down to 1/1. That is:

        1 + 1/2 + 1/3 + ... + 1/n
        
        Which is, after suitable rearranging:
        1/2 + 1/(2n) +
             (1 + 1/2)/2 +
             (1/2 + 1/3)/2 +
             (1/3 + 1/4)/2 +
             .  .  .       +
             (1/(n-1) + 1/n)/2
        
        Aside from the pieces at the ends, this turns out to be a sum of local trapezoidal approximations to the integral from 1 to n of 1/x. The error terms turn out to converge to a constant, and the integral that you are approximating is log(n)-log(1)=log(n). (Natural logs, of course, are the only kind that most mathematicians care about.)

        Therefore on average the number of swaps you save is log(n) plus a complicated constant plus some small terms.
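
        In standard notation (naming the constant below is my addition, not something spelled out in the original post), that sum is the n-th harmonic number:

        \[
          \sum_{k=1}^{n} \frac{1}{k} \;=\; H_n \;=\; \ln n + \gamma + O\!\left(\frac{1}{n}\right),
        \]

        where \(\gamma \approx 0.5772\) is the Euler-Mascheroni constant, i.e. the "complicated constant" mentioned above.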

        Now this is on average. What will happen realistically? Well, instead of adding up numbers we are adding up random variables. The calculation this time makes the previous one I sketched seem trivial. (Note, the variables are not independent.) But glory hallelujah! The variance of the sum for any number of terms turns out to be bounded by a constant (and not a big one either), so in fact you can put an absolute upper bound on the likelihood that it varies by more than, say, 10 from that average. So you can say that within (e.g.) a 99% confidence interval it will lie within some fixed distance of log(n), and that estimate will be true for n=10, n=1000, n=1,000,000, and n=10^100.

        Now how does this fit in big-O notation? Well first of all big-O notation as defined by Knuth doesn't really fit for random algorithms. Darn. (Easy enough to modify though.) Secondly people don't really use the word the same way that Knuth did. (eg Hash lookups as implemented in Perl are O(n). Everyone calls them O(1). Conversely O(n) algorithms are really O(n*n) as well, but we say, "That is O(n*n) so it is not a good algorithm.") Double darn. But hey, that is OK. After all big-O, little-o were invented by Bachmann in 1894 for number theory, not algorithms. (And were later popularized by Landau.) Language does change.

        Besides which, your test (averaging repeated runs) is going to test average behaviour, and not the outliers.

        Therefore whether or not you argue about the use of the phrase, "O(1)", my underlying comment is an accurate statement about what you are trying to measure. The benefit of the branch will be seen, on average, log(n) times plus some constant on an array of size n. Putting the logic in will cost you n times.

        An incidental note. Perl compiles everything down to portable op codes, not assembler, and runs those codes through an interpreter. Therefore thinking about how a C compiler would solve a problem is missing the point. perl simply does not deal with the physical machine at the same level that, say, gcc does. That if check is going to be dealt with as an interpreted statement.
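
        You can see this concretely (a sketch of my own; the exact op names vary between perl versions) by dumping the compiled op tree for the guarded loop with the stock B::Concise module:

        perl -MO=Concise,-exec -e 'for my $i (0 .. 9) { my $j = int rand(1 + $i); next if $i == $j; }'

        The equality test and the conditional next appear in that listing as ordinary interpreted ops, not as anything resembling a hardware-level branch, which is why reasoning about gates and branch prediction doesn't really apply here.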

        BTW take a look at my home node. I may have ranted a bit here, but that is because you wandered too close to what I am really good at. Namely math. :-)

        EDIT
        Good and out of practice. :-(

        When I put it on paper the random variables actually were independent with variances of (n-1)/(n*n), so the variance of the sum is again log(n)+O(1), and the standard deviation O(sqrt(log(n))). Therefore on a run you save:

        log(n) + O(sqrt(log(n)))
        
        iterations...
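
        Spelling that out in my own notation: with k elements left to place, let X_k be the indicator that the chosen index equals the current one, so

        \[
          \operatorname{Var}(X_k) = \frac{1}{k}\Bigl(1 - \frac{1}{k}\Bigr) = \frac{k-1}{k^2},
          \qquad
          \operatorname{Var}\!\Bigl(\sum_{k=1}^{n} X_k\Bigr)
            = \sum_{k=1}^{n} \frac{k-1}{k^2}
            = H_n - \sum_{k=1}^{n} \frac{1}{k^2}
            = \ln n + O(1),
        \]

        giving the O(sqrt(log(n))) standard deviation quoted above.
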
Re: Fisher-Yates Shuffle
by Random_Walk (Prior) on Jun 02, 2014 at 06:20 UTC

    I may have found another optimisation here but, lest I have broken the algorithm, I would love the input of monks wiser than I. Why decrement $i in the while condition, only to have to add 1 to it immediately afterwards? See IFIDDLE in the following update to the OP's benchmark.

    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    my @array = (1..10_000);        # An array to be shuffled.
    $_ = int rand(52) for @array;

    # Rather than pass in a ref to the array, just shuffle the array in place.
    cmpthese( 10000, {
        'BRANCH' => sub {
            my $i = @array;
            while ( $i-- ) {
                my $j = int rand(1+$i);
                next if $i == $j;   ### This is the line being tested.
                @array[$i, $j] = @array[$j, $i];
            }
        },
        'WITHOUT' => sub {
            my $i = @array;
            while ( $i-- ) {
                my $j = int rand(1+$i);
                @array[$i, $j] = @array[$j, $i];
            }
        },
        'IFIDDLE' => sub {
            my $i = @array;
            while ( $i ) {
                my $j = int rand($i--);
                @array[$i, $j] = @array[$j, $i];
            }
        },
        'MY_J_OUT' => sub {
            my $i = @array;
            my $j;
            while ( $i ) {
                $j = int rand($i--);
                @array[$i, $j] = @array[$j, $i];
            }
        },
        'FOR_LOOP' => sub {
            my $j;
            $j = int rand($_),
            @array[$_, $j] = @array[$j, $_]
                for reverse 1 .. $#array;
        },
        'BUK' => sub {
            my( $r, $t );
            $r = $_ + rand( @array - $_ ),
            $t = $array[ $_ ],
            $array[ $_ ] = $array[ $r ],
            $array[ $r ] = $t
                for 0 .. $#array;
        },
    } );

    # Benchmark: timing 10000 iterations of BRANCH, BUK, FOR_LOOP, IFIDDLE, MY_J_OUT, WITHOUT...
    #     BRANCH: 64 wallclock secs (62.72 usr +  0.00 sys = 62.72 CPU) @ 159.44/s (n=10000)
    #        BUK: 62 wallclock secs (59.58 usr +  0.00 sys = 59.58 CPU) @ 167.85/s (n=10000)
    #   FOR_LOOP: 55 wallclock secs (53.22 usr +  0.00 sys = 53.22 CPU) @ 187.91/s (n=10000)
    #    IFIDDLE: 54 wallclock secs (52.61 usr +  0.00 sys = 52.61 CPU) @ 190.08/s (n=10000)
    #   MY_J_OUT: 50 wallclock secs (46.81 usr +  0.00 sys = 46.81 CPU) @ 213.62/s (n=10000)
    #    WITHOUT: 55 wallclock secs (53.11 usr +  0.00 sys = 53.11 CPU) @ 188.29/s (n=10000)
    #
    #              Rate BRANCH  BUK FOR_LOOP WITHOUT IFIDDLE MY_J_OUT
    # BRANCH    158/s     --   -4%     -16%    -16%    -17%     -27%
    # BUK       164/s     4%    --     -12%    -13%    -13%     -24%
    # FOR_LOOP  187/s    19%   14%       --     -0%     -1%     -14%
    # WITHOUT   188/s    19%   14%       0%      --     -1%     -14%
    # IFIDDLE   190/s    20%   15%       1%      1%      --     -13%
    # MY_J_OUT  218/s    38%   32%      16%     16%     15%      --

    Update

    added BrowserUK's version, my own optimisation taking the 'my' out of the loop, and another version using a for loop.

    Cheers,
    R.

    Pereant, qui ante nos nostra dixerunt!

      You might also consider this version:

      sub bukShuffle {
          my( $r, $t );
          $r = $_ + rand( @array - $_ ),
          $t = $array[ $_ ],
          $array[ $_ ] = $array[ $r ],
          $array[ $r ] = $t
              for 0 .. $#array;
      }

      And I offer this method (drawing heavily upon a post by Abigail-II) for checking your modifications produce fair results:

      use Data::Dump qw( pp );    # assuming pp comes from Data::Dump

      my %h;
      bukShuffle( @array = qw[ a b c d ] ), ++$h{ join ' ', @array }
          for 1 .. 1e6;
      pp %h;

      Which for the code above produces:

      [19:07:24.88] C:\test>junk50
      (
        "c a d b", 41868,
        "a d b c", 41833,
        "b d a c", 41554,
        "d a c b", 41772,
        "b c d a", 41671,
        "c d a b", 41779,
        "b d c a", 41783,
        "a b d c", 41724,
        "c b a d", 42012,
        "c a b d", 41571,
        "b c a d", 41270,
        "b a c d", 41192,
        "d c a b", 41443,
        "a b c d", 41669,
        "a c d b", 41884,
        "d b a c", 41786,
        "c b d a", 41608,
        "a d c b", 41598,
        "d c b a", 41544,
        "b a d c", 41578,
        "d b c a", 41684,
        "d a b c", 41672,
        "a c b d", 41622,
        "c d b a", 41883,
      )

      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        I added your version to my original benchmark. I saw you took the 'my' out of the loop, so I did the same to my version, which gave me a significant speed-up (15% faster).

        Cheers,
        R.

        Pereant, qui ante nos nostra dixerunt!

      Random Walk:

      Update: I bungled it. The results were suspiciously fast. When copying a block, I left in one of the --$i statements, so I was skipping every other element. Updated code & results:

      'ROBO' => sub {
          my $i = @array;
          do {
              ++$cRO;
              my $j = int rand $i;
              @array[$i,$j] = @array[$j,$i];
          } while ($i--);
      },

      roboticus@sparky:~$ perl 1088227.pl
      Benchmark: timing 500 iterations of BRANCH, IFIDDLE, ROBO, WITHOUT...
          BRANCH: 17 wallclock secs (17.05 usr +  0.00 sys = 17.05 CPU) @ 29.33/s (n=500)
         IFIDDLE: 14 wallclock secs (14.29 usr +  0.00 sys = 14.29 CPU) @ 34.99/s (n=500)
            ROBO: 19 wallclock secs (18.59 usr +  0.00 sys = 18.59 CPU) @ 26.90/s (n=500)
         WITHOUT: 15 wallclock secs (15.69 usr +  0.00 sys = 15.69 CPU) @ 31.87/s (n=500)
      5000000, 5250000, 5000000, 5125250

      My code is still executing the wrong number of loops (the last number on the last line), but it's not so far out of line. WITHOUT (the second number) also has an unexpected loop count. Original node below:


      You could also move the comparison to the end of the loop to avoid that +1:

      'ROBO' => sub {
          my $i = @array;
          do {
              my $j = int rand $i--;
              @array[$i,$j] = @array[$j,$i];
          } while (--$i);
      },

      Doing so gave me:

      roboticus@sparky:~$ perl 1088227.pl
      Benchmark: timing 500 iterations of BRANCH, IFIDDLE, ROBO, WITHOUT...
          BRANCH: 17 wallclock secs (16.77 usr +  0.00 sys = 16.77 CPU) @ 29.82/s (n=500)
         IFIDDLE: 14 wallclock secs (14.02 usr +  0.00 sys = 14.02 CPU) @ 35.66/s (n=500)
            ROBO:  8 wallclock secs ( 7.85 usr +  0.00 sys =  7.85 CPU) @ 63.69/s (n=500)
         WITHOUT: 15 wallclock secs (15.33 usr +  0.00 sys = 15.33 CPU) @ 32.62/s (n=500)
      roboticus@sparky:~$

      It seems to help (unless I bungled it).

      ...roboticus

      When your only tool is a hammer, all problems look like your thumb.

        That doesn't look right; now you're decrementing $i twice in each ROBO loop. Isn't that only doing half the work, hence taking half the time?
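
        For what it's worth, a do/while version with a single decrement per pass (a sketch of my own; the name ROBO_FIXED is made up, and it is functionally the same as IFIDDLE) would look like:

        'ROBO_FIXED' => sub {
            my $i = @array;
            do {
                my $j = int rand $i--;    # j is in [0 .. $i-1]; the post-decrement then makes $i the highest valid index
                @array[ $i, $j ] = @array[ $j, $i ];
            } while ( $i );
        },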

RE: Fisher-Yates Shuffle
by Anonymous Monk on Sep 08, 2000 at 01:57 UTC
    There was a thread on comp.lang.perl.misc about this a while back.
