I was looking at the Fisher-Yates Shuffle and I noticed that there is a branch in the while loop. This got me thinking, "Is it a big deal to swap two locations in an array when the locations are the same?" Well it occured to me that there would be some overhead... but how much? Is that overhead more then or less then the overhead of a branch? Well, honestly I don't know. But I ran a benchmark, and it came out close... Here is the code, followed by the results:
use strict;
use Benchmark;
use vars '@array';
# The Fisher-Yates Shuffle
#
# my $arrayref = shift;
# my $index = @$arrayref;
# while ( $index-- )
# {
# my $newloc = int rand (1+$i);
# next if $index == $newloc;
# @$arrayref[$index, $newloc] = @$arrayref[$newloc, $index];
# }
@array = (1..100_000); # An array to be shuffled.
# Rather then pass in a ref to the array, just shuffle the array in pl
+ace.
timethese( 100, {
'BRANCH' => sub {
my $i=@array;
while($i--){
my $j=int rand(1+$i);
next if $i==$j; ### This is the line being tested.
@array[$i, $j]=@array[$j, $i]
}},
'WITHOUT' => sub {
my $i=@array;
while($i--){
my $j=int rand(1+$i);
@array[$i, $j]=@array[$j, $i]
}},
} );
# Benchmark: timing 100 iterations of BRANCH, WITHOUT...
# BRANCH: 73 wallclock secs (73.43 usr + 0.00 sys = 73.43 CPU) @ 1.
+36/s (n=100)
# WITHOUT: 67 wallclock secs (67.02 usr + 0.00 sys = 67.02 CPU) @ 1.
+49/s (n=100)
# Benchmark: timing 1000 iterations of BRANCH, WITHOUT...
# BRANCH: 727 wallclock secs (726.67 usr + 0.00 sys = 726.67 CPU) @
+ 1.38/s (n=1000)
# WITHOUT: 668 wallclock secs (667.78 usr + 0.02 sys = 667.80 CPU) @
+ 1.50/s (n=1000)
It looks to me like the branch is not needed.
RE (tilly) 1: Fisher-Yates Shuffle
by tilly (Archbishop) on Aug 30, 2000 at 04:57 UTC
|
Cool!
But note: The larger the array, the more the branch will
hurt you. On average the number of times you will get to
break out before the swap when shuffling n elements is log(n)+O(1).
So while it is a loss for an array with 100,000 elements,
that is a worst case for having the branch. | [reply] |
|
|
Right. But big-O notation is concerned with worst case.
But I think your log(n) time is wrong. Consider the fact that each pass through the loop is dealing with a smaller array. (Speaking of which, I ought to have used a pre-decrement instead of a post decrement... the last pass is a waste of time.) So the value of the branch decreases as the size of the array increases, right? Think about it this way: The odds of branching on the first pass through an array of size N are 1/N. The next passs through, the relative array size is N-1.
So for any given array of size N the odds of branching at all are between 1 and 1/N * 1/(N-1) * 1/(N-2) and so on. That makes the high end, where every single loop branched, equal to 1/(N!). The worst case scenario of branching often gets pretty rare pretty quick. That branch gets pretty useless pretty fast, no?
So what's the penalty of having the branch? Well the compiler can do one of two things with the assembly code. One, it guesses that the branch is common - and arranges the assembly to jump to a different memory block when the two indexes arn't equal. So you pay the price of doing that test and jumping pretty often... not good. OR The compiler guesses that the branch is rare... and has to test for equivalence on every pass and you paying the price. (This is more likely what Perl did, considering how close the benchmarks were.) Note that the test itself is simple. Its just a pass through an AND gate and maybe a NOT gate.
(There is a third, even uglier, possability... that Perl actually uses a cache guess strategy where it guesses that if it branched last time then it will branch again... and vice-versa. This is a good strategy when the odds are closer to 50-50, but not in this case.)
But if you don't introduce the branch at all, you pay a different penalty... which is why I posted this Meditation in the first place: Now you have removed the test for equivalence on every pass, and removed two lines of assembly code, but you are also trying to swap an array element with itself. If there were no penalty for that then the branch would never make sense, but I doubt this very much. So the question is... how severe is the penalty? Is it worse then checking if two 32 bit ints are equal? Probably. But not much. And since its more likely that you'll have to do the swap anyway... you might as well, right?
| [reply] |
|
|
1 + 1/2 + 1/3 + ... + 1/n
Which is, after suitable rearranging:
1/2 + 1/2n +
(1 + 1/2)/2 +
(1/2 + 1/3)/2 +
(1/3 + 1/4)/2 +
. . . +
(1/(n-1) + 1/n)/2
Aside from the pieces at the ends, this turns out to be a
sum of local trapezoidal approximations to the integral
from 1 to n of 1/x. The error terms turn out to converge
to a constant, and the integral that you are approximating
is log(n)-log(1)=log(n). (Natural logs, of
course, are the only kind that most mathematicians care
about.)
Therefore on average the number of swaps you save is log(n)
plus a complicated constant plus some small terms.
Now this is on average. What will happen realistically?
Well instead of adding up numbers we are adding up random
variables. The calculation this time makes the previous
one I sketched seem trivial. (Note, the variables are not
independent.) But glory halleluja! The variance of the
sum for any number of terms turns out to be bounded by a
constant (and not a big one either) so in fact you can put
absolute upper bound on the likelyhood it varies by more
than, say, 10 from that average. So you can say that within
(eg) a 99% confidence interval it will lie within some
fixed distance of log(n), and that estimate will be true
for n=10, n=1000, n=1,000,000, and n=10^100.
Now how does this fit in big-O notation? Well first of all
big-O notation as defined by Knuth doesn't really fit for
random algorithms. Darn. (Easy enough to modify though.)
Secondly people don't really use the word the same way that
Knuth did. (eg Hash lookups as implemented in Perl are
O(n). Everyone calls them O(1). Conversely O(n) algorithms
are really O(n*n) as well, but we say, "That is O(n*n) so it
is not a good algorithm.") Double darn. But hey, that is
OK. After all big-O, little-o were invented by Bachmann
in 1894 for number theory, not algorithms. (And were later
popularized by Landau.) Language does change.
Besides which, your test (averaging repeated runs) is going
to test average behaviour, and not the outliers.
Therefore whether or not you argue about the use of the
phrase, "O(1)", my underlying comment is an accurate
statement about what you are trying to measure. The benefit
of the branch will be seen, on average, log(n) times plus
some constant on an array of size n. Putting the logic in
will cost you n times.
An incidental note. Perl compiles everything down to
portable op codes, not assembler, and runs those codes
through an interpreter. Therefore thinking
about how a C compiler would solve a problem is missing
the point. perl simply does not deal with the physical
machine at the same level that, say, gcc does. That if
check is going to be dealt with as an interpreted
statement.
BTW take a look at my home node. I may have ranted a bit
here, but that is because you wandered too close to what I
am really good at. Namely math. :-)
EDIT
Good and out of practice. :-(
When I put it on paper the random variables actually were
independent with variances of (n-1)/(n*n), so the variance
of the sum is again log(n)+O(1), and the standard deviation
O(sqrt(log(n))). Therefore on a run you save:
log(n) + O(sqrt(log(n)))
iterations... | [reply] |
|
|
|
|
|
|
|
|
|
|
|
Re: Fisher-Yates Shuffle
by Random_Walk (Prior) on Jun 02, 2014 at 06:20 UTC
|
I may have found another optimisation here, but, lest I broke the algorithm, I would love the input of monks wiser than I. Why decrement $i in the while, only to have to add 1 to it immediately afterwards? See IFIDDLE in the following update to the OP's benchmark
use strict;
use warnings;
use Benchmark qw(cmpthese);
my @array = (1..10_000); # An array to be shuffled.
$_ = int rand (52) for @array;
# Rather then pass in a ref to the array, just shuffle the array in pl
+ace.
cmpthese( 10000, {
'BRANCH' => sub {
my $i=@array;
while($i--){
my $j=int rand(1+$i);
next if $i==$j; ### This is the line being tested.
@array[$i, $j]=@array[$j, $i]
}},
'WITHOUT' => sub {
my $i=@array;
while($i--){
my $j=int rand(1+$i);
@array[$i, $j]=@array[$j, $i]
}},
'IFIDDLE' => sub {
my $i=@array;
while($i){
my $j=int rand($i--);
@array[$i, $j]=@array[$j, $i]
}},
'MY_J_OUT' => sub {
my $i=@array;
my $j;
while($i){
$j=int rand($i--);
@array[$i, $j]=@array[$j, $i]
}},
'FOR_LOOP' => sub {
my $j;
$j = int rand ($_),
@array[$_, $j] = @array[$j, $_]
for reverse 1 .. $#array;
},
'BUK' => sub {
my( $r, $t );
$r = $_ + rand( @array - $_ ),
$t = $array[ $_ ],
$array[ $_ ] = $array[ $r ],
$array[ $r ] = $t
for 0 .. $#array;
},
} );
# Benchmark: timing 10000 iterations of BRANCH, BUK, FOR_LOOP, IFIDDLE
+, MY_J_OUT, WITHOUT...
# BRANCH: 64 wallclock secs (62.72 usr + 0.00 sys = 62.72 CPU) @ 1
+59.44/s (n=10000)
# BUK: 62 wallclock secs (59.58 usr + 0.00 sys = 59.58 CPU) @ 1
+67.85/s (n=10000)
# FOR_LOOP: 55 wallclock secs (53.22 usr + 0.00 sys = 53.22 CPU) @ 1
+87.91/s (n=10000)
# IFIDDLE: 54 wallclock secs (52.61 usr + 0.00 sys = 52.61 CPU) @ 1
+90.08/s (n=10000)
# MY_J_OUT: 50 wallclock secs (46.81 usr + 0.00 sys = 46.81 CPU) @ 2
+13.62/s (n=10000)
# WITHOUT: 55 wallclock secs (53.11 usr + 0.00 sys = 53.11 CPU) @ 1
+88.29/s (n=10000)
# Rate BRANCH BUK FOR_LOOP WITHOUT IFIDDLE MY_J_OUT
# BRANCH 158/s -- -4% -16% -16% -17% -27%
# BUK 164/s 4% -- -12% -13% -13% -24%
# FOR_LOOP 187/s 19% 14% -- -0% -1% -14%
# WITHOUT 188/s 19% 14% 0% -- -1% -14%
# IFIDDLE 190/s 20% 15% 1% 1% -- -13%
# MY_J_OUT 218/s 38% 32% 16% 16% 15% --
Update
added BrowserUK, my own optimisation taking the 'my' out the loop, and another version using a for loop.
Cheers, R.
Pereant, qui ante nos nostra dixerunt!
| [reply] [d/l] |
|
|
sub bukShuffle {
my( $r, $t );
$r = $_ + rand( @array - $_ ),
$t = $array[ $_ ],
$array[ $_ ] = $array[ $r ],
$array[ $r ] = $t
for 0 .. $#array;
}
And I offer this method (drawing heavily upon a post by Abigail-II) for checking your modifications produce fair results: my %h;
bukShuffle( @array = qw[a b c d ] ), ++$h{ join ' ', @array } for 1 ..
+ 1e6;
pp %h;
Which for the code above produces: [19:07:24.88] C:\test>junk50
(
"c a d b", 41868,
"a d b c", 41833,
"b d a c", 41554,
"d a c b", 41772,
"b c d a", 41671,
"c d a b", 41779,
"b d c a", 41783,
"a b d c", 41724,
"c b a d", 42012,
"c a b d", 41571,
"b c a d", 41270,
"b a c d", 41192,
"d c a b", 41443,
"a b c d", 41669,
"a c d b", 41884,
"d b a c", 41786,
"c b d a", 41608,
"a d c b", 41598,
"d c b a", 41544,
"b a d c", 41578,
"d b c a", 41684,
"d a b c", 41672,
"a c b d", 41622,
"c d b a", 41883,
)
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] [select] |
|
|
| [reply] |
|
|
'ROBO' => sub {
my $i = @array;
do {
++$cRO;
my $j = int rand $i;
@array[$i,$j] = @array[$j,$i];
} while ($i--);
},
roboticus@sparky:~$ perl 1088227.pl
Benchmark: timing 500 iterations of BRANCH, IFIDDLE, ROBO, WITHOUT...
BRANCH: 17 wallclock secs (17.05 usr + 0.00 sys = 17.05 CPU) @ 29
+.33/s (n=500)
IFIDDLE: 14 wallclock secs (14.29 usr + 0.00 sys = 14.29 CPU) @ 34
+.99/s (n=500)
ROBO: 19 wallclock secs (18.59 usr + 0.00 sys = 18.59 CPU) @ 26
+.90/s (n=500)
WITHOUT: 15 wallclock secs (15.69 usr + 0.00 sys = 15.69 CPU) @ 31
+.87/s (n=500)
5000000, 5250000, 5000000, 5125250
My code is still executing the wrong number of loops (the last number on the last line), but not so far out of line. Also, WITHOUT (second) also has an unexpected number of loops. Original node below:
You could also move the comparison to the end of the loop to avoid that +1:
'ROBO' => sub {
my $i = @array;
do {
my $j = int rand $i--;
@array[$i,$j] = @array[$j,$i];
} while (--$i);
},
Doing so gave me:
roboticus@sparky:~$ perl 1088227.pl
Benchmark: timing 500 iterations of BRANCH, IFIDDLE, ROBO, WITHOUT...
BRANCH: 17 wallclock secs (16.77 usr + 0.00 sys = 16.77 CPU) @ 29
+.82/s (n=500)
IFIDDLE: 14 wallclock secs (14.02 usr + 0.00 sys = 14.02 CPU) @ 35
+.66/s (n=500)
ROBO: 8 wallclock secs ( 7.85 usr + 0.00 sys = 7.85 CPU) @ 63
+.69/s (n=500)
WITHOUT: 15 wallclock secs (15.33 usr + 0.00 sys = 15.33 CPU) @ 32
+.62/s (n=500)
roboticus@sparky:~$
It seems to help (unless I bungled it).
...roboticus
When your only tool is a hammer, all problems look like your thumb. | [reply] [d/l] [select] |
|
|
| [reply] |
|
|
RE: Fisher-Yates Shuffle
by Anonymous Monk on Sep 08, 2000 at 01:57 UTC
|
There was a thread on comp.lang.perl.misc about
this awhile back. | [reply] |
|
|