### does perl have branch prediction

by david2008 (Scribe)
 on Jan 29, 2014 at 05:01 UTC Need Help??
david2008 has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,
I came over an interesting post in stackoverflow
It explains that in Java/C++ processing a sorted array is faster than an unsorted. Let's say you have an array @a, then.
```my \$sum;
foreach my \$x(@a) {
if(\$x>128) {
\$sum +=\$x;
}
}
If @a is sorted, will it be faster than unsorted?
Similary if i have a condition:
```fun_a(\$x) && fun_b(\$x) &&.....&&fun_z(\$x)
and fun_z fails consistently, will fun_z be calculated at the beginning?
Thanks, David

Replies are listed 'Best First'.
Re: does perl have branch prediction
by BrowserUk (Pope) on Jan 29, 2014 at 06:29 UTC

There is a consistent but marginal difference when summing an unsorted array compared to summing the same array once sorted:

```@a = map int( rand 256 ), 1 .. 10e6;;
\$t=time; \$n=0; \$_ > 128 and \$n+=\$_ for @a; print time-\$t, " \$n";;
1.42373704910278  952115465

@a = sort{ \$a <=> \$b } @a;;
\$t=time; \$n=0; \$_ > 128 and \$n+=\$_ for @a; print time-\$t, " \$n";;
1.31046295166016  952115465

However, the saving is not sufficient to make it worth while the very high cost of doing the sort:

```@a = map int( rand 256 ), 1 .. 10e6;;
\$t=time; \$n=0; \$_ > 128 and \$n+=\$_ for @a; print time-\$t, " \$n";;
1.36710000038147  952320992

\$t=time; @a = sort{ \$a <=> \$b } @a; \$n=0; \$_ > 128 and \$n+=\$_ for @a;
+print time-\$t, " \$n";;
5.25063991546631  952320992

Perl doesn't attempt to optimise that complex conditional beyond short-circuiting early.

I'd be very surprised if pre-compiled C++ would reorder the clauses unless it is very obvious at compile-time that fun_Z() will predominantly produce a false return. It certainly won't reorder them at runtime, and unless all the functions in the conditional are very simple and in-lined, it is doubtful whether the branch-prediction at the chip-level will have sufficient scope to do so.

Java might re-order the clauses at runtime via the JIT compiler, if the complex conditional were embedded in a loop.

But, it would do so once during the first iteration of the loop and if during that pass fun_Z() returned its atypical response -- ie. if it returned false when it mostly would return true -- then it could result in an overall pessimisation rather than an optimisation, by seeding the branch prediction the wrong way.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
How can we explain the small time difference between the sorted and unsorted array summing?
How can we explain the small time difference between the sorted and unsorted array summing?

It is almost certainly down to the cpu instruction cache.

1. With the sorted array, on average:

for the first half of the array, the instructions required in the loop will already be in the cache.

Then there will be a need to load the required instructions into the cache, once, when the break point is reached, and they will remain there for the second half of the array.

2. With the unsorted array,

the instruction cache potentially might need to be flushed and repopulated for every other iteration of the loop.

Please note, that is just my best guess. I can't think of any easy -- or even feasible given what hardware and software I have available to me -- way of attempting to verify my hypothesis.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Its all to do with the workflow at a hardware level. I cant remember specifically all of the stages of processing, but suffice it to say that there are multiple stages of processing. When a program comes to a branch (such as an if), it doesnt pause the processing input for however many cycles, just to wait for the result of the if to pop out the end, it instead uses branch prediction to choose a path to go down (really useful if you're running in a loop, not so useful otherwise) and shovels the next instruction into the first stage of processing. When the branch is finally resolved, it determines whether the results from the next instruction are kept or discarded.

If you look at the if you are doing and the dataset, it becomes obvious that a sorted dataset will optimise the branch prediction ability of the CPU, resulting in a minor performance difference (becoming larger as the dataset grows), however as pointed out, sorting the dataset is a lot more costly than the performance gained. (and this is because you are looking at performing many more cycles of processing to sort each item in the array than you will actually save)

Various sort algorithms have different performance based on the order of the incoming data, so you don't need branch prediction to explain it.

...roboticus

When your only tool is a hammer, all problems look like your thumb.

Re: does perl have branch prediction
by davido (Archbishop) on Jan 29, 2014 at 07:03 UTC

I find about a 12% savings in summing the ordered array using a foreach loop:

```            Rate unsorted   sorted
unsorted 48478/s       --     -10%
sorted   54097/s      12%       --

This isn't much, really, though time and time again the "sorted" version does win. For the purpose of testing, I'm making some assumptions though; the input data is 50% below the threshold, and 50% above it. However, a sorted list with these characteristics provides additional opportunities for optimization. With a binary search we can (in logarithmic time) find the point in the ordered array to begin summing, thereby eliminating the need to test each element for a value greater 128. By using that technique, and also pushing more of the work into internals, we get a hefty gain:

```              Rate  unsorted    sorted bsearched
unsorted   48233/s        --       -8%      -71%
sorted     52695/s        9%        --      -68%
bsearched 167021/s      246%      217%        --

Here's the benchmark code:

```use Benchmark qw(cmpthese);

use List::Util qw(shuffle sum);
use List::BinarySearch qw( binsearch_pos );

@sorted_data = 0 .. 255;

@unsorted_data = shuffle @sorted_data;

sub make_linear {
my \$aref = shift;
return sub {
my \$sum;
foreach my \$x ( @{\$aref} ) {
if( \$x > 128 ) {
\$sum += \$x;
}
}
return \$sum;
}
}

sub bsearched {
return sum @sorted_data[
( binsearch_pos { \$a <=> \$b } 129, @sorted_data )
.. \$#sorted_data
];
}

print "\$_\n" for
make_linear( \@sorted_data  )->(),
make_linear( \@unsorted_data)->(),
bsearched();

cmpthese -10, {
sorted    => make_linear( \@sorted_data   ),
unsorted  => make_linear( \@unsorted_data ),
bsearched => \&bsearched,
};

Using List::Util::sum along with an array slice moves the linear loop(s) into internals or XS. List::BinarySearch::binsearch_pos, by default, also moves most of the mechanics (except for the comparator callback) into XS. Plus, the binary search finds the first element greater than 128 in just a few iterations. The higher the ratio of disqualified elements to qualified, the better the results will be, since the binary search will quickly disqualify a higher proportion of the initial array.

Dave

Re: does perl have branch prediction
by kcott (Chancellor) on Jan 29, 2014 at 06:43 UTC

G'day david2008,

"I came over an interesting post in stackoverflow It explains that in Java/C++ processing a sorted array is faster than an unsorted."

Firstly, it reports an observation; it doesn't "explain" a fact. Secondly, if you read further, you'll find the OP writes things like "I did investigate a bit further, and things are "funny"." and "... so it really has nothing to do with the data being sorted, ...".

Use Benchmark to compare the running times of Perl code.

What you can do in Perl is this:

```#!/usr/bin/env perl -l

use strict;
use warnings;

my @a = (1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024);

{
print '*** Unsorted ***';
my (\$sum, \$iterations) = (0, 0);

for my \$x (@a) {
if (\$x > 128) {
\$sum += \$x;
}

++\$iterations;
}

print "Sum: \$sum";
print "Iterations: \$iterations";
}

{
print '*** Sorted ***';
my (\$sum, \$iterations) = (0, 0);

for my \$x (sort { \$b <=> \$a } @a) {
last unless \$x > 128;
\$sum += \$x;
++\$iterations;
}

print "Sum: \$sum";
print "Iterations: \$iterations";
}

Output:

```*** Unsorted ***
Sum: 1792
Iterations: 11
*** Sorted ***
Sum: 1792
Iterations: 3

To what degree the processing involved in sorting the data offsets any gains in a reduction of iterations will depend on the size of the array and the value of the elements (for instance, if all values are greater than 128, then sorting would be entirely wasteful and useless).

-- Ken

Re: does perl have branch prediction
by Anonymous Monk on Jan 29, 2014 at 07:59 UTC
Perl does not have branch prediction, C does not have branch prediction, C++ does not have branch prediction, nor does Java have branch prediction.

CPUs have branch prediction.

Re: does perl have branch prediction
by hdb (Monsignor) on Jan 29, 2014 at 07:07 UTC

With respect to your condition fun_a(\$x) && fun_b(\$x) &&.....&&fun_z(\$x):

As C++ like Perl short-circuits && (Short-circuit_evaluation), it should not re-order the evaluation as there can always be side-effects (like changing the value of a global variable) in fun_z(\$x). The program logic could be very different even if the (boolean) result of the expression is the same.

Re: does perl have branch prediction
by oiskuu (Hermit) on Jan 29, 2014 at 12:24 UTC

Note that CPUs (these days) support conditional instructions; good compilers make use of these. Loop body such as

```    if (v[i] >= 128)
sum += v[i];
is likely compiled with a cmov on x86/x86_64 and therefore incurs no branch misprediction penalty.

As for the perl version: the effects of misprediction are practically negligible, lost in the noise.

```perf stat perl5.18.1 -e '@a = map {(\$i += 1.1e-0) + rand} 1..3e6; \$i =
+ \$i/2 + 0.5; \$_ > \$i and \$n+=\$_ for @a;'
8509781489 instructions
1701278883 branches
3938702 branch-misses
1.561091272 seconds time elapsed

perf stat perl5.18.1 -e '@a = map {(\$i += 1.1e-10) + rand} 1..3e6; \$i
+= \$i/2 + 0.5; \$_ > \$i and \$n+=\$_ for @a;'
8505802267 instructions
1700634854 branches
5437641 branch-misses
1.567880931 seconds time elapsed

Re: does perl have branch prediction
by Anonymous Monk on Jan 29, 2014 at 05:29 UTC
:P Whats happened when you tried and measured to see if its faster?

Awww:D

Good point Here is my code
```use strict;
use warnings;
use Benchmark;
use List::Util qw /shuffle/;
use feature 'say';
my \$n = 10**7;
my @sorted = 1..\$n;
my @random =  shuffle @sorted;

my \$t0 = Benchmark->new;
say sum_this(\@sorted);
my \$t1 = Benchmark->new;
my \$td1 = timediff(\$t1, \$t0);
say "sorted:",timestr(\$td1);
my \$t2 = Benchmark->new;
say sum_this(\@random);
my \$t3 = Benchmark->new;
my \$td2 = timediff(\$t3, \$t2);
say "random:",timestr(\$td2);

sub sum_this {
my (\$arr) = @_;
my \$sum;
foreach (my \$i=0;\$i<=\$#\$arr;\$i++) {
my \$x = \$arr->[\$i];
if(\$x>\$n/2) {
\$sum+= \$x;
}

}
return \$sum;
}

The output is
```37500002500000
sorted: 6 wallclock secs ( 5.61 usr +  0.00 sys =  5.61 CPU)
37500002500000
random: 6 wallclock secs ( 5.81 usr +  0.00 sys =  5.81 CPU)
```
There is a difference but not as dramatic as in java or c++ (from the stack overflow post) What is the correct interpretation of the results?

Create A New User
Node Status?
node history
Node Type: perlquestion [id://1072428]
Approved by lidden
help
Chatterbox?
 [erix]: my app (google, really) makes that 1530 km... [Eily]: google tells me 1662 km ... [LanX]: Nancy? Somewhere near Metz ... [Eily]: must be km farenheit or something :P [Eily]: wow, don't say that in Nancy LanX :P [erix]: maybe you used the car routes, I used the walking routes [Eily]: there's a rivalry between the two cities, so defining one in terms of the other might not be well received [Eily]: erix well I did click on the walking icon LanX giggles! [LanX]: Eily I'm duing this constantly with people from rival cities ...MUCHO fun!

How do I use this? | Other CB clients
Other Users?
Others imbibing at the Monastery: (11)
As of 2017-12-13 15:20 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
What programming language do you hate the most?

Results (369 votes). Check out past polls.

Notices?