Your skill will accomplishwhat the force of many cannot PerlMonks

### Re: Speed/Efficiency tweaks for a fannkuch benchmark script?

by robin (Chaplain)
 on Dec 01, 2005 at 23:35 UTC ( #513483=note: print w/ replies, xml ) Need Help??
I can obtain a modest (< 10%) but consistent speedup by modifying the permutation generator so that it doesn't even generate sequences that would fail the extended precheck. Like this:
```sub fannkuch {
my ( \$a, \$level, \$split ) = ( @_, 0, 1 );
my ( \$index, \$ok, \$copy ) = ( \$level, \$level + 1 == length( \$a ),
+\$a );

if (\$ok) {
# print "Before munging: (\$split) ", unpack('C*', \$copy);
\$index = \$split - 1;
substr(\$copy, \$index, 0) = chop(\$copy);
}

do {
if (\$ok) {
# print "(\$split) ", unpack('C*', \$copy);
my \$q = \$copy;
my ( \$k, \$flips );
for ( \$flips = 0; ( \$k = ord( \$q ) ) != 1; \$flips++ ) {
substr( \$q, 0, \$k ) = reverse substr( \$q, 0, \$k );
}
if ( \$flips >= \$maxflips ) {
if ( \$flips == \$maxflips) {
push @max_sequence, \$copy;
}
else {
\$maxflips     = \$flips;
@max_sequence = (\$copy);
}
}
}
else {
fannkuch( \$copy, 1 + \$level, \$split );
\$split = \$level + 1 if \$index == \$split;
}
substr( \$copy, \$index - 1, 2 ) = reverse substr( \$copy, \$index
+ -1, 2 );

} while \$index--;
return \$maxflips;
}
[download]```
The idea is that \$split is the smallest positive number for which all the elements in substr(\$copy, 0, \$split) are less than or equal to \$split.

This table shows a few examples:
sequence \$split
153421
213452
231453
234154
314254
315425
(The commented-out debugging statements may make it easier to follow what's going on. They certainly helped me to understand it as I was writing it.)

On each top-level run (i.e. where \$ok is true) we can skip directly to the first permutation where the moving element (which starts out as the last one in the list) has passed the split-point.

I've also reordered the maxflips logic, which gives an additional small improvement.

On an unloaded Linux PC, I get the following results. This is the output from diff --side-by-side -W 80, with the output from BrowserUk's code on the left, and mine on the right:

```[rpc142: /tmp]\$ diff --side-by-side -W 80 fann-char-orig.out fann-char
+.out
Pfannkuchen(1) = 0 for:                 Pfannkuchen(1) = 0 for:
8.7e-05 elapsed seconds.              | 1
> 0.000118 elapsed seconds.

Pfannkuchen(2) = 1 for:                 Pfannkuchen(2) = 1 for:
21                                      21
7.7e-05 elapsed seconds.              | 6.4e-05 elapsed seconds.

Pfannkuchen(3) = 2 for:                 Pfannkuchen(3) = 2 for:
231                                     231
312                                     312
0.000134 elapsed seconds.             | 0.000133 elapsed seconds.

Pfannkuchen(4) = 4 for:                 Pfannkuchen(4) = 4 for:
2413                                    2413
3142                                    3142
0.000307 elapsed seconds.             | 0.000281 elapsed seconds.

Pfannkuchen(5) = 7 for:                 Pfannkuchen(5) = 7 for:
31452                                   31452
0.001292 elapsed seconds.             | 0.001164 elapsed seconds.

Pfannkuchen(6) = 10 for:                Pfannkuchen(6) = 10 for:
365142                                  365142
415263                                  415263
416523                                  416523
456213                                  456213
564132                                  564132
0.008305 elapsed seconds.             | 0.007341 elapsed seconds.

Pfannkuchen(7) = 16 for:                Pfannkuchen(7) = 16 for:
3146752                                 3146752
4762153                                 4762153
0.062369 elapsed seconds.             | 0.056875 elapsed seconds.

Pfannkuchen(8) = 22 for:                Pfannkuchen(8) = 22 for:
61578324                                61578324
0.541676 elapsed seconds.             | 0.498606 elapsed seconds.

Pfannkuchen(9) = 30 for:                Pfannkuchen(9) = 30 for:
615972834                               615972834
5.347029 elapsed seconds.             | 4.943895 elapsed seconds.

Pfannkuchen(10) = 38 for:               Pfannkuchen(10) = 38 for:
59186210473                             59186210473
58.118665 elapsed seconds.            | 54.24672 elapsed seconds.
[download]```
Comment on Re: Speed/Efficiency tweaks for a fannkuch benchmark script?
Select or Download Code
Replies are listed 'Best First'.
Re^2: Speed/Efficiency tweaks for a fannkuch benchmark script?
by thundergnat (Deacon) on Dec 02, 2005 at 01:46 UTC

Some more micro-optimization, proably getting into foolish optimization at this point. Reduced unnecesary copying of variables, performing some calculations used many times outside the loop, added a special check for sequences that start with n.

The line checking for sequences starting with n needs some explanation. It is probably the most obscure.

There are two givens for this algorithm.

• If (\$next == \$length) then (\$level + 1 == \$length).
• \$maxflips DOES NOT get reset between runs.

So: if the first character is n then the minimum \$maxflips{n} is \$maxflips{n-1} + 1. It is impossible to have a sequence to take less than length - 1 flips, so if \$length - 1 is less than \$maxflips{n-1} then it impossible for a sequence that starts with n to have more than \$maxflips{n-1} + 1. Therefore, no need to check them.

Rather than compare \$length to \$maxflips + 1, since \$level == \$length - 1, I save the calculation in the loop and compare \$level to \$maxflips.

```sub fannkuch {
my ( \$copy, \$level, \$split ) = ( @_, 0, 1 );
my ( \$index, \$next ) = ( \$level, \$level + 1 );
my \$length = length(\$copy);

if (\$next == \$length) {
(\$index, \$split) = (\$split - 1, \$level);
substr(\$copy, \$index, 0) = chop(\$copy);
}
do {
if (\$next == \$length) {
unless ( ord(\$copy) == \$length and \$level < \$maxflips )
{
my \$q = \$copy;
my ( \$k, \$flips );
for ( \$flips = 0; ( \$k = ord( \$q ) ) != 1; \$flips++ )
+{
substr( \$q, 0, \$k ) = reverse substr( \$q, 0, \$k );
}
if ( \$flips >= \$maxflips ) {
if ( \$flips == \$maxflips) {
push @max_sequence, \$copy;
}
else {
\$maxflips     = \$flips;
@max_sequence = (\$copy);
}
}
}
}
else {
fannkuch( \$copy, \$next, \$split );
}
substr( \$copy, \$index - 1, 2 ) = reverse substr( \$copy, \$index
+ - 1, 2 );

\$split = \$next if \$index == \$split;
} while \$index--;
return \$maxflips;
}
[download]```

Side by side with your code: It is only about a 4-6% increase, but it is repeatable.

```Your sub                          My sub

Pfannkuchen(1) = 0 for:            Pfannkuchen(1) = 0 for:
1                                  1
0.000351 elapsed seconds.          0.000274 elapsed seconds.

Pfannkuchen(2) = 1 for:            Pfannkuchen(2) = 1 for:
21                                 21
0.000196 elapsed seconds.          0.00016 elapsed seconds.

Pfannkuchen(3) = 2 for:            Pfannkuchen(3) = 2 for:
231                                231
312                                312
0.000275 elapsed seconds.          0.000225 elapsed seconds.

Pfannkuchen(4) = 4 for:            Pfannkuchen(4) = 4 for:
2413                               2413
3142                               3142
0.000348 elapsed seconds.          0.000288 elapsed seconds.

Pfannkuchen(5) = 7 for:            Pfannkuchen(5) = 7 for:
31452                              31452
0.000701 elapsed seconds.          0.000634 elapsed seconds.

Pfannkuchen(6) = 10 for:           Pfannkuchen(6) = 10 for:
365142                             365142
415263                             415263
416523                             416523
456213                             456213
564132                             564132
0.00376 elapsed seconds.           0.003357 elapsed seconds.

Pfannkuchen(7) = 16 for:           Pfannkuchen(7) = 16 for:
3146752                            3146752
4762153                            4762153
0.025629 elapsed seconds.          0.023704 elapsed seconds.

Pfannkuchen(8) = 22 for:           Pfannkuchen(8) = 22 for:
61578324                           61578324
0.226504 elapsed seconds.          0.209445 elapsed seconds.

Pfannkuchen(9) = 30 for:           Pfannkuchen(9) = 30 for:
615972834                          615972834
2.209246 elapsed seconds.          2.088572 elapsed seconds.

Pfannkuchen(10) = 38 for:          Pfannkuchen(10) = 38 for:
59186210473                        59186210473
24.218115 elapsed seconds.         23.077981 elapsed seconds.
[download]```
I made a couple of small improvements to my code a few minutes after posting:
• removed an unnecessary assignment to \$split;
• moved the other \$split assignment into the else clause.
Your code is based on the original. Sorry I didn't flag the update, I didn't think anyone would have already d/led the code so quickly!

Log In?
 Username: Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://513483]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others musing on the Monastery: (10)
As of 2016-05-02 20:35 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
What font do you use for programming?

Results (52 votes). Check out past polls.