Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

Re: Speed/Efficiency tweaks for a fannkuch benchmark script?

by robin (Chaplain)
on Dec 01, 2005 at 23:35 UTC ( #513483=note: print w/ replies, xml ) Need Help??


in reply to Speed/Efficiency tweaks for a fannkuch benchmark script?

I can obtain a modest (< 10%) but consistent speedup by modifying the permutation generator so that it doesn't even generate sequences that would fail the extended precheck. Like this:

sub fannkuch { my ( $a, $level, $split ) = ( @_, 0, 1 ); my ( $index, $ok, $copy ) = ( $level, $level + 1 == length( $a ), +$a ); if ($ok) { # print "Before munging: ($split) ", unpack('C*', $copy); $index = $split - 1; substr($copy, $index, 0) = chop($copy); } do { if ($ok) { # print "($split) ", unpack('C*', $copy); my $q = $copy; my ( $k, $flips ); for ( $flips = 0; ( $k = ord( $q ) ) != 1; $flips++ ) { substr( $q, 0, $k ) = reverse substr( $q, 0, $k ); } if ( $flips >= $maxflips ) { if ( $flips == $maxflips) { push @max_sequence, $copy; } else { $maxflips = $flips; @max_sequence = ($copy); } } } else { fannkuch( $copy, 1 + $level, $split ); $split = $level + 1 if $index == $split; } substr( $copy, $index - 1, 2 ) = reverse substr( $copy, $index + -1, 2 ); } while $index--; return $maxflips; }
The idea is that $split is the smallest positive number for which all the elements in substr($copy, 0, $split) are less than or equal to $split.

This table shows a few examples:
sequence $split
153421
213452
231453
234154
314254
315425
(The commented-out debugging statements may make it easier to follow what's going on. They certainly helped me to understand it as I was writing it.)

On each top-level run (i.e. where $ok is true) we can skip directly to the first permutation where the moving element (which starts out as the last one in the list) has passed the split-point.

I've also reordered the maxflips logic, which gives an additional small improvement.

On an unloaded Linux PC, I get the following results. This is the output from diff --side-by-side -W 80, with the output from BrowserUk's code on the left, and mine on the right:

[rpc142: /tmp]$ diff --side-by-side -W 80 fann-char-orig.out fann-char +.out Pfannkuchen(1) = 0 for: Pfannkuchen(1) = 0 for: 8.7e-05 elapsed seconds. | 1 > 0.000118 elapsed seconds. Pfannkuchen(2) = 1 for: Pfannkuchen(2) = 1 for: 21 21 7.7e-05 elapsed seconds. | 6.4e-05 elapsed seconds. Pfannkuchen(3) = 2 for: Pfannkuchen(3) = 2 for: 231 231 312 312 0.000134 elapsed seconds. | 0.000133 elapsed seconds. Pfannkuchen(4) = 4 for: Pfannkuchen(4) = 4 for: 2413 2413 3142 3142 0.000307 elapsed seconds. | 0.000281 elapsed seconds. Pfannkuchen(5) = 7 for: Pfannkuchen(5) = 7 for: 31452 31452 0.001292 elapsed seconds. | 0.001164 elapsed seconds. Pfannkuchen(6) = 10 for: Pfannkuchen(6) = 10 for: 365142 365142 415263 415263 416523 416523 456213 456213 564132 564132 0.008305 elapsed seconds. | 0.007341 elapsed seconds. Pfannkuchen(7) = 16 for: Pfannkuchen(7) = 16 for: 3146752 3146752 4762153 4762153 0.062369 elapsed seconds. | 0.056875 elapsed seconds. Pfannkuchen(8) = 22 for: Pfannkuchen(8) = 22 for: 61578324 61578324 0.541676 elapsed seconds. | 0.498606 elapsed seconds. Pfannkuchen(9) = 30 for: Pfannkuchen(9) = 30 for: 615972834 615972834 5.347029 elapsed seconds. | 4.943895 elapsed seconds. Pfannkuchen(10) = 38 for: Pfannkuchen(10) = 38 for: 59186210473 59186210473 58.118665 elapsed seconds. | 54.24672 elapsed seconds.


Comment on Re: Speed/Efficiency tweaks for a fannkuch benchmark script?
Select or Download Code
Re^2: Speed/Efficiency tweaks for a fannkuch benchmark script?
by thundergnat (Deacon) on Dec 02, 2005 at 01:46 UTC

    Some more micro-optimization, proably getting into foolish optimization at this point. Reduced unnecesary copying of variables, performing some calculations used many times outside the loop, added a special check for sequences that start with n.

    The line checking for sequences starting with n needs some explanation. It is probably the most obscure.

    There are two givens for this algorithm.

    • If ($next == $length) then ($level + 1 == $length).
    • $maxflips DOES NOT get reset between runs.

    So: if the first character is n then the minimum $maxflips{n} is $maxflips{n-1} + 1. It is impossible to have a sequence to take less than length - 1 flips, so if $length - 1 is less than $maxflips{n-1} then it impossible for a sequence that starts with n to have more than $maxflips{n-1} + 1. Therefore, no need to check them.

    Rather than compare $length to $maxflips + 1, since $level == $length - 1, I save the calculation in the loop and compare $level to $maxflips.

    sub fannkuch { my ( $copy, $level, $split ) = ( @_, 0, 1 ); my ( $index, $next ) = ( $level, $level + 1 ); my $length = length($copy); if ($next == $length) { ($index, $split) = ($split - 1, $level); substr($copy, $index, 0) = chop($copy); } do { if ($next == $length) { unless ( ord($copy) == $length and $level < $maxflips ) { my $q = $copy; my ( $k, $flips ); for ( $flips = 0; ( $k = ord( $q ) ) != 1; $flips++ ) +{ substr( $q, 0, $k ) = reverse substr( $q, 0, $k ); } if ( $flips >= $maxflips ) { if ( $flips == $maxflips) { push @max_sequence, $copy; } else { $maxflips = $flips; @max_sequence = ($copy); } } } } else { fannkuch( $copy, $next, $split ); } substr( $copy, $index - 1, 2 ) = reverse substr( $copy, $index + - 1, 2 ); $split = $next if $index == $split; } while $index--; return $maxflips; }

    Side by side with your code: It is only about a 4-6% increase, but it is repeatable.

    Your sub My sub Pfannkuchen(1) = 0 for: Pfannkuchen(1) = 0 for: 1 1 0.000351 elapsed seconds. 0.000274 elapsed seconds. Pfannkuchen(2) = 1 for: Pfannkuchen(2) = 1 for: 21 21 0.000196 elapsed seconds. 0.00016 elapsed seconds. Pfannkuchen(3) = 2 for: Pfannkuchen(3) = 2 for: 231 231 312 312 0.000275 elapsed seconds. 0.000225 elapsed seconds. Pfannkuchen(4) = 4 for: Pfannkuchen(4) = 4 for: 2413 2413 3142 3142 0.000348 elapsed seconds. 0.000288 elapsed seconds. Pfannkuchen(5) = 7 for: Pfannkuchen(5) = 7 for: 31452 31452 0.000701 elapsed seconds. 0.000634 elapsed seconds. Pfannkuchen(6) = 10 for: Pfannkuchen(6) = 10 for: 365142 365142 415263 415263 416523 416523 456213 456213 564132 564132 0.00376 elapsed seconds. 0.003357 elapsed seconds. Pfannkuchen(7) = 16 for: Pfannkuchen(7) = 16 for: 3146752 3146752 4762153 4762153 0.025629 elapsed seconds. 0.023704 elapsed seconds. Pfannkuchen(8) = 22 for: Pfannkuchen(8) = 22 for: 61578324 61578324 0.226504 elapsed seconds. 0.209445 elapsed seconds. Pfannkuchen(9) = 30 for: Pfannkuchen(9) = 30 for: 615972834 615972834 2.209246 elapsed seconds. 2.088572 elapsed seconds. Pfannkuchen(10) = 38 for: Pfannkuchen(10) = 38 for: 59186210473 59186210473 24.218115 elapsed seconds. 23.077981 elapsed seconds.
      I made a couple of small improvements to my code a few minutes after posting:
      • removed an unnecessary assignment to $split;
      • moved the other $split assignment into the else clause.
      Your code is based on the original. Sorry I didn't flag the update, I didn't think anyone would have already d/led the code so quickly!

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://513483]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others surveying the Monastery: (4)
As of 2015-07-03 00:37 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (47 votes), past polls