Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

Why is goto &sub slow?

by Ovid (Cardinal)
on Mar 29, 2004 at 01:01 UTC ( #340465=perlquestion: print w/ replies, xml ) Need Help??
Ovid has asked for the wisdom of the Perl Monks concerning the following question:

So I'm playing around with recursive subroutines and have results that are a bit confusing. Consider a normal recursive function:

sub factorial {
    my $num = shift;
    return 1 if 1 == $num;
    return $num * factorial($num = 1);
}

Ordinarily, we expect this to have quite a bit of overhead. Part of the reason for this is that we have deferred part of the calculation. For example, here's what this looks like if we try to calculate 5!


1:  factorial(5)
2:  (5 * factorial(4))
3:  (5 * (4 * factorial(3)))
4:  (5 * (4 * (3 * factorial(2))))
5:  (5 * (4 * (3 * (2 * factorial(1)))))
6:  (5 * (4 * (3 * (2 * 1))))
7:  (5 * (4 * (3 * 2)))
8:  (5 * (4 * 6))
9:  (5 * 24)
10: (120)

The problem with this is that we must defer part of the calculation until we get to the terminal condition (1 == $num, or step 5) of the recursive function. Then we go back through the call stack and finish the calculations. In theory, we get around this by building our our recursive function in such a way so that we don't need to walk back through the call stack, but instead have the value calculated every step of the way and return directly from the last call, which could allow us to skip steps 6 through 9 above. We can do this by rewriting our factorial function to keep track of the current value of the factorial and using the magical goto &sub notation.

sub factorial { my ($num) = @_; @_ = (1, 1, $num); goto &_factorial; } + sub _factorial { my ($result, $counter, $max) = @_; return $result if $counter > $max; @_ = (($result * $counter), ($counter + 1), $max); goto &_factorial; }

After playing with this, I realized that I could do even better by using a lexically scoped subroutine and skipping the symbol table lookup, though it winds up being more cumbersome to program.

my $_factorial; $_factorial = sub { my ($result, $counter, $max) = @_; return $result if $counter > $max; @_ = (($result * $counter), ($counter + 1), $max); goto $_factorial; }; sub factorial { my ($num) = @_; @_ = (1, 1, $num); goto $_factorial; }

While my benchmarks show that the lexically scoped subroutine with goto is slightly faster than the first version using goto, it's considerably slower than the straight recursive function. pdcawley noted in Pure Perl tail call optimization that using this form of goto is not very fast; I'm curious to know why it's so ridiculously slow. I've tested this on 5.8.0 and 5.9.1 (I mention 5.9.1 to make it clear that the memory leak associated with assignment to @_ and calling goto is not the culprit). Benchmark code and results below. Did I do something stupid?

I might add that I know an iterative solution is faster. I'm just trying to figure out why the goto is slow.

#!/usr/local/bin/perl use warnings; use strict; use Benchmark; use Test::More tests => 3; my $_fact3; $_fact3 = sub { my ($result, $counter, $max) = @_; return $result if $counter > $max; @_ = (($result * $counter), ($counter + 1), $max); goto $_fact3; }; my $number = shift or die "No number supplied"; validate_factorial_routines($number); print "Timing factorial of $number\n"; timethese(10000, { 'recursion' => sub { fact1($number) }, 'typeglob' => sub { fact2($number) }, 'lexical' => sub { fact3($number) }, }); sub validate_factorial_routines { my $num = shift; my @result = (fact1($num), fact2($num), fact3($num)); is(fact1(5), 120, 'fact1 should return the correct amount'); is(fact2(5), 120, 'fact2 should return the correct amount'); is(fact3(5), 120, 'fact3 should return the correct amount'); } sub fact1 { my ($num) = @_; return _fact1(1,1,$num); } sub _fact1 { my ($result, $counter, $max) = @_; return $result if $counter > $max; return _fact1(($result * $counter), ($counter + 1), $max); } sub fact2 { my ($num) = @_; @_ = (1, 1, $num); goto &_fact2; } sub _fact2 { my ($result, $counter, $max) = @_; return $result if $counter > $max; @_ = (($result * $counter), ($counter + 1), $max); goto &_fact2; } sub fact3 { my ($num) = @_; @_ = (1, 1, $num); goto $_fact3; } __END__ 1..3 ok 1 - fact1 should return the correct amount ok 2 - fact2 should return the correct amount ok 3 - fact3 should return the correct amount Testing factorial of 30 120 Benchmark: timing 10000 iterations of lexical, recursion, typeglob... lexical: 4 wallclock secs ( 4.05 usr + 0.00 sys = 4.05 CPU) @ 24 +69.14/s (n=10000) recursion: 1 wallclock secs ( 1.45 usr + 0.00 sys = 1.45 CPU) @ 68 +96.55/s (n=10000) typeglob: 5 wallclock secs ( 4.28 usr + 0.00 sys = 4.28 CPU) @ 23 +36.45/s (n=10000)

Update: I realized that I did one thing different in the tests. If I (uselessly) duplicate the assignment to @_ in &_fact1, it's not much faster, but it's still faster.

1..3 ok 1 - fact1 should return the correct amount ok 2 - fact2 should return the correct amount ok 3 - fact3 should return the correct amount Timing factorial of 30 Benchmark: timing 10000 iterations of lexical, recursion, typeglob... lexical: 4 wallclock secs ( 4.05 usr + 0.00 sys = 4.05 CPU) @ 24 +69.14/s (n=10000) recursion: 4 wallclock secs ( 3.87 usr + 0.00 sys = 3.87 CPU) @ 25 +83.98/s (n=10000) typeglob: 4 wallclock secs ( 4.28 usr + 0.00 sys = 4.28 CPU) @ 23 +36.45/s (n=10000)

Cheers,
Ovid

New address of my CGI Course.

Comment on Why is goto &sub slow?
Select or Download Code
Re: Why is goto &sub slow?
by broquaint (Abbot) on Mar 29, 2004 at 01:30 UTC
    When you perform a goto &sub you're not just calling the function, but you're also subverting the stack by destroying the current frame and any changes made by local within that frame. That's probably not quite the whole story, but I'd say that'd be the major cause of the discrepancy. Also you're tests aren't quite accurate in that the standard subroutine calls don't populate @_ whereas the goto &sub subroutines do. If you modify the tests you'll find the discrepancy will decrease1.
    HTH

    _________
    broquaint

    1 you'll find the test modified benchmark and results in the readmore below ...
      First, thanks Ovid and broquaint. This is a well done example of Perl, benchmarking and test in a different context than make test.

      broquaint, if I understand Your example, there is a typo in sub _fact0, it should return

      return _fact0(($result * $counter), ($counter + 1), $max);
      That makes recursion 1 twice as fast as recursion 2

      And it came to pass that in time the Great God Om spake unto Brutha, the Chosen One: "Psst!"
      (Terry Pratchett, Small Gods)

Re: Why is goto &sub slow?
by dragonchild (Archbishop) on Mar 29, 2004 at 01:36 UTC
    What about:
    { my ($end, $curr, $result); my $_factorial; $_factorial = sub { return $result if $curr > $end; $result *= $curr++; $_factorial->(); }; sub fact4 { ($end, $curr, $result) = (shift, 1, 1); $_factorial->(); } }

    My benchmarking indicates that my version is 30% faster than the standard recursion and 6 times faster than goto.

    dragonchild: 4 wallclock secs ( 3.19 usr + 0.00 sys = 3.19 CPU) @ 1 +22695.46/s (n=392012) lexical: 3 wallclock secs ( 3.10 usr + 0.00 sys = 3.10 CPU) @ 21 +976.48/s (n=68215) recursion: 3 wallclock secs ( 3.08 usr + 0.00 sys = 3.08 CPU) @ 91 +202.59/s (n=281360) typeglob: 4 wallclock secs ( 3.14 usr + 0.00 sys = 3.14 CPU) @ 21 +142.77/s (n=66494)

    ------
    We are the carpenters and bricklayers of the Information Age.

    Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

      But that's still full recursion, not tail call elimination, which is what goto &sub is.

      jdporter
      The 6th Rule of Perl Club is -- There is no Rule #6.

        Ok. fine.
        { my ($end, $curr, $result); my $_factorial; $_factorial = sub { return $result if $curr > $end; $result *= $curr++; goto &$_factorial; }; sub fact5 { ($end, $curr, $result) = (shift, 1, 1); $_factorial->(); } }

        And, the benchmarking goes something like:

        Timing factorial of 5 Benchmark: running dragonchild1, dragonchild2, lexical, recursion, typ +eglob for at least 3 CPU seconds... dragonchild1: 3 wallclock secs ( 3.19 usr + 0.00 sys = 3.19 CPU) @ +106369.76/s (n=339745) dragonchild2: 3 wallclock secs ( 3.05 usr + 0.00 sys = 3.05 CPU) @ +89930.56/s (n=274558) lexical: 3 wallclock secs ( 3.06 usr + 0.00 sys = 3.06 CPU) @ 29 +376.63/s (n=90010) recursion: 4 wallclock secs ( 3.19 usr + 0.00 sys = 3.19 CPU) @ 68 +035.79/s (n=216694) typeglob: 3 wallclock secs ( 3.01 usr + 0.00 sys = 3.01 CPU) @ 27 +140.10/s (n=81556) Rate typeglob lexical recursion dragonchild2 dr +agonchild1 typeglob 27140/s -- -8% -60% -70% + -74% lexical 29377/s 8% -- -57% -67% + -72% recursion 68036/s 151% 132% -- -24% + -36% dragonchild2 89931/s 231% 206% 32% -- + -15% dragonchild1 106370/s 292% 262% 56% 18% + --

        In other words, my tail-call is 90k/second, straight recursion is 70k/second, but my straight recursion is 105k/second. The regular tail-call are in the 25-30k/second. I lose about 10% going from recursion to tail-call.

        Now, when I shift from factorial(5) to factorial(15) and factorial(25), the benchmarking looks something like:

        Timing factorial of 15 Benchmark: running dragonchild1, dragonchild2, lexical, recursion, typ +eglob for at least 3 CPU seconds... dragonchild1: 3 wallclock secs ( 3.00 usr + 0.00 sys = 3.00 CPU) @ +47052.93/s (n=141347) dragonchild2: 3 wallclock secs ( 3.18 usr + 0.00 sys = 3.18 CPU) @ +40502.36/s (n=128595) lexical: 4 wallclock secs ( 3.24 usr + 0.00 sys = 3.24 CPU) @ 11 +222.80/s (n=36418) recursion: 3 wallclock secs ( 3.12 usr + 0.00 sys = 3.12 CPU) @ 26 +999.68/s (n=84104) typeglob: 3 wallclock secs ( 3.10 usr + 0.00 sys = 3.10 CPU) @ 10 +381.32/s (n=32234) Rate typeglob lexical recursion dragonchild2 dr +agonchild1 typeglob 10381/s -- -7% -62% -74% + -78% lexical 11223/s 8% -- -58% -72% + -76% recursion 27000/s 160% 141% -- -33% + -43% dragonchild2 40502/s 290% 261% 50% -- + -14% dragonchild1 47053/s 353% 319% 74% 16% + -- ------------------------------------ Timing factorial of 25 Benchmark: running dragonchild1, dragonchild2, lexical, recursion, typ +eglob for at least 3 CPU seconds... dragonchild1: 3 wallclock secs ( 3.01 usr + 0.00 sys = 3.01 CPU) @ +29099.20/s (n=87705) dragonchild2: 3 wallclock secs ( 3.12 usr + 0.00 sys = 3.12 CPU) @ +25319.04/s (n=79122) lexical: 4 wallclock secs ( 3.30 usr + 0.00 sys = 3.30 CPU) @ 68 +28.22/s (n=22499) recursion: 4 wallclock secs ( 3.23 usr + 0.00 sys = 3.23 CPU) @ 16 +312.62/s (n=52755) typeglob: 3 wallclock secs ( 3.23 usr + 0.00 sys = 3.23 CPU) @ 61 +14.10/s (n=19773) Rate typeglob lexical recursion dragonchild2 dr +agonchild1 typeglob 6114/s -- -10% -63% -76% + -79% lexical 6828/s 12% -- -58% -73% + -77% recursion 16313/s 167% 139% -- -36% + -44% dragonchild2 25319/s 314% 271% 55% -- + -13% dragonchild1 29099/s 376% 326% 78% 15% + --

        I have no idea why my recursion is still beating my tail-call. The ratio is mostly staying the same, at least between my versions. At 80, this is what happens:

        Timing factorial of 80 Benchmark: running dragonchild1, dragonchild2, lexical, recursion, typ +eglob for at least 3 CPU seconds... dragonchild1: 4 wallclock secs ( 3.18 usr + 0.00 sys = 3.18 CPU) @ +8834.80/s (n=28130) dragonchild2: 4 wallclock secs ( 3.17 usr + 0.00 sys = 3.17 CPU) @ +7694.39/s (n=24422) lexical: 3 wallclock secs ( 3.11 usr + 0.00 sys = 3.11 CPU) @ 20 +67.44/s (n=6438) recursion: 3 wallclock secs ( 3.25 usr + 0.00 sys = 3.25 CPU) @ 50 +47.46/s (n=16379) typeglob: 3 wallclock secs ( 3.32 usr + 0.00 sys = 3.32 CPU) @ 18 +79.40/s (n=6249) Rate typeglob lexical recursion dragonchild2 dr +agonchild1 typeglob 1879/s -- -9% -63% -76% + -79% lexical 2067/s 10% -- -59% -73% + -77% recursion 5047/s 169% 144% -- -34% + -43% dragonchild2 7694/s 309% 272% 52% -- + -13% dragonchild1 8835/s 370% 327% 75% 15% + --

        I'm not sure what all those numbers really mean, but there they are. :-)

        ------
        We are the carpenters and bricklayers of the Information Age.

        Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

Re: Why is goto &sub slow?
by kvale (Monsignor) on Mar 29, 2004 at 01:37 UTC
    tilly posed the same question on p5p. That  goto &sub is slow is a known issue at perl5porters. And in a subsequent message , it was claimed that "the C<goto &NAME> construct is not for tail recursion, but is for fooling the C<caller> function. ". I think someone also claimed that goto LABEL does not have the speed penalty of  goto &sub. A further message notes that some memory leaks associated with  goto &sub are cleaned up in 5.8.1.

    This does not answer your question: why is  goto &sub slow? I don't know a definitive answer, but I gather it takes time to rearrange the stack and clean up the lexical pad before jumping to the new routine.

    -Mark

      That doesn't gibe. Here's the two sequences, as I understand them.

      Standard recursion:

      1. First call to func(). (Create first frame on the stack)
      2. Second call to func(). (Create second frame on the stack)
      3. Return from second call. (Destroy second frame on the stack)
      4. Return from first call. (Destory first frame on the stack)

      goto &sub;

      1. First call to func(). (Create first frame on the stack)
      2. goto &sub; (Destory first frame, create first frame)
      3. Return from goto call. (Destory first frame)

      I count two creations and two destructions of a stack frame. What am I missing? Plus, since goto &sub; doesn't have such a large stack, it takes less memory and lookups take less time. Right?

      ------
      We are the carpenters and bricklayers of the Information Age.

      Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

        From L-R's updated benchmark and from your benchmark, recursion is only 2-10 percent faster than goto &sub. As you say above, both recursion and goto &sub create and destroy two frames. So the results seem consistent to me. Unless you are refering to that 10 percent slowdown for goto &sub? I don't know what is happening there.

        -Mark

Re: Why is goto &sub slow?
by biosysadmin (Deacon) on Mar 29, 2004 at 04:06 UTC

    In the spirit of TIMTOWTDI, here's another option for people who are interested:

    It may not be exactly what you're talking about, but your thread made me think of it. :)

      I assume you realize that you listed a factorial table and not a fibonacci table? :)

      Cheers,
      Ovid

      New address of my CGI Course.

        10+9+8+7+6+5+4+3+2+1 doesn't equal 2721600?

        Only if you are using base 10 math and boring Euclidean geometry. Think outside the box!

Re: Why is goto &sub slow?
by bunnyman (Hermit) on Mar 29, 2004 at 16:48 UTC
Re: Why is goto &sub slow?
by zentara (Archbishop) on Mar 29, 2004 at 17:40 UTC
    Well it may be getting a little off-topic for your question, but how about how parrot and perl6 will do it?
    #factorial.pasm print "The first 15 factorials are:\n" set I1, 0 set I2, 15 set I3, 1 REDO: inc I1 mul I3, I3, I1 print I3 print "\n" lt I1, I2, REDO DONE: end
    To run it,
    parrot -o factorial.pbc factorial.pasm then parrot factorial.pbc
    The whole parrot assembly register setup is so cool, I think alot of Perl6 will be "Inline::Pasm", :-)

    I'm not really a human, but I play one on earth. flash japh

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://340465]
Approved by phydeauxarff
Front-paged by Enlil
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others avoiding work at the Monastery: (9)
As of 2014-08-20 06:06 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The best computer themed movie is:











    Results (105 votes), past polls