As you may know if you follow my use.perl journal or my posts to the perl6 language lists, I'm working on a Scheme implementation in Perl.

As it turns out, a basic interpreter for a schemelike language is surprisingly easy to implement. The problem is that, in order to be able to call it Scheme, you have to do 'tail call optimization'. "What's that then?" I hear you ask?

Well, here's a simple example. Consider the following piece of tail recursive perl:

sub tail_factorial { my($total, $n) = @_; if ($n == 0) { return $total; } else { return tail_factorial($total * $n, $n - 1) } }
notice that, in the 'else' branch of the code, we're simply returning the value of a further function call, we don't actually do anything with it in the current function. Wouldn't it be great if the 'if' branch of that conditional could just return directly to the 'original' caller of tail_factorial instead of having the result passed up a tower of intermediate callers.

Well, that's what tail call optimization is all about. A full implementation would do the optimization even in cases where the tail call isn't a straightforward recursive call (and that's what is needed for a 'proper' scheme implementation).

Note that you can sort of do tail call optimization in native Perl right now, but it's not exactly optimal. Here's here's that factorial function again:

sub tail_factorial { my $total = shift; my $n = shift; if ($n == 0) { return $total; } else { @_ = ($n * $total, $n - 1); goto &tail_factorial; } }
But it's scary, not desperately quick, and requires programmer intervention. Ideally this sort of thing should happen automatically. Larry has already said that simple tail recursion in Perl 6 will be optimized, but how do I make my scheme interpreter do the Right Thing in perl 5?

Well, frankly, it's scary. I've had to roll my own virtual machine, complete with registers and a stack (there's no call stack though, but eventually I'll implement a 'real' chain of partial continuations.) The driver loop for this machine looks something like:
sub run_ops { my $self = shift; EVAL_DISPATCH: while (1) { my $expr = $self->expr; if ($expr->is_self_evaluating) { $self->{value} = $expr; } else { $expr->evaluate($self); } while (1) { my $cont = $self->continuation; if (ref($cont) eq 'CODE') { $cont->($self); } elsif (!ref($cont)) { return $self->{value} if $cont eq 'return'; $self->$cont(); } else { die "Broken continuation: $cont" } } } }
Notice that labeled while loop. The idea is that, instead of calling $self->expr->evaluate($self) directly (doing a classic OO double dispatch trick), functions set up a continuation (the function that will 'carry on' once things have been evaluated), set 'expr' to an appropriate value and do redo EVAL_DISPATCH, which unwinds the call stack and jumps to the start of the loop again. This avoids recursing desperately deeply into multiple calls to evaluate (my naive, none tail recursive implementation could easily get 100s of levels deep).

The inner while loop is then responsible for handing control off to the current continuation, which will eventually, one hopes, get set to 'return' so the subroutine can exit.

It turns out that this simple loop combined with a small number of registers and a stack gives me enough that I can bootstrap scheme. My current problem is that my simpleminded scheme parser (implemented in perl) recurses far deeper than my interpreter, so I probably need to reimplement it on the virtual machine. (Something I'm going to have to do when I replace the virtual machine with Parrot anyway...). AFAICT it shouldn't be too hard to implement the full scheme call-with-current-continuation on this machine, as well as 'compiled' functions (but compiled functions are some way off for now).

I'm afraid the CPAN release is some way off too. At the very least I need to write a test suite and get a few more of the standard scheme things implemented and, who knows, maybe write some documentation.

And then there's the Parrot compiler to write. And something to compile to perl 5 as well. And the port to Perl 6 (I only started the perl 5 version to make sure that what I'd implemented in perl 6 would actually work), and then there's Inline::Scheme to write, and...

Replies are listed 'Best First'.
Re: Pure Perl tail call optimization
by drewbie (Chaplain) on Apr 24, 2002 at 14:20 UTC
    elsif (!ref($cont)) { return $self->{value} if $cont = 'return'; $self->$cont(); }
    Did you mean "if $cont eq 'return';"? That makes more sense, but it's entirely likely I could be missing something.
      Gah! You are, of course, correct. It's fixed now.
Re: Pure Perl tail call optimization
by samtregar (Abbot) on Apr 24, 2002 at 20:01 UTC
    Instead of writing Inline::Scheme maybe you'd like to help me finish Inline::Guile? I released the first version and then lost motivation in the deafening silence that followed.

    Congratualtions on the pure-Perl tail-calls. I would not have thought it feasible.


Re: Pure Perl tail call optimization
by abstracts (Hermit) on Apr 26, 2002 at 06:54 UTC
    Another note (hope I haven't bothered you enough yet :-))

    Last time I checked with the Parrot docs (the tutorial mainly), there was no mention of jumps to computed values (which is what you said you need for implementing call/cc). So the question is: how can one do proper tail recursion without jumps? (Parrot does have jump LABEL though)

    Another question is: if your C compiler doesn't do proper tail recursion (C compilers are not required to), then can you still do proper tail recursion in C?

    Just for the record, here is the definition of proper tail recursion according to the Scheme Revised^5 Report:

    Implementations of Scheme are required to be properly tail-recursive. ... . A Scheme implementation is properly tail-recursive if it supports an unbounded number of active tail calls. A call is active if the called procedure may still return.

    I don't think scheme implementations are required to support call/cc, but they are required to support proper tail recursion.

    The reason I'm blabbering about this is that I just finished a course in compilers where we implemented a good subset of scheme. The compiler translates scheme to Sparc assembly code, and at some time we targeted C. So, I know proper tail recursion is possible, even in VB :-).


    Update: Thanks Elian for the reply. I guess the docs might be out of date since development there is going very fast.

      Last time I checked with the Parrot docs (the tutorial mainly), there was no mention of jumps to computed values (which is what you said you need for implementing call/cc). So the question is: how can one do proper tail recursion without jumps? (Parrot does have jump LABEL though)
      Parrot supports jumping (and branching) to locations (and offsets, though they're less useful) stored in registers. Only works with the jump, jsr, branch, and bsr ops though. (Used to work with the conditionals, but that was silly so we yanked that support)

      I've just read scheme standard. Scheme has to implement both tail-rec and call/cc.

      As for C, I think it's possible to implement proper tail recursion with setjmp/longjmp, but I'm not sure. Most C compilers have tail call optimizations I think, not because it's so simple but because it's so important.

      BTW, if perl is implemented like this (in a stackless way) then it may be simple to implement tail-recursion optimization (but the developpers must have some reason why it's not already done). Maybe it's also possible to create true call/cc: reference counting should be enough for most uses of call/cc, like exceptions and multithreading.

Re: Pure Perl tail call optimization
by abstracts (Hermit) on Apr 25, 2002 at 22:14 UTC

    In order to use goto to obtain true tail recursion, you need to use goto LABEL and not goto SUB because the later does just what the regular call does. Here is one way how you might do your factorial example:

    printf "10 factorial is %d", scheme_procs('factorial', 1, 10); my $ls = mklist(1,5,5,7,3,6); # make a list print pair_str($ls) . "\n"; # print it my $max = scheme_procs('max', $ls); # get the max printf "Max is $max\n"; #pair_push($ls,$ls); # make a circular list!! # now do the bad thing, never terminate but monitor # memory usage #$max = scheme_procs('max', $ls); # get the max #printf "Max is $max\n"; ################################################### # call scheme_procs(proc name, args); ################################################### sub scheme_procs{ my $proc = shift; if($proc eq 'max'){ # go to the correct label goto SCM_MAX; } elsif($proc eq 'factorial'){ goto SCM_FACTORIAL; } else { die "no such proc: $proc"; } SCHEME_RP: # return the return values if(wantarray){ return(@_); } else { return $_[0]; } SCM_FACTORIAL: my ($total,$n) = @_; if($n == 0){ @_ = ($total); goto SCHEME_RP; # return } else { @_ = ($total*$n, $n - 1); goto SCM_FACTORIAL; # tail call } SCM_MAX: my $p = shift; print_resources(); if(null($p)){ die "Error"; } if(null(cdr($p))){ @_ = (car($p)); goto SCHEME_RP; # return } else { if(car($p) > (car(cdr($p)))){ @_ = (cons(car($p), cdr(cdr($p)))); goto SCM_MAX; # tail call } else { @_ = (cdr($p)); goto SCM_MAX; # tail call } } } # helper functions for our simplistic scheme subsystem sub cons{ my($a,$d) = @_; [$a,$d] } sub car{ shift->[0] } sub cdr{ shift->[1] } sub null{ not defined shift } sub set_cdr{ my ($ls, $v) = @_; $ls->[1] = $v } sub mklist{ return unless @_; return cons(shift, mklist(@_)); } sub pair_str{ my $p = shift; if(ref $p eq 'ARRAY'){ "(" . pair_str(car($p)) . " " . pair_str(cdr($p)) . ")"; } else { if(null($p)){ "()" } else { $p; } } } # this is a really bad resource monitor but works to monitor # memory growth to check for proper tail recursion sub print_resources{ my @lines = grep {$_ =~ /\s+(\d+)/ and $1 eq $$} `ps u`; print @lines; }
      Um... goto &SUB doesn't do the same thing as a normal sub call; no new stack frame is created, which is what tail call optimization is about, only creating a new stack frame when necessary. But it's still awfully slow (actually I've not done a speed comparison between goto SUB and the approach I'm currently using, and I'm not entirely sure how I'd code the benchmark.

      The big problem with goto LABEL is that it's even more awfully slow; doing a linear search through the source file every time it's called, which is why I discounted it. And implementing continuations means using a computed goto.

      I really should try an implementation that uses goto SUB though.

        Um, goto &SUB does create a new stack frame, but only after the old one is stripped off. (Shouldn't leak memory though.)

        And goto LABEL doesn't do a linear search through the source file. And the way the (non-linear) search is written, going from the last statement to a label at the beginning should be pretty fast. Benchmarks are the only way to tell which will be faster.

        Okay, you say that goto &SUB doesn't create a new stack frame, so let's test it shall we:
        #!/usr/bin/perl -w my $method = shift or usage(); if($method eq 'tail'){ $0 = 'tail'; tail_inf(); } elsif($method eq 'nontail'){ $0 = 'nontail'; nontail_inf(); } else { usage(); } sub tail_inf{ L1: print_resources(); goto L2; L2: goto L1; } sub nontail_inf{ goto &T1; sub T1{ print_resources(); goto &T2; } sub T2{ goto &T1; } } { my $i=0; my $j=0; sub print_resources{ $i++; $i = $i % 100000; return unless $i == 0; my @lines = grep {$_ =~ /\s+(\d+)/ and $1 eq $$} `ps u`; print @lines; $j++; exit if $j == 10; } } sub usage{ print "Usage: $0 tail|nontail\n"; exit(-1); }
        Run the code twice, one with tail as an arg, and one with nontail, here is what I get on Perl 5.6.1:
        [334]: ./ tail abstract 5843 0.0 0.2 2324 1052 pts/0 S 01:00 0:00 tail abstract 5843 99.9 0.2 2328 1100 pts/0 S 01:00 0:01 tail abstract 5843 99.9 0.2 2328 1100 pts/0 S 01:00 0:01 tail abstract 5843 99.9 0.2 2328 1100 pts/0 S 01:00 0:02 tail abstract 5843 83.6 0.2 2328 1100 pts/0 S 01:00 0:02 tail abstract 5843 99.9 0.2 2328 1100 pts/0 S 01:00 0:03 tail abstract 5843 88.0 0.2 2328 1100 pts/0 S 01:00 0:03 tail abstract 5843 99.9 0.2 2328 1100 pts/0 S 01:00 0:04 tail abstract 5843 90.6 0.2 2328 1100 pts/0 S 01:00 0:04 tail abstract 5843 83.8 0.2 2328 1100 pts/0 S 01:00 0:05 tail [335]: ./ nontail abstract 5854 86.0 0.9 6300 5040 pts/0 S 01:00 0:00 nontail abstract 5854 85.5 1.7 10276 9056 pts/0 S 01:00 0:01 nontail abstract 5854 85.3 2.5 14244 13024 pts/0 S 01:00 0:02 nontail abstract 5854 85.7 3.3 18212 16996 pts/0 S 01:00 0:03 nontail abstract 5854 86.0 4.0 22184 20964 pts/0 S 01:00 0:04 nontail abstract 5854 85.8 4.8 26148 24928 pts/0 S 01:00 0:05 nontail abstract 5854 85.8 5.6 30116 28900 pts/0 S 01:00 0:06 nontail abstract 5854 85.7 6.3 34084 32868 pts/0 S 01:00 0:06 nontail abstract 5854 85.7 7.1 38056 36836 pts/0 S 01:00 0:07 nontail abstract 5854 85.9 7.9 42024 40808 pts/0 S 01:00 0:08 nontail
        Clearly, the nontail calls (goto &SUB) creates new stack frames while the goto LABEL doesn't.

        So, to *correctly* implement tail calls, I think goto &SUB is improper.

        As for speed, and carrying this example, one can see that the tail call is faster than the nontail call:

        [340]: time ./ tail > /dev/null real 0m6.079s user 0m5.450s sys 0m0.620s [341]: time ./ nontail > /dev/null real 0m10.423s user 0m9.530s sys 0m0.890s
        However, in this example we have two functions calling each other, so perl might not optimize that.

        As for using computed values for goto (ie goto "label value", you can do a jump table. Basically, you have a segment of the code that looks something like this:

        SCM_BRANCH: if($br eq 'SCM_F1'){ goto SCM_F1; } elsif ($br eq 'SCM_F2'){ goto SCM_F2; } ... SCM_F1: # before doing a call, set $rv to where you wanna go # and pass the name of where you want to return as first arg # ie: call F2 with args $a,$b; @_ = ('RET1', $a, $b); $br = 'SCM_F2'; goto SCM_BRANCH; # here we go RET1: # here we return my @returned_vals = @_; ... $br = $rp; goto SCM_BRANCH;

        Update: I agree with samtregar that the reason might not be a creation of a new stack frame. I don't know if this is by design or a bug. However, goto LABEL does not leak: it can run forever juggeling from label to another and *can* be used to create proper tail recursion.

Re: Pure Perl tail call optimization
by Limbic~Region (Chancellor) on Mar 05, 2004 at 18:12 UTC
    Eager to try my tail recursion, I tried your example and was bummed when it didn't work. I figured out that to make it work you had to do something like:
    print tail_factorial(1, 8);
    Where you had to supply an initial total for it to work. Here is a slightly modified version.
    sub tail_factorial { my ($n, $total) = @_; $total = 1 if ! $total; return $total if ! $n; @_ = ($n - 1, $total * $n); goto &tail_factorial; }
    To verify that it works I tested it for a value of 150 - no deep recursion errors - yeah. Thanks for the p6 weekly summaries too.

    Cheers - L~R

      Your code is better written as

      sub loop_factorial { my ($n, $total) = @_; $total = 1 unless $total; while ($n) { ($n,$total) = ($n - 1, $total * $n); } return $total }

      Which is more efficient, and is IMO a better example of what a compiler that optimizes tail recursion would convert it to.


        First they ignore you, then they laugh at you, then they fight you, then you win.
        -- Gandhi

        Sure - but then it would not be recursive anymore. That is an iterative solution. I wanted to play with avoiding the deep recursion limit by using tail recursion. This is pdcawley's code, not mine.

        Cheers - L~R