Closure on Closures

Before we get into this tutorial we need to define what a closure is. The Camel (3rd edition) states that a closure is

when you define an anonymous function in a particular lexical scope at any particular moment
However, I believe this isn't entirely accurate as a closure in perl can be any subroutine referring to lexical variables in the surrounding lexical scopes.[0]

Now with that (simple?) definition out of the way, we can get on with the show!

Before we get started ...

To truly understand closures, a solid understanding of the principles of lexical scoping is needed, as closures are implemented through lexical scoping interacting with subroutines. For an introduction to lexical scoping in perl see Lexical scoping like a fox, and once you're done with that, head on back.

Right, are we all here now? Bueller ... Bueller .. Bueller? Good.
Now that we have our basic elements, let's weave them together with a stitch of explanation and a thread of code.

Hanging around

Now as we all know, lexical variables are only active for the length of the surrounding lexical scope, but can be kept around in an indirect manner if something else references them, e.g.

    sub DESTROY { print "stick a fork in '$_[0]' it's done\n" }

    my $foo = bless [];
    {
        my $bar = bless {};
        ## keep $bar around
        push @$foo => \$bar;

        print "in \$bar's [$bar] lexical scope\n";
    }

    print "we've left \$bar's lexical scope\n";

    __output__
    in $bar's [main=HASH(0x80fbbf0)] lexical scope
    we've left $bar's lexical scope
    stick a fork in 'main=ARRAY(0x80fbb0c)' it's done
    stick a fork in 'main=HASH(0x80fbbf0)' it's done

The above example illustrates that $bar isn't cleaned up until $foo, which references it, leaves the surrounding lexical scope (the file-level scope in this case). So from that we can see lexical variables only stick around for the length of the surrounding scope or until they're no longer referenced.

But what if we were to re-enter a scope where a variable is still visible, but the scope has already exited - will the variable still exist?

    {
        my $foo = "a string";
        INNER: {
            print "\$foo: [$foo]\n";
        }
    }
    goto INNER unless $i++;

    __output__
    $foo: [a string]
    $foo: []

As we can see the answer is categorically 'No'. In retrospect this is quite obvious as $foo has gone out of scope and there is no longer a reference to it.

A bit of closure

However, the last example just used a simple bare block. Now let's try it with a subroutine as the inner block:

    {
        my $foo = "a string";
        sub inner {
            print "\$foo: [$foo]\n";
        }
    }
    inner();
    inner();

    __output__
    $foo: [a string]
    $foo: [a string]

"Hold on there cowboy - $foo has already gone out of scope at the time of the first call to inner() let alone the second, what's going on there?!?", or so one might say. Now hold your horses, there is a very good reason for this behaviour - the subroutine in the example is a closure. "Ok, so it's a closure, but why?", would be a good question at this point. The reason is that subroutines in perl have what's called a scratchpad which holds references to any lexical variables referred to within the subroutine. This means that you can directly access lexical variables within subroutines even though the given variables' scope has exited.

Hmmm, that was quite a lot of raw info, so let's break it down somewhat. Firstly subroutines can hold onto variables from higher lexical scopes. Here's a neat little counter example (not counter-example ;)

     1: {
     2:     my $cnt = 5;
     3:     sub counter {
     4:         return $cnt--;
     5:     }
     6: }
     7:
     8: while(my $i = counter()) {
     9:     print "$i\n";
    10: }
    11: print "BOOM!\n";

    __output__
    5
    4
    3
    2
    1
    BOOM!

While not immediately useful, the above example does demonstrate a subroutine counter() (line 3) holding onto a variable $cnt (line 2) after it has gone out of scope. Because of this behaviour of capturing lexical state the counter() subroutine acts as a closure.
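
One consequence worth seeing in code: a closure holds onto the variable itself, not a copy of its value. The sketch below is only an illustration (the names add_to_total and get_total are made up for it), showing two named subroutines in the same block sharing a single $total:

```perl
use strict;
use warnings;

{
    my $total = 0;    # visible only inside this block

    # both subs refer to the same $total, not to separate copies
    sub add_to_total { $total += shift; return }
    sub get_total    { return $total }
}

add_to_total(3);
add_to_total(4);

# $total is out of scope here, yet both subs still see it
print get_total(), "\n";
```

A change made through add_to_total() is visible through get_total(), because both subroutines captured the same lexical.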

Now if we look at the above example a little closer we might notice that it looks like the beginnings of a basic iterator. If we just tweak counter() and have it return an anonymous sub we'll have ourselves a very simple iterator

     1: sub counter {
     2:     my $cnt = shift;
     3:     return sub { $cnt-- };
     4: }
     5:
     6: my $cd = counter(5);
     7: while(my $i = $cd->()) {
     8:     print "$i\n";
     9: }
    10:
    11: print "BOOM!\n";

    __output__
    5
    4
    3
    2
    1
    BOOM!

Now instead of counter() being the closure we return an anonymous subroutine (line 3) which becomes a closure as it holds onto $cnt (line 2). Every time the newly created closure is executed the $cnt passed into counter() is returned and decremented (this post-return modification behaviour is due to the nature of the post-decrement operator, not the closure).
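
Because every call to counter() runs my $cnt afresh, each returned closure gets its own private $cnt. A minimal sketch of that, reusing counter() from above (the variable names $from_three and $from_ten are just for illustration):

```perl
use strict;
use warnings;

sub counter {
    my $cnt = shift;          # a fresh $cnt for every call to counter()
    return sub { $cnt-- };    # the closure captures this particular $cnt
}

my $from_three = counter(3);
my $from_ten   = counter(10);

# stepping one counter has no effect on the other
my @first  = ( $from_three->(), $from_three->() );
my @second = ( $from_ten->() );

print "from three: @first\n";     # from three: 3 2
print "from ten:   @second\n";    # from ten:   10
```

The two closures count down independently, which is what makes this pattern usable as a general-purpose iterator factory.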

So if we further apply the concepts of closures we can write ourselves a very basic directory iterator

     1: use IO::Dir;
     2:
     3: sub dir_iter {
     4:     my $dir = IO::Dir->new(shift) or die("ack: $!");
     5:
     6:     return sub {
     7:         my $fl = $dir->read();
     8:         $dir->rewind() unless defined $fl;
     9:         return $fl;
    10:     };
    11: }
    12:
    13: my $di = dir_iter( "." );
    14: while(defined(my $f = $di->())) {
    15:     print "$f\n";
    16: }

    __output__
    .
    ..
    .closuretut.html.swp
    closuretut.html

In the code above dir_iter() (line 3) is returning an anonymous subroutine (line 6) which is holding $dir (line 4) from a higher scope and therefore acts as a closure. So we've created a very basic directory iterator using a simple closure and a little bit of help from IO::Dir.
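
The same shape works for things other than directories. As a rough sketch (list_iter is a hypothetical helper, not part of any module), here is an iterator over a plain list that, like dir_iter(), rewinds itself once it runs out:

```perl
use strict;
use warnings;

sub list_iter {
    my @items = @_;    # private copy, captured by the closure below
    my $pos   = 0;     # current position, also captured

    return sub {
        if ( $pos >= @items ) {
            $pos = 0;    # rewind, much like $dir->rewind() above
            return;      # nothing left on this pass
        }
        return $items[ $pos++ ];
    };
}

my $it = list_iter(qw(a b c));
while ( defined( my $item = $it->() ) ) {
    print "$item\n";
}
```

Note that this sketch shares the limitation of any defined()-based loop: an undef element in the list would end the pass early.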

Wrapping it up

This method of creating closures using anonymous subroutines can be very powerful[1]. With the help of Richard Clamp's marvellous File::Find::Rule we can build ourselves a handy little grep-like tool for XML files:

     1: use strict;
     2: use warnings;
     3:
     4: use XML::Simple;
     5: use Getopt::Std;
     6: use File::Basename;
     7: use File::Find::Rule;
     8: use Data::Dumper;
     9:
    10: $::PROGRAM = basename $0;
    11:
    12: getopts('n:t:hr', my $opts = {});
    13:
    14: usage() if $opts->{h} or @ARGV == 0;
    15:
    16: my @dirs = $opts->{r} ? @ARGV : map dirname($_), @ARGV;
    17: my @files = $opts->{r} ? '*.xml' : map basename($_), @ARGV;
    18: my $callback = gensub($opts);
    19:
    20: my @found = find(
    21:     file =>
    22:     name => \@files,
    23:     ## handy callback which wraps around the callback created above
    24:     exec => sub { $callback->( XMLin $_[-1] ) },
    25:     in => [ @dirs ]
    26: );
    27:
    28: print "$::PROGRAM: no files matched the search criteria\n" and exit(0)
    29:     if @found == 0;
    30:
    31: print "$::PROGRAM: the following files matched the search criteria\n",
    32:       map "\t$_\n", @found;
    33:
    34: exit(0);
    35:
    36: sub usage {
    37:     print "Usage: $::PROGRAM -t TEXT [-n NODE -h -r] FILES\n";
    38:     exit(0);
    39: }
    40:
    41: sub gensub {
    42:     my $opts = shift;
    43:
    44:     ## basic matcher wraps around the program options
    45:     return sub { Dumper($_[0]) =~ /\Q$opts->{t}/sm }
    46:         unless exists $opts->{n};
    47:
    48:     ## node based matcher wraps around options and itself!
    49:     my $self; $self = sub {
    50:         my($tree, $seennode) = @_;
    51:
    52:         for(keys %$tree) {
    53:             $seennode = 1 if $_ eq $opts->{n};
    54:
    55:             if( ref $tree->{$_} eq 'HASH') {
    56:                 return $self->($tree->{$_}, $seennode);
    57:             } elsif( ref $tree->{$_} eq 'ARRAY') {
    58:                 return !!grep $self->($_, $seennode), @{ $tree->{$_} };
    59:             } else {
    60:                 next unless $seennode;
    61:                 return !!1
    62:                     if $tree->{$_} =~ /\Q$opts->{t}/;
    63:             }
    64:         }
    65:         return;
    66:     };
    67:
    68:     return $self;
    69: }

Disclaimer: the above isn't thoroughly tested and isn't nearly perfect so think twice before using in the real world

The code above contains 3 simple examples of closures using anonymous subroutines (in this case acting as callbacks). The first closure can be found in the exec parameter (line 24) of the find call. This is wrapping around the $callback variable generated by the gensub() function. Then within gensub() (line 41) there are 2 closures which wrap around the $opts lexical, the second of which also wraps around $self, which is a reference to the callback which is returned.
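
A side note on the self-wrapping closure: because the sub captures $self and $self refers to the sub, the two keep each other alive, so as far as I understand it such a closure is not freed until the program ends. On Perl 5.16 or later the current_sub feature offers __SUB__ as a way to recurse without capturing a variable. The following is only a sketch of that alternative, not a rewrite of gensub(); make_summer and the sample data are invented for illustration:

```perl
use strict;
use warnings;
use feature 'current_sub';    # provides __SUB__ (Perl 5.16 and later)

sub make_summer {
    my $key = shift;    # captured by the closure below

    # sums every value stored under $key anywhere in a nested hash
    return sub {
        my $tree = shift;
        my $sum  = 0;
        for my $k ( keys %$tree ) {
            my $v = $tree->{$k};
            if ( ref $v eq 'HASH' ) {
                $sum += __SUB__->($v);    # recurse without a $self variable
            }
            elsif ( $k eq $key ) {
                $sum += $v;
            }
        }
        return $sum;
    };
}

my $sum_size = make_summer('size');
my $data     = { size => 1, kid => { size => 2, kid => { size => 4 } } };
print $sum_size->($data), "\n";    # 7
```

The returned sub is still a closure (it wraps around $key), it just no longer needs to wrap around itself.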

Altogether now

So let's bring it all together now - a closure is a subroutine which wraps around lexical variables that it references from the surrounding lexical scope, which subsequently means that the lexical variables that are referenced are not garbage collected when their immediate scope is exited.
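
To see that summary in action, the sketch below borrows the DESTROY trick from the first example. The Tracker package name and the @log array are assumptions made purely for this illustration:

```perl
use strict;
use warnings;

my @log;
sub Tracker::DESTROY { push @log, "destroyed" }

my $closure;
{
    my $obj = bless {}, 'Tracker';
    $closure = sub { return ref $obj };    # wraps around $obj
}
push @log, "scope exited";
push @log, "closure still sees a " . $closure->();

undef $closure;    # drop the only thing still holding onto $obj
push @log, "closure released";

print "$_\n" for @log;
```

The "destroyed" entry should land after "closure still sees a Tracker" and before "closure released": the object outlives its scope for exactly as long as the closure wrapping around it does.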

There ya go, closure on closures! Hopefully this tutorial has conveyed the meaning and purpose of closures in perl and hasn't been too confounding along the way.

Thanks to virtualsue, castaway, Corion, xmath, demerphq, Petruchio, tye for help during the construction of this tutorial

[0] See chip's Re: Toggling between two values for a more technical definition (and discussion) of closures within perl.
[1] See tilly's Re (tilly) 9: Why are closures cool? on the pitfalls of nested package level subroutines vs. anonymous subroutines when dealing with closures.


Disclaimer: Not everyone will agree with the terminology (I imagine) so as long as you don't find the descriptions wildly off the mark or generally misleading then they're likely to stay as they are.

Update - revised the second and third sections by dropping any references to 'reference counting'

In reply to Closure on Closures by broquaint
