In the abstract it's rather simple: a Perl-level coderef basically has a pointer to the actual code optree, and a pointer to the outer lexical pad (the data structure that holds the lexical variables) that the closure closes over. When the outer subroutine is run, a new pad is created, and anonymous subroutines reference that new pad.
(This is why named inner subs aren't closures: they are compiled once and can be run before the lexical pad of the outer subroutine is created, so they cannot reference it.)
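Here is a minimal sketch of the named-inner-sub case. The names outer and inner are made up for illustration. The named sub is bound to the $x that existed when it was compiled, so it does not follow the fresh $x that later calls to the outer sub get, and perl warns that the variable "will not stay shared":

```perl
use strict;
use warnings;

# Hypothetical names, for illustration only.
sub outer {
    my $x = shift;
    # Named sub: compiled once and bound to the first instance of $x.
    # perl warns here: Variable "$x" will not stay shared
    sub inner { return $x }
    return inner();
}

print outer(1), "\n";   # prints 1
print outer(2), "\n";   # prints 1 again, not 2
```

Replacing the named sub with an anonymous one (return sub { $x }) gives each call its own binding.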
My first attempt to demonstrate that failed, because perl is smart and reuses freed memory when possible:
use 5.010;
sub f {
    my $x = shift;
    sub () { $x }
}
say f(1);
say f(2);
__END__
CODE(0x18cede8)
CODE(0x18cede8)
The culprit here is that the ref count of the return value from f(1) goes to zero as soon as it has been printed. To demonstrate that the new association between a lexical pad and a code block indeed creates a new reference, we have to keep the old reference around:
use 5.010;
sub f {
    my $x = shift;
    sub () { $x }
}
say my $x = f(1);
say f(2);
__END__
CODE(0x1d5cde8)
CODE(0x1d5cf98)
Finally, two different addresses from the same anon subroutine.
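The "same code, different pad" picture can also be checked from Perl with the core B module. This is only a sketch: it assumes that B::CV's ROOT and PADLIST methods expose the optree root and the pad list of a coderef, and that dereferencing a B object yields the address of the underlying C structure. I dropped the () prototype here to keep constant-sub handling out of the picture.

```perl
use strict;
use warnings;
use B ();

sub f {
    my $x = shift;
    return sub { $x };   # no () prototype, unlike the example above
}

my $one = f(1);
my $two = f(2);

# Wrap both coderefs in B::CV objects for introspection.
my ($cv1, $cv2) = map { B::svref_2object($_) } $one, $two;

# $$obj on a B object is the address of the wrapped C structure.
my $same_code = ${ $cv1->ROOT }    == ${ $cv2->ROOT };
my $same_pad  = ${ $cv1->PADLIST } == ${ $cv2->PADLIST };

print "same optree root: ", ($same_code ? "yes" : "no"), "\n";
print "same padlist:     ", ($same_pad  ? "yes" : "no"), "\n";
print $one->(), " ", $two->(), "\n";
```

If the description above is right, the two closures should report the same optree root but different pad lists, while each still returns its own $x.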
In the concrete it's more complicated, because closures can close over multiple outer lexpads, and care must be taken that recursion doesn't lead to lexical confusion.
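Both complications are easy to provoke from the Perl level. The sketch below uses made-up names (make_adder, collect): the innermost sub closes over variables from two different outer pads, and the recursive sub shows that every call depth gets its own $n.

```perl
use strict;
use warnings;

# Hypothetical helper: the innermost sub captures $base from the
# outermost pad and $step from the middle one.
sub make_adder {
    my $base = shift;
    return sub {
        my $step = shift;
        return sub {
            my $n = shift;
            return $base + $step + $n;
        };
    };
}

print make_adder(100)->(10)->(1), "\n";   # prints 111

# Recursion: each active call of collect() has its own pad, so the
# closure created at each depth sees that depth's $n.
sub collect {
    my $n = shift;
    my @subs = (sub { $n });
    push @subs, collect($n - 1) if $n > 0;
    return @subs;
}

print join(" ", map { $_->() } collect(2)), "\n";   # prints 2 1 0
```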
I don't know much about the Perl 5 internals, so I fear you either have to ask one of the Perl 5 porters (Zefram or Nicholas would be good candidates) or browse the sources. perlguts and perlapi seem to be silent on this matter. But maybe it helps to look at the output from B::Concise, because it gives you an idea of which opcodes are involved:
perl -MO=Concise,f -e 'sub f { my $x = shift; sub () { $x } };'
main::f:
9 <1> leavesub[1 ref] K/REFC,1 ->(end)
- <@> lineseq KP ->9
1 <;> nextstate(main 1 -e:1) v ->2
4 <2> sassign vKS/2 ->5
2 <0> shift s* ->3
3 <0> padsv[$x:1,3] sRM*/LVINTRO ->4
5 <;> nextstate(main 3 -e:1) v ->6
8 <1> refgen K/1 ->9
- <1> ex-list lKRM ->8
6 <0> pushmark sRM ->7
7 <$> anoncode[CV ""] lRM ->8
-e syntax OK
It seems to be the interplay of refgen and anoncode that is responsible for creating the closure correctly.
(Update: several small wording updates).
Another update:
Update: I'm fully conversant with how closures work at the perl level. I am interested in the details of the internal implementation. I don't know how to make this question clearer?
Well, for me the explanation on the Perl level ends at "subroutines can access the variables from outer scopes that were there at the time the reference to the subroutine was taken". Everything else (the fact that coderefs store pointers to lexpads and code blocks, the names of the ops etc.) is already "details of the internal implementation".
If what I wrote isn't the level of detail you are looking for, what is it you are looking for? I genuinely don't understand that.