modular file scoping

Pstack has asked for the wisdom of the Perl Monks concerning the following question:

#..................."~/test/DataBank.pm"
package DataBank;
our @EXPORT_OK = qw( set_webpath get_scanpath get_findpath);
my %webpaths = (
    mypath => "",
    scanpath => "~/WebTabs/auto/htmlscan.0",
    findpath => "~/WebTabs/auto/htmlfilter.0"    
);
sub get_scanpath {return ($webpaths{"mypath"} || $webpaths{"scanpath"});}
sub get_findpath {return ($webpaths{"mypath"} || $webpaths{"findpath"});}
sub set_webpath {
    my $pathme = shift;
    unless (-f $pathme){$pathme = "";}
    $pathme ||= get_scanpath();    
    $webpaths{"mypath"} = $pathme;
}                                           

#..................."~/test/Scanner.pm"
package Scanner;
our @EXPORT_OK = qw( postscan);    
use Importer 'SpecsGet' => qw( pathout );
sub webget {.........}                    # download heaps
sub postscan {
    my $pathx = pathout();
    print "\n$pathx\n\n";
}                                                

#..................."~/test/SpecsGet.pm"
package SpecsGet;
our @EXPORT_OK = qw( pathout);
use Importer 'DataBank' => qw( get_scanpath get_findpath );
sub pathout {return get_scanpath();}

#..................."~/test/testme.pl"
use Importer 'DataBank' => qw (set_webpath);        
use Importer 'Scanner' =>qw (postscan);
use Cwd qw(getcwd cwd);                        
set_webpath (getcwd()."/xxx");
#........................................# hours later
postscan();

Given the 4 files above suitably 'stricted' etc & closed off properly:

#> perl testme.pl ==> "~/xxx" ... (as expected)

This seems to function perfectly well, deploying DataBank.pm as a central store for 'modularised' semi-static specifications whose values may occasionally get changed by yet other modules. My problem is that I do not understand exactly why it works, so can I rely on it?

There are myriad net explanations of the seldom used "local", "our", & "package" scopings but very little on this "file" scoping that I (and maybe others) use extensively. If the "file"-scoped lexicals of DataBank.pm are forced to stay in scope throughout, what is doing the forcing with respect to scoping rules?


Replies are listed 'Best First'.
Re: modular file scoping (updated) by haukex (Archbishop) on Oct 10, 2017 at 05:43 UTC
> There are myriad net explanations of the seldom used "local", "our", & "package" scopings but very little on this "file" scoping

Well, `local`, `our`, and package scoped variables (the latter two being pretty much the same anyway, see our) are actually very common, I'd say roughly as common as lexically scoped (`my`) variables, including at the lexical scope of the file. Admittedly, when writing a normal script one probably types `my` a lot more often than `our`, `local`, or `state`, but package and dynamic scoping may well be used a lot under the hood, like in the modules one loads.

Anyway, if I am understanding your question correctly, you're asking about the `my %webpaths` variable, and why your three functions in `package DataBank` can keep using it despite the execution of the file `DataBank.pm` having already finished? That's because the three `sub`s refer to it, and so long as there is something that refers to the variable, Perl keeps it around. These "references" are uses of the variable in its lexical scope, as in your case, or lexical variables used by closures, but also references created explicitly, as in `my $hashref = \%webpaths`. So the answer is yes, this is the intended behavior you can rely on.
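As a minimal sketch of that point (the package name `Counter` and its subs are hypothetical, chosen only for illustration), a lexical declared in an enclosing scope stays alive for as long as the subs that use it exist, and it is not reachable as a package variable:

```perl
use strict;
use warnings;

{
    package Counter;    # hypothetical package, for illustration only

    # A lexical in the enclosing block's scope, not a package variable.
    my %state = ( count => 0 );

    sub bump    { return ++$state{count} }
    sub current { return $state{count} }
}

# The enclosing block has ended, but both subs still refer to %state,
# so Perl keeps the hash (and its contents) around.
Counter::bump() for 1 .. 3;
print "count is ", Counter::current(), "\n";    # count is 3

# The lexical never enters the package's symbol table, so outside code
# cannot reach it as %Counter::state.
print "package var exists: ",
    ( exists $Counter::{state} ? "yes" : "no" ), "\n";    # no
```

The same pattern applies when the enclosing scope is a whole file such as `DataBank.pm` rather than a bare block.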
Maybe this helps:

use warnings;
use strict;

{
    package Tracer;
    sub new { my $c=shift; bless {@_}, $c }
    sub DESTROY { print "DESTROY ".shift->{name}."\n" }
}
END { print "END\n" }

my $one = Tracer->new(name=>'one');
my $two = Tracer->new(name=>'two');
sub foo {
    my $three = Tracer->new(name=>'three');
    my $four = Tracer->new(name=>'four');
    $one->{foo}++;
    print "end of sub foo\n";
    return $three;
}
my $th = foo();
print "clearing \$th\n";
$th = undef;
print "end of main\n";

__END__

end of sub foo
DESTROY four
clearing $th
DESTROY three
end of main
DESTROY two
END
DESTROY one

Here, objects of my little class `Tracer` simply print a message when they are destroyed, which in Perl happens when the last reference to a variable goes away and it is garbage collected. You can see that:

- `$four` is declared in the scope of `sub foo` and only used there, so when the call to `sub foo` finishes executing, the variable is destroyed,
- `$three` is declared in the scope of `sub foo`, but the reference to the object is returned from the `sub`, so the outside code now holds a reference to it in `$th`, so it is not destroyed until we get rid of that reference,
- `$two` is declared at the scope of the file, and there are no other references to it, so when the scope of the file ends, it is destroyed,
- `$one` is declared at the scope of the file, and since `sub foo` uses it, it is kept around until global destruction.

Further reading: perlsub, in particular "Private Variables via my()", perlref, and perhaps also perlmod. Perhaps the bit of info you were missing was that "file scope" is not really special, but just another lexical scope.

Update: Since I glossed over it above, it may be important to note that `my` has both a compile-time and run-time effect. At compile time, it declares that there is a lexical variable of that name so the compiler knows what that variable is when it sees it in the following code, but the initialization doesn't actually happen until runtime. This means that, for example, in `sub bar { my %h; ... }`, every call to `bar()` will create a new hash `%h` (unlike package variables, as I described in this recent thread). Also made a few small updates to above wording for clarification.
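The compile-time versus run-time effect of `my` can be sketched roughly as follows. The sub names `fresh`, `shared`, and `greeting` are made up for this illustration and do not come from the thread:

```perl
use strict;
use warnings;

# `my` inside the sub body: a new, empty hash on every call.
sub fresh {
    my %h;
    $h{calls}++;
    return $h{calls};
}

# `my` in an enclosing block: one hash, shared by every later call.
{
    my %h;
    sub shared {
        $h{calls}++;
        return $h{calls};
    }
}

print "fresh:  ", join( ",", fresh(),  fresh(),  fresh() ),  "\n";  # fresh:  1,1,1
print "shared: ", join( ",", shared(), shared(), shared() ), "\n";  # shared: 1,2,3

# The declaration is known at compile time, but the assignment below
# only happens when execution reaches that line.
print "before init: ", ( defined greeting() ? greeting() : "undef" ), "\n";  # undef
my $greeting = "hello";
sub greeting { return $greeting }
print "after init:  ", greeting(), "\n";                                     # hello
```

The last part is why a module's file-scoped lexicals are only safe to use once the module file has actually been executed, which `use` guarantees before the rest of the calling script runs.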
Re^2: modular file scoping (updated) by Pstack (Scribe) on Oct 13, 2017 at 04:50 UTC
Thank you indeed.

"....you're asking about the my %webpaths variable, and why your three functions in package DataBank can keep using it despite the execution of the file DataBank.pm having already finished?..."

Yes, the persisting STATE of %webpaths is perplexing me somewhat! But I take your point about more persisting going on 'under the hood' than I generally pay attention to, and I am thankful indeed that mostly I have been spared the need to.

I think my problem is just explicitness (which I tend to practise stringently in terms of style). I use 'our @EXPORT_OK = qw(subx suby subz)' a lot, of course, but in that very limiting way. And with 'my' I can generally see exactly its scope within the same page (aka file) I am working on.

Now I am wondering if the key factor for Perl retaining the state of %webpaths is due specifically to the precedence of calls in testme.pl to subs defined in DataBank.pm, and will therefore persist while testme.pl stays alive, even though other modules which altered the state of %webpaths are themselves long dead? (That at least would satisfy my need for order!)

I am most grateful for the time and effort you put into an erudite explanation, which I really will have to study in some depth (and which will perhaps open my eyes to possible trajectories I have hitherto been too nervous to explore). Thanks again, haukex!
Re^3: modular file scoping by haukex (Archbishop) on Oct 13, 2017 at 08:17 UTC
> Now I am wondering if the key factor for Perl retaining the state of %webpaths is due specifically to the precedence of calls in testme.pl to subs defined in DataBank.pm, ...

Yes, that's it (I assume you meant `s/precedence/presence/`). From Persistent variables with closures:

> Unlike local variables in C or C++, Perl's lexical variables don't necessarily get recycled just because their scope has exited. If something more permanent is still aware of the lexical, it will stick around. So long as something else references a lexical, that lexical won't be freed--which is as it should be. You wouldn't want memory being free until you were done using it, or kept around once you were done. Automatic garbage collection takes care of this for you. ... If declared at the outermost scope (the file scope), then lexicals work somewhat like C's file statics. They are available to all functions in that same file declared below them, but are inaccessible from outside that file. This strategy is sometimes used in modules to create private variables that the whole module can see.

That "something more permanent" are the `sub`s, and I'll get more into why those are "more permanent" below.

> ... and will therefore persist while testme.pl stays alive, even though other modules which altered the state of %webpaths are themselves long dead?

Well, obviously it's a bit more complicated than "dead" and "alive" :-) Especially in dynamic languages like Perl, where the lines between the traditional "compile time" and "run time" can be blurred: you can run code at compile time with `BEGIN` and `use`, and compile code at runtime with `eval`, `do`, and `require`.
In this case, consider what is going on when you write `use DataBank;` (or in your case `use Importer 'DataBank';`, which as I understand it is equivalent):

- During the compilation of `testme.pl`, when the compiler encounters the line `use DataBank;`, basically it will immediately compile and execute all of the code in `DataBank.pm` (plus the import/export code, the details of which I'll skip over for now), so when the line `use DataBank;` finishes compiling, the code in `DataBank.pm` will have finished executing (and all of its scopes have ended).
- Now the code in `DataBank.pm` includes statements like `sub set_webpath { ... }`. The important thing to keep in mind is that this does not actually run the code inside the `sub`, all it does is install that code into the symbol table under the names `&DataBank::set_webpath` and via the import/export mechanism also as `&main::set_webpath`, so that it can be run later, when other code says `set_webpath(...)`.
- So I hope it's obvious that the `sub`s from `DataBank.pm` need to stick around until after it has finished compiling and executing, because otherwise `testme.pl` couldn't call those functions, and the entire concept of modules exporting functions would break down. And since those functions need the `%webpaths` variable, it makes sense to keep that around as well, as described above.
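The loading sequence described above can be sketched in a single runnable script. This is only an illustration under stated assumptions: the module `TinyBank` and its subs are hypothetical stand-ins for `DataBank.pm`, the module is written to a temporary directory, and `require` is used at run time instead of `use` because the file does not exist yet when the script is compiled:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Write a throwaway module to a temporary directory.
# "TinyBank" and its subs are hypothetical names for this sketch only.
my $dir  = tempdir( CLEANUP => 1 );
my $file = File::Spec->catfile( $dir, 'TinyBank.pm' );
open my $fh, '>', $file or die "cannot write $file: $!";
print {$fh} <<'END_MODULE';
package TinyBank;
use strict;
use warnings;
my %paths = ( mypath => "" );        # file-scoped lexical
print "TinyBank.pm top-level code is running\n";
sub set_path { $paths{mypath} = shift; return }
sub get_path { return $paths{mypath} }
1;
END_MODULE
close $fh or die "cannot close $file: $!";

# Loading the file compiles it and runs its top-level code exactly once.
unshift @INC, $dir;
require TinyBank;
print "require has returned\n";

# The file's scope has ended, yet its subs and the %paths they use remain.
TinyBank::set_path("/some/where");
print "stored: ", TinyBank::get_path(), "\n";    # stored: /some/where

# A second require does not re-run the file: it is already listed in %INC,
# so the stored state is untouched.
require TinyBank;
print "still:  ", TinyBank::get_path(), "\n";    # still:  /some/where
```

The top-level message is printed only once, before "require has returned", while the state set through `set_path` survives for the rest of the program.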
Re^2: modular file scoping (updated) by sundialsvc4 (Abbot) on Oct 10, 2017 at 17:43 UTC
What an excellent and thorough explanation of this subject. Thanks so much for taking the time to write and share it.

