PerlMonks  

RFC: Sub::Auto - lazy loading revisited (now: AutoReloader)

by shmem (Chancellor)
on Feb 19, 2007 at 10:23 UTC (#600805=perlmeditation)

Suppose you have to implement a long-running, business-critical application that consists of a huge amount of code, has a long startup time, and absolutely can't be restarted: downtime means loss of $$, or your phone is going to glow, or you're going to be flooded with emails from complaining customers; or think of something that is crucial in a permanent workflow. Think perlmonks ;-)

You know that the first release will be running forever, but you will have to dynamically update your application: your first release will be far from perfect, and requirements will evolve during that application's runtime in ways you cannot foresee. In short, keep it running, but be able to update it.

As there certainly will be server downtimes due to maintenance, security fixes and upgrades of other packages, into which your updates could be scheduled, this scenario may seem hypothetical. But given a long load time of your app, you might as well have a framework that requires this kind of dynamic code change during development; you may want to avoid e.g. killing fastcgi servers or reloading apache. For the sake of making my point I'll just bless $hypothetical, $real. (Please accept my apologies if what I'm meditating over has been presented elsewhere and beaten to death already.)

What are your choices?

IPC

You could break your app up into separate processes. You would have one immutable process which, to do whatever(), queues requests and exchanges messages with external processes via some IPC mechanism; those external processes you could kill off and restart for upgrades. This adds complexity to your code for relatively little gain.

source file timestamps

A good idea.

You could periodically check the time stamps of your source files in the main loop of your application, and throw away the %INC table entry of updated modules. Some reload modules, e.g. Apache2::Reload, do that. But you must be careful doing so. You can't just throw away the symbol table, as your objects get orphaned. Only very recent perl releases fix the orphaned object bug: referencing orphaned objects no longer causes a segmentation fault; instead they get connected to the empty __ANON__ package. That doesn't help much either, because the object's methods are lost.
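A minimal sketch of the timestamp idea, to make it concrete. The helper name reload_changed is an assumption made up for this illustration; this is not how Apache2::Reload is implemented, and it does nothing about the orphaned-object problem described above.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of any module: remember the mtime of every
# file listed in %INC and re-require those whose mtime has changed.
my %mtime;

sub reload_changed {
    my @reloaded;
    for my $key (sort keys %INC) {
        my $path = $INC{$key};
        next unless defined $path && !ref $path && -e $path;
        my $now = (stat $path)[9];
        if (!exists $mtime{$path}) {     # first sighting: just remember it
            $mtime{$path} = $now;
            next;
        }
        next if $mtime{$path} == $now;   # unchanged since the last check
        $mtime{$path} = $now;
        delete $INC{$key};               # so that require compiles it again
        # redefining subs is the whole point, so silence those warnings
        local $SIG{__WARN__} = sub { warn @_ unless $_[0] =~ /redefined/ };
        eval { require $key; 1 } or warn "reload of $key failed: $@";
        push @reloaded, $key;
    }
    return @reloaded;
}
```

Called once per iteration of the main loop, this reloads whole files only, which is exactly the coarse granularity discussed next.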

Packages may be too coarse a grouping anyway. You might want to update individual functions and/or methods, not an entire package.

But the functions or methods of a package are in a single file. Are they? Let's see what we have at hand.

lazy loading

Perl has a well-known mechanism for lazy loading of functions, implemented via the package pair AutoSplit and AutoLoader. Splitting up a module into an immediate part and delayed code is easy: you put your functions suitable for autoloading after the __END__ token. After running AutoSplit on your Module, the immediate code part will reside in your Module.pm, while the auto-loadable functions will reside in auto/Module/ in per-function files, e.g. function.al.

If you use your module, only the immediate code part is compiled and run. Calls to autosplit functions will be handled via AutoLoader, whose AUTOLOAD block will require the function.al file the first time function($foo) is called, and replace its own stack frame with that of the just compiled function via the magic goto &function.

This may greatly reduce startup time, but other than that it doesn't give more flexibility. Once a function is loaded from an autosplit file, it is defined. The AUTOLOAD block will not be called again for that function, since now it can be found in the symbol table.
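The mechanism can be sketched in a single file. This is an illustration under assumptions, not AutoLoader's actual code: the package name Lazy, the %source hash (standing in for the auto/Module/function.al files) and the call counter are all made up for the example.

```perl
use strict;
use warnings;

package Lazy;

our $AUTOLOAD;
our $autoload_calls = 0;

# Source of the delayed functions. AutoLoader would require
# auto/Lazy/<name>.al instead; a hash keeps the sketch in one file.
my %source = (
    hello => 'sub hello { return "hello, $_[0]" }',
);

sub AUTOLOAD {
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;
    return if $name eq 'DESTROY';
    $autoload_calls++;
    my $code = $source{$name}
        or die "Undefined subroutine $AUTOLOAD called";
    eval "package Lazy; $code; 1" or die $@;   # compile and install the sub
    my $ref = \&$AUTOLOAD;                     # now refers to the real sub
    goto &$ref;                                # replace this stack frame
}

package main;

print Lazy::hello('world'), "\n";   # first call goes through AUTOLOAD
print Lazy::hello('again'), "\n";   # second call finds the sub directly
print "AUTOLOAD ran $Lazy::autoload_calls time(s)\n";
```

The counter stays at 1 after both calls, which is the limitation just described: once the sub is in the symbol table, nothing looks at its source again.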

The subroutine or method "doesn't care for itself". If it were to do so, calling it would trigger some check subroutine that looks at some value particular to the function's source file (timestamp, size, ownership, flags, md5sum, ...) and decides from there whether it has to reload itself.

Having the check code outside of the sub means either a) shoehorning every call to a monitored subroutine through a dispatching sub that checks the disk files, or b) checking asynchronously, e.g. with an alarm handler that periodically goes over all files.
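Option a) can be sketched as a closure around the payload. The helper name monitored is hypothetical, and the sketch assumes the function file's last statement is an anonymous sub and that $file is an absolute path (do searches @INC for relative ones).

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a function file in a dispatching closure that
# reloads the file whenever its mtime has changed.
sub monitored {
    my ($file) = @_;
    my ($inner, $seen);
    my $load = sub {
        $seen  = (stat $file)[9];
        $inner = do $file;
        die "can't load $file: " . ($@ || $!)
            unless ref($inner) eq 'CODE';
    };
    $load->();
    return sub {
        $load->() if (stat $file)[9] != $seen;   # checked on every call
        goto &$inner;                            # hand @_ to the payload
    };
}
```

The caller only ever holds the outer closure, so a reload swaps the payload without invalidating any reference that was handed out. mtime has one-second resolution, so two edits within the same second can go unnoticed.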

introducing Sub::Auto

This module provides for lazy loading and reloading of monitored subroutines.
=head1 NAME

Sub::Auto - Lazy loading and reloading of anonymous subroutines

=head1 SYNOPSIS

    use Sub::Auto;

    my $sub = Sub::Auto -> new ($file, $checksub, $autoprefix);

    $result = $sub -> (@args);

    $sub -> check (0);             # turn source file checking off for $sub
    $sub -> checksub ($coderef);   # provide alternative checking routine

    use Sub::Auto qw (AUTOLOAD);

    Sub::Auto -> check (1);        # turn source file checking on

    $result = somefunc (@args);

    *somefunc{CODE}->check(0);     # turn off checking for this named sub

=head1 DESCRIPTION

Sub::Auto provides lazy loading like AutoLoader, but also for function files
which return an anonymous subroutine upon require (as its last evaluated
statement).

Before requiring that file, it is checked via some subroutine returning
a value (default is mtime). The returned value is remembered. At each
call to that sub the check subroutine is run again if this subroutine's
check flag is set; and if the returned value changed, the source file is
reloaded.

Importing the AUTOLOAD method provides for lazy loading of anonsubs as
named subs. The wrapped anonsub will be assigned to a symbol table entry
named after the filename root of the function source file.

=head1 METHODS

=over 4

=item new ($file, $checksubref, $autoprefix)

subroutine constructor. $file can be the path to some function file or
a function name which will be expanded to $autoprefix/__PACKAGE__/$function.al
and searched for in @INC. $checksubref and $autoprefix are optional.
If they are not provided, the default class settings are used.

=item auto ($autoprefix)

set or get the default autoprefix. Default is 'auto', just as with AutoLoader:
for e.g. POSIX::rand the source file would be auto/POSIX/rand.al . Sub::Auto
lets you replace the 'auto' part of the path with something else. Class
method (for now).

=item suffix ($suffix)

set or get the suffix of your autoloaded files (e.g. '.al', '.pl', '.tmpl')
as a package variable.

=item check (1)

set or get the check flag. Turn checking on by setting this to some true value.
Default is off. Class and object method, i.e. Sub::Auto->check(1) sets the
default to on, $sub->check(1) sets checking for a subroutine. For now, there's
no way to inculcate the class default on subs with a private check flag.

=item checksub ($coderef)

set the checking subroutine. Class and object method. This subroutine will
be invoked with a subroutines source filename (full path) every time the sub
for which it is configured - but only if check for that subroutine is true -,
and should return some value special to that file. Default is
'sub { (stat $_[0]) [9] }', i.e. mtime.

=back

=head1 SEE ALSO

AutoLoader, AutoSplit, autouse, DBIx::VersionedSubs

=head1 TODO

=over 4

=item eliminate paranoia

make this module truly subclassable. Turn lexical private subs into our()
vars or into named subs. Make the %AL hash accessible. All that means re-think
code calling semantics and uses of __PACKAGE__ .

=item provide for more path changes and access methods of subroutines

The 'auto' part of a subroutine should be changeable, as well as the full
path to a subroutine source file. Then, a subroutine's access method should be
made more flexible, e.g. reading code from some database, retrieve via LWP, or
else.

=back

=head1 BUGS

Sub::Auto subroutines are always reported as __ANON__ (e.g. with Carp::cluck),
even if they are assigned to a symbol table entry. Which might not be a bug.

There might be others.

=head1 Author

shmem <gm@cruft.de>

=head1 COPYRIGHT

Copyright 2007 by shmem <gm@cruft.de>

This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

=cut

package Sub::Auto;

use Exporter qw(import);
use strict;
use warnings;
use Scalar::Util;
use File::Spec;

our $VERSION   = 0.01;
our @EXPORT_OK = qw (AUTOLOAD);

my $Debug = 0;

our ($gensub, $load);

our %AL; # hash holding all info about subs

sub new {
    my $class  = shift;
    my $caller = caller;
    my $sub    = $gensub -> ($caller,@_);
    bless $sub, $class;
}

sub auto {
    shift if __PACKAGE__ || $_[0] eq (caller(0))[0];
    $AL {'auto'} = shift if @_;
    $AL {'auto'};
}

sub check {
    my $self = shift;
    if(ref($self)) {
        ${ $AL {Sub} -> {Scalar::Util::refaddr($self)} -> {'check'} } = shift;
    }
    else {
        $AL {'check'} = shift;
    }
}

sub checksub {
    my $self = shift;
    if(ref($self)) {
        ${ $AL{Sub} -> {Scalar::Util::refaddr($self)} -> {'checksub'} } = shift;
    }
    else {
        $AL {'checksub'} = shift;
    }
}

sub suffix {
    shift if __PACKAGE__ || $_[0] eq (caller(0))[0];
    $AL {'suffix'} = shift if @_;
    $AL {'suffix'};
}

checksub ( __PACKAGE__, sub { (stat $_[0]) [9] } ); # default check subroutine
check    ( __PACKAGE__, 0);                         # default is not checking

# $gensub - returns an anonymous subroutine.
# Parameters:
# if one:  filename (full path)
# if more: package, filename [, checkfuncref [, auto ]]
$gensub = sub {
    my $package = scalar(@_) == 1 ? caller : shift;
    my $file    = shift;
    my $chkfunc = shift || $AL {'checksub'};
    my $auto    = shift || $AL {'auto'} || 'auto';
    my $function;
    {
        ($function = pop (@{[ File::Spec->splitpath($file) ]}) ) =~ s/\..*//;
        $file .= $AL {'suffix'} || '.al' unless $file =~ /\.\w+$/;
        unless (-e $file) {
            my ($filename, $seen);
            {
                $filename = File::Spec -> catfile ($auto, $package, $file);
                foreach my $d ('.',@INC) { # check current working dir first
                    my $f = File::Spec -> catfile ($d,$filename);
                    #warn "checking for $f\n";
                    if (-e $f) {
                        $file = $f;
                        #warn "got it! $file\n";
                        last;
                    }
                }
                # redo the search with a truncated filename
                last if $seen;
                unless (-e $file) {
                    $file =~ s/(\w{12,})(\.\w+)$/substr($1,0,11).$2/e;
                    $seen++;
                    redo;
                }
            }
            die "Can't locate function file '$filename' for package '$package'\n"
                unless -e $file;
        }
    }
    if (my $addr = $AL {'Inc'} -> {"$package\::$function"} ) {
        return $AL {'Sub'} -> {$addr} -> {'outer'};
    }
    else { # file not known yet
        my $inner;
        my $h  = {};
        my $cr = $chkfunc -> ($file);
        my $subname = "$package\::$function";
        $h = {
            file     => $file,
            check    => \$AL {'check'},
            checksub => \$chkfunc,
            checkref => \$cr,
            function => $subname,
        };
        my $outer     = $load -> ($package, $file, $h) or die $@;
        my $outeraddr = Scalar::Util::refaddr ($outer);
        $h -> {'outer'} = $outer;
        Scalar::Util::weaken ($h -> {outer});
        $AL{Sub} -> {$outeraddr} = $h;
        $AL{Inc} -> {$subname}   = $outeraddr;
        return bless $outer, __PACKAGE__;
    }
};

$load = sub {
    my ($package, $file, $h) = @_;
    delete $INC {$file};
    my $ref = eval "package $package; require '$file'";
    # warn $@ if $@;
    return undef if $@;
    {
        # just in case the require dinn' return a ref -
        # then it's likely a named subroutine has been loaded
        # see chromatics note below
        # UNIVERSAL::isa($ref,'CODE') or $ref = \&{$h -> {'function'}};
        Scalar::Util::reftype($ref) and Scalar::Util::reftype($ref) eq 'CODE'
            or $ref = \&{$h -> {'function'}};
        ${$h->{inner}} = $ref;
        my $sub = sub {
            my $cr = $h -> {'checkref'};
            if( ${ $h -> {'check'} } and ${ $h-> {'checksub'} } and
                ( my $c = ${ $h->{checksub} } -> ($file) ) != $$cr) {
                warn "reloading $file" if $Debug;
                $$cr = $c;
                $load -> ($package, $file, $h);
            }
            goto ${ $h -> {inner} };
        };
    }
};

sub DESTROY {
    my $outeraddr = Scalar::Util::refaddr ($_[0]);
    my $h = $AL {Sub} -> {$outeraddr};
    delete $AL {Inc} -> { $h -> {function}};
    delete $AL {Sub} -> {$outeraddr};
}

sub AUTOLOAD {
    no strict;
    my $sub = $AUTOLOAD;
    my ($pkg, $func, $filename);
    {
        ($pkg, $func) = ($sub =~ /(.*)::([^:]+)$/);
        $pkg = File::Spec -> catdir (split /::/, $pkg);
    }
    my $save = $@;
    local $!; # Do not munge the value.
    my $ref;
    eval {
        local $SIG{__DIE__};
        $ref = $gensub -> ($pkg, $func, '', $AL{'auto'} || 'auto');
    };
    if ($@) {
        if (substr ($sub,-9) eq '::DESTROY') {
            no strict 'refs';
            *$sub = sub {};
            $@ = undef;
        }
        if ($@){
            my $error = $@;
            require Carp;
            Carp::croak($error);
        }
    }
    $@ = $save;
    no warnings 'redefine';
    *$AUTOLOAD = $ref;
    goto $ref;
}

sub unimport {
    my $callpkg = caller;
    no strict 'refs';
    my $symname = $callpkg . '::AUTOLOAD';
    undef *{ $symname } if \&{ $symname } == \&AUTOLOAD;
    *{ $symname } = \&{ $symname };
}

1;
__END__
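As an illustration of the checksub hook, here is a sketch of an alternative check routine. The name $by_checksum is made up for this example. The wrapper in the module compares old and new values with !=, so the routine returns a number rather than, say, an md5 hex string.

```perl
use strict;
use warnings;

# A check routine in the shape checksub() expects: it takes the full path of
# a function file and returns a number that changes when the content changes.
my $by_checksum = sub {
    my ($path) = @_;
    open my $fh, '<:raw', $path or return 0;
    local $/;                          # slurp the whole file
    my $content = <$fh>;
    $content = '' unless defined $content;
    return unpack '%32C*', $content;   # 32-bit sum of all bytes
};

# Assumed usage, following the SYNOPSIS:
#   my $sub = Sub::Auto->new($file, $by_checksum);
#   $sub->check(1);
```

A plain byte sum is a weak checksum (reordering bytes leaves it unchanged), and reading the file on every call costs more than a stat, so take this as a demonstration of the hook's shape rather than a recommendation.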

If used as

use Sub::Auto qw(AUTOLOAD);
it is a drop-in replacement for AutoLoader (it handles named subroutines also) - with two caveats:
  • it doesn't look for a package's autosplit file to pre-define subroutines for the caller upon import() execution
  • references to a pre-declared named sub change after loading the respective autoload function file

<update>

As of now, it works (for me :-), but some AutoLoader tests fail when run against this module:

  • autoload function files with truncated file names fail to load
  • currently no unimport (no Sub::Auto; not implemented)

</update>

With a few changes and enhancements, AutoLoader could do the job. All that would be necessary is

  • extend AutoLoader's import to take a hash reference resembling %AL
  • make it use a $gensub routine if provided, otherwise use its standard require
  • check for the returned value from loading to choose the right form of goto
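The last point could look roughly like the following. This is only a sketch of the idea, not a patch against AutoLoader; the helper name load_and_pick is hypothetical, and it uses do on a full path where AutoLoader uses require.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

# Hypothetical helper: load a function file and return the code reference an
# AUTOLOAD block should jump to. $name is the fully qualified sub name.
sub load_and_pick {
    my ($file, $name) = @_;
    my $ret = do $file;
    die "can't load $file: " . ($@ || $!) unless defined $ret;
    # The file returned an anonymous sub: jump to that reference.
    return $ret if (reftype($ret) || '') eq 'CODE';
    # Otherwise assume the file defined a named sub, as autosplit files do.
    no strict 'refs';
    die "$file did not define $name" unless defined &$name;
    return \&$name;
}

# Inside AUTOLOAD this would be used as:
#   my $code = load_and_pick($file, $AUTOLOAD);
#   goto &$code;
```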

performance

Since the payload subroutines are wrapped into references, which in turn are looked up from a hash by code that performs more checks and is itself wrapped into another sub (the outer sub, visible to the caller), there's significant overhead. You will not want to use this module to wrap tiny recursive functions where just calling them takes up much of the overall time they spend.
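To get a feeling for the cost on your own machine, something like the following can be run. It is a rough analogue of the wrapping only, not the module's real code path (there are no hash lookups here, so the real overhead is higher); the names tiny, $inner, $check and $wrapped are made up for this sketch.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

sub tiny { $_[0] + 1 }          # the payload: almost no work per call

my $inner   = \&tiny;
my $check   = 0;                # checking switched off
my $wrapped = sub {             # stand-in for the outer sub
    if ($check) { }             # the flag is still tested on every call
    goto &$inner;               # then control passes to the payload
};

cmpthese(1_000_000, {
    direct  => sub { tiny(1) },
    wrapped => sub { $wrapped->(1) },
});
```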

at the end

Do you find this module useful? Does it have the right name? Should I release it to CPAN? Or should I write a patch for AutoLoader / AutoSplit?

update: uploaded to CPAN as AutoReloader.

Comments, critiques and enhancement suggestions welcome.

update: replaced UNIVERSAL::isa with Scalar::Util::reftype as chromatic suggested, added autouse to the SEE ALSO section as per diotalevi's comment.

update: added unimport, fixed file searching, added proper die() statements - runs against the AutoLoader test file, failing only test 4 (can() returns ref to regular installed sub) - see above.

update: added AutoReloader for searchability, since that's its name on CPAN

--shmem

_($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                              /\_¯/(q    /
----------------------------  \__(m.====·.(_("always off the crowd"))."·
");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}

Replies are listed 'Best First'.
Re: RFC: Sub::Auto - lazy loading revisited
by diotalevi (Canon) on Feb 19, 2007 at 17:14 UTC

    Your name needs work. If this is a reloading AutoLoader then a name like AutoReloader might fit the bill better. Sub::Auto sounds like it's confused about what its purpose in life is. Also, you didn't mention autouse which is also a nice way to get where you're going. It doesn't do any reloading though. Were you to post a reloading autouse it'd be good to call it autoreuse or similar.

    ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

      Thanks for the suggestion. tye suggested in the CB to put it under the Devel namespace, but that would surely implicitly forbid the use in a productive environment - where this module might be useful, too.

      I think I'll go with just AutoReloader, ++diotalevi.

      autouse is for modules, while this one is for module methods or functions; I've included the reference.

      A working autoreuse for modules that use AutoReloader is a challenge... as is autousing AutoReloader. Will check that.

      --shmem

        Devel:: doesn't mean things that are only for using during development. It's just a grab-bag of tools most of which operate directly on the language.

        ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

Re: RFC: Sub::Auto - lazy loading revisited
by chromatic (Archbishop) on Feb 20, 2007 at 05:47 UTC

    Careful! This is a lot stricter than you may want it to be:

    UNIVERSAL::isa($ref,'CODE') or $ref = \&{$h -> {'function'}};

    Setting aside the fact that UNIVERSAL::isa() is a method and not a function, you're disallowing blessed coderefs (perhaps unnecessarily) and you're forbidding overloading (definitely unnecessarily).

    If you really need to check that the underlying reference is to a subroutine, use Scalar::Util's reftype(). If all you want to do is check that you can use something as a subroutine reference appropriately, use:

    if ( eval { defined &$ref } ) { ... };
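A small sketch of the difference between ref and Scalar::Util's reftype on a blessed code reference; the class name My::Handler is made up for the example.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

my $plain   = sub { 'plain' };
my $blessed = bless sub { 'blessed' }, 'My::Handler';   # hypothetical class

for my $ref ($plain, $blessed) {
    # ref() reports the class of a blessed reference, reftype() the
    # underlying type; the eval test asks whether it can be called as a sub
    printf "ref=%-11s reftype=%s callable=%s\n",
        ref($ref), reftype($ref),
        (eval { defined &$ref } ? 'yes' : 'no');
}
```

For the blessed reference, ref returns the class name while reftype still returns CODE.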

      Thanks, Scalar::Util's reftype is the way to go. The sub require returns could be blessed, indeed - but overloaded? hmm...

      Or do you mean, failure if UNIVERSAL::isa is overloaded?

      <update>

      Still musing whether to patch AutoLoader...

      </update>

      --shmem

      exists &$ref, not defined &$ref:

      $ perl -we'sub AUTOLOAD {1} sub foo; print main->foo, 0+exists&foo, 0+defined&foo'
      110
