http://www.perlmonks.org?node_id=920783

pileofrogs has asked for the wisdom of the Perl Monks concerning the following question:

Greetings Monks!

I've got a weird problem. I have a script that is losing two args from ARGV before the script finishes compiling.

My best guess is that one of my own modules is shifting ARGV in a BEGIN block, but I cannot find it.

By putting lots of BEGIN { print join(', ',@ARGV)."\n" } blocks in the script and in the modules it loads, I've narrowed it down somewhat.

The heavy lifting for this script is in a module I'm writing. Before I use my module, ARGV is fine. After I use it, ARGV is two items shorter. I put more of my ARGV printers in my module, and ARGV is fine from the top of that file all the way to the bottom.

So it looks like something is happening after the compiler finishes reading my module and before it gets back to reading the script that uses the module. Does anyone know what that might be?

Because I'm pretty sure that made no sense, I'll try to illustrate with an example:

Module.pm

    package Module;
    BEGIN { print 'beginning my module'.join(', ',@ARGV)."\n" }
    # do lots of stuff
    BEGIN { print 'ending my module'.join(', ',@ARGV)."\n" }
    1;

script.pl

    #! /usr/bin/perl -w -T
    BEGIN: { print 'before using my module'.join(', ',@ARGV)."\n" }
    use lib qw(.);
    use Module;
    BEGIN { print 'after using my module'.join(', ',@ARGV)."\n" }

Run it...

    # ./script.pl three two one
    before using my module three, two, one
    beginning my module three, two, one
    ending my module three, two, one
    after using my module one

Note this is not what you see if you run my example code. This is what you get if you run my actual code. My actual code is too huge and embarrassing to post.

Whoa! I just ran my example code and the results look like:

    $ ./script.pl
    beginning my module
    ending my module
    after using my module
    before using my module

Which means my understanding of how this stuff is ordered is truly whacked...

A BEGIN block is supposed to execute as soon as the compiler finishes reading it, right? How does the order from my example script come about then?

Hopefully this post includes enough rambling to alert the many of you who are smarter than me to a plausible error. I'm going to stop typing now.

Thanks!

--Pileofrogs

P.S. I am not crazy

Re: debugging during compile
by dave_the_m (Monsignor) on Aug 17, 2011 at 20:32 UTC
    Two observations. First, you are getting the strange output order in your sample code because the first BEGIN in script.pl has a spurious colon after it, which turns BEGIN into a label on an ordinary block and delays that block's execution until run time. Second, in the offending module in your real code, try putting print statements (not BEGIN { print } statements) inside that module's import() function; import() is the next thing executed after the body of the used module, before use returns. So your Module.pm example would become
        package Module;
        BEGIN { print 'beginning my module '.join(', ',@ARGV)."\n" }
        # do lots of stuff
        BEGIN { print 'ending my module '.join(', ',@ARGV)."\n" }
        sub import {
            print 'starting module import '.join(', ',@ARGV)."\n";
            # stuff
            pop @ARGV;
            print 'ending module import '.join(', ',@ARGV)."\n";
        }
        1;
    which produces the output
        $ ./script.pl a b c
        before using my module a, b, c
        beginning my module a, b, c
        ending my module a, b, c
        starting module import a, b, c
        ending module import a, b
        after using my module a, b
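To see the whole sequence in one runnable file, here is a minimal sketch (not from the thread) that writes a throwaway module, hypothetically named DemoMod, to a temporary directory and records the order in which each piece runs. It relies on `use DemoMod;` behaving like `BEGIN { require DemoMod; DemoMod->import(); }`, as documented for `use`.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

our @events;    # every step appends its name here

BEGIN {
    # Write a throwaway module (hypothetical name DemoMod) to a temp dir
    # and put that dir at the front of @INC, all at compile time.
    my $dir  = tempdir(CLEANUP => 1);
    my $path = File::Spec->catfile($dir, 'DemoMod.pm');
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} <<'PM';
package DemoMod;
BEGIN { push @main::events, 'module BEGIN' }
push @main::events, 'module body';
sub import { push @main::events, 'module import' }
1;
PM
    close $fh or die "Cannot close $path: $!";
    unshift @INC, $dir;
}

BEGIN { push @events, 'before use' }
use DemoMod;    # same as: BEGIN { require DemoMod; DemoMod->import(); }
BEGIN { push @events, 'after use' }
push @events, 'main run time';

print join(' -> ', @events), "\n";
```

Anything in DemoMod that sits outside a sub runs at the "module body" step, before the caller's next line is even compiled, which is exactly where a stray shift would eat @ARGV.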

    Dave

Re: debugging during compile
by chromatic (Archbishop) on Aug 17, 2011 at 20:24 UTC

    BEGIN {} is a BEGIN block. BEGIN: {} is a block with an unfortunately chosen label.
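A minimal sketch (not from the thread) of that difference: the labelled block runs with the rest of the main program at run time, while the real BEGIN block runs as soon as it has been compiled, so its entry lands first even though it appears second.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our @order;    # records when each block actually runs

BEGIN: { push @order, 'BEGIN: block (labelled, run time)' }
BEGIN  { push @order, 'BEGIN block (compile time)' }
push @order, 'ordinary statement (run time)';

print "$_\n" for @order;
```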

Re: debugging during compile
by Perlbotics (Archbishop) on Aug 17, 2011 at 22:02 UTC

    You can avoid editing all those modules by adding a hook into @INC as documented in require:

        #!/usr/bin/perl -w -T
        # install a 'hook' into @INC / see: perldoc -f require
        BEGIN {
            unshift @INC, '.';    # replaces 'use lib(.);'
            unshift @INC, sub {
                printf "***** TRACE: use %-20s with ARGV=%s\n", $_[1], join(', ', @ARGV);
                return;
            };
        }
        BEGIN { print 'before using my module: '.join(', ',@ARGV)."\n" }
        #use lib qw(.);  # removed: would shift hook to 2nd position, disabling tracing
        use Module;      # modified: sub import { shift @ARGV; }
        use Module2;     # just a copy of the unmodified Module.pm
        BEGIN { print 'after using my module: '.join(', ',@ARGV)."\n" }
        __END__
        > ./script.pl a b c
        before using my module: a, b, c
        ***** TRACE: use Module.pm            with ARGV=a, b, c
        beginning my module1: a, b, c
        ending my module1: a, b, c
        ***** TRACE: use Module2.pm           with ARGV=b, c
        beginning my module2: b, c
        ending my module2: b, c
        after using my module: b, c

    If you have several modules, you could at least narrow the problem down to two candidates. Then, you can add debugging code to those modules. Hopefully, no other module fiddles with @INC at the same time... Good luck!

    Update: OK, here are sample modules Module.pm and Module2.pm:

    Update2: In case the approach described above does not work:

        BEGIN {
            *CORE::GLOBAL::require = sub {
                printf "===== TRACE: req %-20s with ARGV=%s\n", $_[0], join(', ', @ARGV);
                CORE::require( $_[0] );
            };
        }

    Update3: Trivial, but might also help to narrow down candidate modules:
         find my/modules -name \*.pm -exec egrep -l @ARGV {} \;
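The egrep above only finds files that mention @ARGV by name, and a bare `shift` at file scope never does. A rough Perl sketch (not from the thread, and purely a line-by-line heuristic) that also flags `shift`/`pop` calls with no array argument; the function name `suspicious_lines` is hypothetical, and it will also report legitimate `shift` calls inside subs, so treat its output as a reading list rather than a diagnosis.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Heuristic scan: report lines in .pm files that mention @ARGV, or that call
# shift/pop with no array argument. Inside a sub those calls default to @_, so
# expect false positives; the goal is only to shorten the list of suspects.
sub suspicious_lines {
    my @dirs = @_;
    my @hits;
    find(sub {
        return unless /\.pm\z/ && -f $_;
        open my $fh, '<', $_ or return;
        while (my $line = <$fh>) {
            next unless $line =~ /\@ARGV\b/
                     || $line =~ /\b(?:shift|pop)\b\s*(?:\(\s*\))?\s*[;)}]/;
            push @hits, sprintf '%s:%d: %s', $File::Find::name, $., $line;
        }
        close $fh;
    }, @dirs);
    return @hits;
}

# Usage: perl scan.pl my/modules [more/dirs ...]
if (@ARGV) {
    print for suspicious_lines(@ARGV);
}
```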

Re: debugging during compile
by davido (Cardinal) on Aug 17, 2011 at 22:44 UTC

    I've considered this before, but never bothered to implement it. tie your array. Here's the idea: You have an array that something strange is happening to and you can't find where/when/why. You want to inspect it without altering your code with a bunch of print statements. Sure, there's the Perl debugger, and that's probably where you ought to turn first (and the main reason I haven't gotten around to giving this a try before). But today I'm going to try something different.

    Tie an array to a class that provides debug info whenever the array is modified in some way. Often tied entities are discouraged as they create action at a distance; it's very difficult to look at the primary source code and see why some behavior is happening -- too easy to forget about the tied behavior. But in this case, that's exactly what we're after: Behavior that is mostly invisible to the primary script, but helpful to us in some way.

    Start by using Tie::Array, and subclassing Tie::StdArray since it provides us with all the methods that emulate a plain old array. Then just override the ones we care about. It turns out we care about a lot of the methods (any that alter the array). But the source for Tie::Array (under the Tie::StdArray section) is a good starting point for maintaining default Array behavior while providing hooks for additional functionality.

    Note that with the "use parent" pragma, we have to say '-norequire', since Tie::StdArray is part of Tie::Array, and has already been loaded. We could just manipulate @ISA directly but that's not very Modern Perlish.

    I used Carp because its cluck() function is perfectly verbose. And I created an object method called $self->debug() that is a setter and getter for debug status. With debug set to '1' (or any true value) we get a noisy tied array. Set to zero, the array becomes silent again. Here's one fun aspect of a tied entity; we get the side-effect behavior (the primary objective of the tie is to create side-effects in our variables), and we also get the ability to control our variable's behavior via object oriented interface. If our tied variable is @array, and our tie returns an object to $o, then $o->debug(1) will turn on debugging verbosity for @array.

    What's it all for? Tie @ARGV at the beginning of your script (be sure to save its contents first and restore them after the tie). Then watch what other parts of your script do to @ARGV. Put the tie in a BEGIN{} block, and put that block before any 'use Module;' statements that may involve tinkering with @ARGV.

    Here's some somewhat messy code that demonstrates what I'm talking about:

    If you read the code you will also see that I'm creating a sort of inside-out object in tandem with the primary blessed array-ref. This is because Tie::Array and Tie::StdArray tie our array to an array-ref, which makes it simple to implement a tied array. But it doesn't help much if we need some name-space for additional storage per tied object. So by creating a hash called %STASH as a package global within the tie class module, I create room for a namespace. I can create keys for the %STASH with names that are just the stringified version of the blessed object reference. And when the object is destroyed, I made sure to delete the key. There are probably better approaches nowadays, but it's been a while since I played with such things, and that's how I remember doing it in the past.

    Your output should be:

    Now that you've seen this atrocity, be glad that Perl has a debugger. ;)
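A separate minimal sketch of the same idea (not davido's code), assuming Perl 5.10.1 or later for `use parent -norequire`. The class name `Tie::DebugArray`, the `@LOG` array and the `debug()` accessor are hypothetical. Like the post, it subclasses Tie::StdArray, keeps per-object data in a `%STASH` hash keyed by the stringified object, and calls Carp's `cluck` for a stack trace whenever an element is removed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Tie::DebugArray;             # hypothetical class name
use Tie::Array;                      # Tie::StdArray lives in Tie/Array.pm
use parent -norequire, 'Tie::StdArray';
use Carp qw(cluck);

our %STASH;                          # per-object storage, keyed by "$self"
our @LOG;                            # removals seen while debugging is on

sub TIEARRAY {
    my $class = shift;
    my $self  = $class->SUPER::TIEARRAY(@_);    # a blessed array ref
    $STASH{"$self"} = { debug => 1 };
    return $self;
}

sub debug {                          # getter/setter for the debug flag
    my $self = shift;
    $STASH{"$self"}{debug} = shift if @_;
    return $STASH{"$self"}{debug};
}

sub _report {
    my ($self, $op) = @_;
    return unless $self->debug;
    push @LOG, $op;
    cluck "$op on tied array";       # full stack trace to STDERR
}

# Only the methods that remove elements are instrumented.
sub SHIFT  { my $self = shift; $self->_report('shift');  $self->SUPER::SHIFT(@_) }
sub POP    { my $self = shift; $self->_report('pop');    $self->SUPER::POP(@_) }
sub SPLICE { my $self = shift; $self->_report('splice'); $self->SUPER::SPLICE(@_) }

sub DESTROY { my $self = shift; delete $STASH{"$self"} }

package main;

our $tracer;                         # object returned by tie()

BEGIN {
    my @saved = @ARGV;               # tie() starts from an empty array,
    $tracer = tie @ARGV, 'Tie::DebugArray';
    @ARGV = @saved;                  # so put the original arguments back
}

# 'use Module;' lines that might touch @ARGV would go here.

my $first = shift;                   # file-scoped shift: logged with a trace
$tracer->debug(0);                   # silence the array again
my $last = pop @ARGV;                # not logged
```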


    Dave

Re: debugging during compile
by armstd (Friar) on Aug 18, 2011 at 06:26 UTC

    I've seen pretty good direct debugging suggestions already, so I'll go a bit off topic here...

    You've mentioned you have other BEGIN blocks already. What do your BEGIN blocks DO? And why? I'll also point out that code in a module that isn't confined to a sub in a module will execute at "compile time" for the caller, just after the BEGIN blocks in the current module. It's important to understand that Perl can (and usually does) have multiple "compile time" phases interspersed with "run time" phases.

    Do you have any code that isn't in BEGIN blocks or sub blocks in your modules? Do you realize that it will also execute at your caller's "compile time" (strictly, at the module's run time, but before 'use Module;' returns to the caller)? The standard "1;" we put at the bottom of a module, for instance, is executed at that module's run time.

    Generally I don't recommend doing anything at all at compile time other than:

    • declare package/code structure - package namespace, base class, exports, pragmas
    • declare package globals - 'my' wherever possible, 'our' if absolutely necessary, initialized undef
    • declare/initialize package global constants
    • declare subs
    Anything not on that list... is either extremely exceptional, or violates one of my design principles. Anything that involves doing anything beyond simple declaration... save it for runtime. Your methods will know when runtime is happening, because they won't get called at any other time. Do object initialization at first method call. Do package initialization at first object initialization. If no method is ever called, be glad you saved all that time and effort.
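A minimal sketch (not from the thread) of the "initialize at first use" advice: the package only declares its state at compile time, and the expensive setup runs on the first call to new(). The package name `Lazy::Config`, the settings it returns and the `$init_count` counter are hypothetical; the counter exists only so the laziness can be observed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Lazy::Config;                # hypothetical module name

my $settings;                        # declared only; stays undef until needed
our $init_count = 0;                 # only here to make the laziness visible

sub _load_settings {                 # stand-in for expensive package setup
    $init_count++;
    return { verbose => 0, retries => 3 };
}

sub new {
    my ($class, %args) = @_;
    $settings //= _load_settings();  # package init on first object init
    my %self = (%$settings, %args);
    return bless \%self, $class;
}

sub retries { $_[0]{retries} }

package main;

# Compiling Lazy::Config did no setup work; the first new() does it once.
my $default = Lazy::Config->new;
my $custom  = Lazy::Config->new(retries => 5);
printf "setup ran %d time(s); retries: %d and %d\n",
    $Lazy::Config::init_count, $default->retries, $custom->retries;
```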

    So my big question is, what are your modules doing outside of that list of declarations? Does it really need to be done at "compile time"? I've seen cases where the designer had modular concrete subclasses "registering" themselves with their abstract class at compile time, as one example. Perl allows other ways to do that kind of thing though. Compile time in Perl is precious, and should not be abused. It delays response-time in interactive programs and CGIs. In my experience, the less that gets done at compile time in Perl, the better.

    Do any of your modules use shift() or pop() outside of a sub? Outside a sub (at file scope), shift and pop with no argument operate on @ARGV, just as they operate on @_ inside a sub. Here's an example of a badly behaved module:

        $ cat Module.pm
        package Module;
        our $global = shift();  # stole the first arg already...
        while( $arg = shift() ){
            print "Module stealing arg: $arg\n";
        };
        1;
        $
        $ perl -e 'BEGIN { print "\nbefore using my module: ".join(", ",@ARGV)."\n" } use Module; BEGIN { print "\nafter using my module: ".join(", ",@ARGV)."\n" } print "Module::global: $Module::global\n";' one two three

        before using my module: one, two, three
        Module stealing arg: two
        Module stealing arg: three

        after using my module:
        Module::global: one
        $

    I would recommend breaking down your modules. Start over, recreating them from the ground up, cut&pasting bit by bit. Create a main caller that only looks at @ARGV and compiles the modules like mine does above. Reconstruct the modules from scratch, Starting with just package structure (package declaration, base class, exports) and subs. See what happens. Add package global my/our declarations. See what happens. What's left, and why? Does it need to be done at compile time?
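One way to build that bare-bones caller is a loop that loads each module in turn and compares @ARGV before and after. This is a minimal sketch under assumptions: the function name `argv_changes` is hypothetical, modules are loaded at run time with `require` plus an explicit `import` (the same two steps `use` performs, just later), and comparing the joined argument lists is a crude but adequate check.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Load each module (body first, then import, as 'use' would) and report any
# module whose loading changed @ARGV.
sub argv_changes {
    my @modules = @_;
    my @report;
    for my $module (@modules) {
        my @before = @ARGV;
        (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
        require $file;                             # runs the module's body
        $module->import;                           # no-op if there is no import()
        push @report, "$module: (@before) -> (@ARGV)"
            if join("\0", @before) ne join("\0", @ARGV);
    }
    return @report;
}

# Hypothetical usage, with the arguments your script normally gets:
#   perl -Ilib harness.pl three two one
# after adding a line such as:
#   print "$_\n" for argv_changes(qw(My::Module My::Other));
```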

    --Dave

      Ladies and gentlemen, we have a winner!

      I hadn't realized that the code in the module's main body would run after the module was compiled and before the calling script resumed compiling.

      I put some print statements in the main body of my module and sure enough, ARGV was being altered in there. I tracked it down with a few more prints and found it.

      Somehow in refactoring a sub, I had left a

          my $self = shift;
          my $item = shift;

      sitting naked in the main body of the module. Whee!
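A minimal sketch (not from the thread) of why those two leftover lines ate the arguments: the same `shift` takes from @_ inside a sub but from @ARGV at file scope. The arguments are assigned in the script itself purely for illustration, and `method_like` is a hypothetical stand-in for the refactored sub.

```perl
#!/usr/bin/perl
use strict;
use warnings;

@ARGV = qw(three two one);    # stand-in for real command-line arguments

sub method_like {
    my $self = shift;         # inside a sub: shift takes from @_
    return $self;
}

my $from_sub = method_like('an object');
my $stray    = shift;         # at file scope: shift takes from @ARGV

print "sub got '$from_sub'; stray shift took '$stray'; \@ARGV is now (@ARGV)\n";
```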

Re: debugging during compile
by chrestomanci (Priest) on Aug 18, 2011 at 09:47 UTC

    If you favour the perl debugger, you can debug perl compile time statements by manually setting a breakpoint in a BEGIN block. If you add:

    BEGIN { $DB::single =1; }

    to the top of your Perl script and then run it with perl -d, the debugger will stop there, before your use statements and the module start-up code they run are evaluated.

      Awesome! I tried the debugger with no luck. I was looking for this, but I didn't understand it from the perldebug doc.

      Thanks!