Main routines, unit tests, and sugar

Recently, I've been thinking about a really, really minor Perl issue: what's the best way to format your script's main routine? I'd also always wondered how you were supposed to unit-test the main routine in your script. I recently came up with an idea (inspired by a brian_d_foy article) that answers both questions for me.

When I first started coding, I just put everything in my main routine at top level, in file scope. My scripts looked something like this (translated from Perl 4):

#!/usr/bin/env perl

use strict;
use warnings;

# Script variables
our $Foo = 'bar';

# Main routine
my ($name, $greeted) = @ARGV;

$name //= 'Horace!';
$greeted //= 'world';
say_hello($greeted);

# Subroutines
sub say_hello {
    my ($name) = @_;
    
    print "Hello $name\n";
}

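The problem with that format is that any variable I declare in the main routine, like $name above, is in scope for the rest of the file, including every subroutine defined below it. If I'd misspelled $name in the argument list inside say_hello(), strict wouldn't have complained: the sub would have quietly used the file-level $name instead, which could have given rise to a hard-to-trace bug.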
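
To make that failure mode concrete, here is a minimal sketch. The sub name say_hello_buggy and the misspelled $nmae are made up for illustration; the point is only that strict and warnings stay silent.

```perl
use strict;
use warnings;

# File-scoped lexical, as in the top-level style above.
my $name = 'Horace!';

# Hypothetical buggy variant: the parameter is misspelled as $nmae,
# so "$name" in the string resolves to the file-scoped variable instead.
sub say_hello_buggy {
    my ($nmae) = @_;
    return "Hello $name\n";
}

print say_hello_buggy('world');    # prints "Hello Horace!"
```

Had $name been declared inside a block or a sub, the same typo would have been a compile-time error under strict.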

I've seen some people solve this problem by declaring a subroutine called main(), then calling it immediately afterward. This solves the immediate problem, but makes the code a little more confusing, at least to my eyes:

# Main routine
sub main {
    my ($name, $greeted) = @ARGV;
    $name //= 'Horace!';
    $greeted //= 'world';

    say_hello($greeted);
}

main();

It works, of course, but something about it bothers me. It's easy to miss the call to main() when reading the code, and my eyes tend to skip over the subroutine when looking for the main routine. Worse, if the main() call gets deleted, the script will silently do nothing: no error, no warning.

The style I adopted uses a named block to split the difference between subroutine and toplevel code, like so:

# Main routine
MAIN: {
    my ($name, $greeted) = @ARGV;
    $name //= 'Horace!';
    $greeted //= 'world';

    say_hello($greeted);
}

Here, I thought, was the One True Main Routine Style. My main routine is in a lexical block, but is automatically executed whenever I run the script. A friend of mine pointed out the issue with this code, though: I should call exit(0) afterward, to prevent any other code from being run:

# Main routine
MAIN: {
    my ($name, $greeted) = @ARGV;
    $name //= 'Horace!';
    $greeted //= 'world';

    say_hello($greeted);

    exit(0);
}
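
As a rough illustration of why the exit(0) matters, the sketch below leaves it out and records what runs. The @ran array is only there for the demonstration.

```perl
use strict;
use warnings;

my @ran;    # records which pieces of top-level code executed

MAIN: {
    push @ran, 'main';
    # No exit(0) here, so execution simply continues past the block.
}

# A stray top-level statement, e.g. left between subroutine definitions.
push @ran, 'stray';

print join(', ', @ran), "\n";    # prints "main, stray"
```

With exit(0) as the last statement of the MAIN block, the stray statement would never be reached.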

OK, that works, but the extra statement in every script got me thinking. Could I create some syntactic sugar that makes it obvious where the main routine is, but means that I don't have to remember to type 'exit' in every script?

The answer was inspired by a brian_d_foy article, "Five Ways To Improve Your Perl Programming", in which he describes modulinos, which are programs declared as modules. The key here is that he uses the 'sub main' method, but combines it with a function call that only happens when the file is run as a script. That lets you still use the package as a library, or write unit tests for it. Still, the code was a little more complicated than I wanted to write every day:

package My::Routine;

# ...

# Main routine
sub main {
    my ($name, $greeted) = @ARGV;
    $name //= 'Horace!';
    $greeted //= 'world';

    say_hello($greeted);
}

# Run the main routine only when called as a script
__PACKAGE__->main() unless caller;
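
Here is a self-contained sketch of how a test might exercise a modulino like this. It writes the modulino to a temporary file so the example runs on its own; the ran_count() helper and the returned string are assumptions added for the demonstration, not part of the original article's code.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small modulino to a temporary file. The package name and the
# ran_count() helper are hypothetical, for illustration only.
my ($fh, $path) = tempfile(SUFFIX => '.pm', UNLINK => 1);
print {$fh} <<'END_MODULINO';
package My::Routine;
use strict;
use warnings;

my $ran = 0;
sub ran_count { return $ran }

sub main {
    my ($class) = @_;
    my ($name, $greeted) = @ARGV;
    $greeted //= 'world';
    $ran++;
    return "Hello $greeted\n";
}

# Run the main routine only when called as a script
__PACKAGE__->main() unless caller;

1;
END_MODULINO
close $fh or die "Cannot close $path: $!";

# Loaded via require, the file has a caller, so main() does not run.
require $path;
print "runs after loading: ", My::Routine::ran_count(), "\n";

# A test can now call main() directly with its own @ARGV.
my $output = do {
    local @ARGV = ('Shakespeare!', 'perlmonks');
    My::Routine->main();
};
print $output;
```

Loading the file leaves the run count at 0, and the explicit call returns "Hello perlmonks".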

Today, it occurred to me that I could write a module that provides some syntactic sugar for the perfect main routine. It would be simple and obvious, provide a lexical block, and exit afterward. In short, like this:

#!/usr/bin/perl

use Devel::Main 'main';

# ...

# Main routine
main {
    my ($name, $greeted) = @ARGV;
    $greeted //= 'world';
    say_hello($greeted);
};

It might seem odd to make syntactic sugar that does so little, but this is the best solution I've come up with. The 'main' block will run if this file is called as a script. If the file is brought into another script via require() or use(), the main routine won't run. Instead, main() installs a function called run_main() in the calling package, which calls the main routine with @ARGV set to its arguments. That way, I can write a test script like so:

#!/usr/bin/env perl
require './script_with_main.pl';    # explicit ./ since Perl 5.26+ no longer has '.' in @INC
print "Loaded script!\n";
run_main('Shakespeare!', 'perlmonks');
[download]

Which would print:

$ perl test_test_main.pl 
Loaded script!
Hello perlmonks
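
The key trick in run_main() is localizing @ARGV. A stripped-down sketch of just that part, with a stand-in code reference ($main_sub here is hypothetical) in place of the real main block:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the block passed to main { ... }.
my $main_sub = sub {
    my ($name, $greeted) = @ARGV;
    $greeted //= 'world';
    return "Hello $greeted\n";
};

# Same idea as the generated run_main(): the arguments become @ARGV
# only for the duration of the call.
sub run_main {
    local @ARGV = @_;
    return $main_sub->();
}

my @before = @ARGV;
print run_main('Shakespeare!', 'perlmonks');    # prints "Hello perlmonks"
print "ARGV restored\n" if "@ARGV" eq "@before";
```

Because of the local, the test script's own @ARGV is back in place as soon as run_main() returns, so several calls with different arguments don't interfere with each other.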

So here's the code that provides the syntactic sugar. Honestly, I wouldn't be surprised if someone has already done this on CPAN, but a cursory search didn't show me anything.

use strict;
use warnings;

# Devel::Main by stephen

package Devel::Main {

    # We use Sub::Exporter so you can import main under a different name:
    #   use Devel::Main main => { -as => 'other' };
    use Sub::Exporter;
    Sub::Exporter::setup_exporter({ exports => [ qw/main/ ]});

    # Later versions will let you customize this
    our $Main_Sub_Name = 'run_main';

    sub main (&) {
        my ($main_sub) = @_;

        # If we're called from a script, run main and exit
        if ( !defined caller(1) ) {
            $main_sub->();
            exit(0);
        }
        # Otherwise, create a sub that turns its arguments into @ARGV
        else {
            no strict 'refs';
            my $package = caller;
            *{"${package}::$Main_Sub_Name"} = sub {
                local @ARGV = @_;
                return $main_sub->();
            };
            
            # Return 1 to make the script pass 'require'
            return 1;
        }
    }


};

1;
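
The whole module hinges on the caller(1) check, so here is a small sketch of that check in isolation. The helper name context_of_caller is made up, and the temporary file stands in for a script that gets loaded with require.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper using the same check as Devel::Main::main():
# caller(1) is the frame that invoked the code calling this sub.
sub context_of_caller {
    return defined(caller(1)) ? 'library' : 'script';
}

# A one-line file whose last statement (and so its return value)
# is the result of the check as seen from inside a required file.
my ($fh, $path) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print {$fh} "main::context_of_caller();\n";
close $fh or die "Cannot close $path: $!";

my $when_required = require $path;
my $at_top_level  = context_of_caller();

print "required file sees: $when_required\n";
print "top-level code sees: $at_top_level\n";
```

The required file sees 'library', because the require itself is a caller frame. When this sketch is run directly with perl, the top-level call should see 'script', since nothing sits above the main program.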

stephen


Replies are listed 'Best First'.
Re: Main routines, unit tests, and sugar by BrowserUk (Patriarch) on Jun 13, 2013 at 05:51 UTC
> The problem with that format is that any variable I have in the main routine, like $name above, is in scope for the whole file. If I'd misspelled $name in the argument list to say_hello(), it could have given rise to a hard-to-trace bug.

Put your subroutines first, at the top of the file, and the 'main' code at the bottom.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.
Re^2: Main routines, unit tests, and sugar by stephen (Priest) on Jun 13, 2013 at 05:57 UTC
That's another variant that I see often, and of course it works. I don't prefer it because I then need to scroll down to the bottom to understand the rest of the code.

stephen
Re^3: Main routines, unit tests, and sugar by BrowserUk (Patriarch) on Jun 13, 2013 at 06:07 UTC
Then I just hit ^END and I'm there. (Unless I have a large __DATA__ section, in which case ^F__DATA__ takes me there.) The main advantage of putting subroutines at the top is that it avoids accidental closures and the need for pre-declarations for prototype checking. But I realise that it is a personal preference thing.
Re: Main routines, unit tests, and sugar by talexb (Chancellor) on Jun 13, 2013 at 19:34 UTC
As per BrowserUk, for scripts I put subroutines first, then the main routine at the bottom. Here's roughly how your script would look with my formatting:

#!/usr/bin/env perl

use strict;
use warnings;

# Script variables
our $Foo = 'bar';

# Subroutines
sub say_hello {
    my ($name) = @_;

    print "Hello $name\n";
}

{
    # Main routine
    my ($name, $greeted) = @ARGV;

    $name //= 'Horace!';
    $greeted //= 'world';
    say_hello($greeted);
}

I don't see any value in having a subroutine called main which is then called once after being defined. (In fact, I find it distinctly odd. Personal preference.)

Alex / talexb / Toronto
"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds
Re: Main routines, unit tests, and sugar by dcmertens (Scribe) on Jun 14, 2013 at 18:21 UTC
This is a cool approach! Just goes to show, though: different strokes for different folks. Once I wrapped my head around how Perl's module handling worked, I started to move nearly all of my functions and class declarations into modules, keeping my scripts to really just be the main functionality.
Re: Main routines, unit tests, and sugar by vsespb (Chaplain) on Jun 16, 2013 at 15:07 UTC
I prefer the following approach:

- The main .pm file is just a normal .pm file.
- There is a tiny .pl script with executable permissions which calls main() in the main module.
- This script is possibly autogenerated, or uses FindBin to set up @INC.

An example is perlbrew.
Re^2: Main routines, unit tests, and sugar by BrowserUk (Patriarch) on Jun 16, 2013 at 15:16 UTC
Reminds me of the guy that painted a bicycle on the side of his Hummer in an attempt to seem green. :)
Re^3: Main routines, unit tests, and sugar by vsespb (Chaplain) on Jun 16, 2013 at 15:32 UTC
Why is that?

- All the important code resides under one App::MyApp namespace, rather than being split across App::MyApp and myapp.pl.
- Different deploy methods can generate different binaries (some with hardcoded paths).
- The same approach is used in the Ruby ecosystem: binaries are autogenerated and almost empty.
- If the binary is autogenerated, it might not contain a copyright/license notice.
Re^4: Main routines, unit tests, and sugar by BrowserUk (Patriarch) on Jun 16, 2013 at 16:50 UTC
Re^5: Main routines, unit tests, and sugar by vsespb (Chaplain) on Jun 16, 2013 at 17:22 UTC
Some notes below your chosen depth have not been shown here
Re: Main routines, unit tests, and sugar by lee_crites (Scribe) on Jun 17, 2013 at 18:41 UTC
I designed my own coding style to answer this issue for myself. Mine is the "main is the last code in the file" option, as already suggested by others. I upvoted this for one reason: seeing someone work through an issue and find an appropriate, personally meaningful, solution is always a good thing!

Lee Crites lee@critesclan.com
Re: Main routines, unit tests, and sugar by radiantmatrix (Parson) on Jun 17, 2013 at 21:47 UTC
I'm not entirely sure how use Devel::Main 'main' is better syntactic sugar than calling main()... though I object to the main() rather than an executable body in the first place.

It seems like your main concern with just having your 'main' be the body of the script is the risk of conflicting variable names in different scopes causing readability or debugging issues. The easy way around this is to either:

- Abstract your subroutines into one or more modules. Switching files is a pretty good cue that you're in a different scope when you're reading listings.

OR

- Use different variable naming conventions for the outer scope; for example, when I have scripts large enough for this to be an issue, all my package-scoped variables begin with a capital letter.

I suggest that if your scripts are large enough for this sort of confusion to be likely, you're probably best abstracting bits of it away into modules anyhow.

<–radiant.matrix–> Ramblings and references
"A positive attitude may not solve all your problems, but it will annoy enough people to make it worth the effort." - Herm Albright
I haven't found a problem yet that can't be solved by a well-placed trebuchet
Re: Main routines, unit tests, and sugar by perlfan (Vicar) on Jun 19, 2013 at 15:47 UTC
Personally, I really like brian's modulino approach, but I use it sparingly. Main methods are one of the things that LPW wanted to do away with, or make implied (as many other things are). Yes, the main(), or however you do it, will make the code more familiar to those comfortable with compiled languages. However, it's not really useful unless you're going to be taking advantage of it for things like unit testing.

Where brian's modulino approach shines for me is in situations where I am writing a utility that generates a useful set of subroutines for a related set of utilities. With the modulino, I get the ability to run it as a utility or include it as a library in a related util. I get to create unit tests as well. So, I would highly recommend that approach in these situations, rather than just arbitrarily creating a main() method.

