Perl: the Markov chain saw
Recently, I've been thinking about a really minor Perl issue: what's the best way to format your script's main routine? I'd also always wondered how you're supposed to unit-test a script's main routine. I recently came up with an idea (inspired by a brian_d_foy article) that answers both questions for me.
When I first started coding, I just put my whole main routine at the top level of the file, in global scope. My scripts looked something like this (translated from Perl 4):
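The original listing is not reproduced here. As a minimal sketch of that top-level style, assuming a small greeting script built around the `$name` variable and `say_hello()` routine that the next paragraph refers to (the argument handling and the message text are guesses):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Main routine at file level: $name stays visible to everything below it.
my $name = @ARGV ? $ARGV[0] : 'World';
say_hello($name);

sub say_hello {
    # Misspell this declaration and the body quietly uses the file-level $name.
    my ($name) = @_;
    print "Hello, $name!\n";
}
```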
The problem with that format is that any variable declared in the main routine, like $name above, is in scope for the rest of the file. If I'd misspelled $name where say_hello() unpacks its arguments, the subroutine body would silently have used the top-level $name instead, giving rise to a hard-to-trace bug.
I've seen some people solve this problem by declaring a subroutine called main(), then calling it immediately afterward. This solves the immediate problem, but makes the code a little more confusing, at least to my eyes:
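A sketch of that approach, reusing the same assumed greeting script (only the `sub main` wrapper and the call after it come from the description above):

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub main {
    # $name is now lexical to main() and invisible to the other subroutines.
    my $name = @ARGV ? $ARGV[0] : 'World';
    say_hello($name);
}

main();    # easy to overlook, and nothing complains if this line is deleted

sub say_hello {
    my ($name) = @_;
    print "Hello, $name!\n";
}
```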
It works, of course, but something about it bothers me. It's easy to miss the call to main() when reading the code, and my eyes tend to skip over the subroutine when looking for the main routine. Worse, if the main() call gets deleted, the script will silently do nothing, with no error or warning.
The style I adopted uses a named block to split the difference between subroutine and toplevel code, like so:
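A sketch of the named-block style, again on the assumed greeting script. The label name `MAIN` is my guess. Any label works, since it is only there for the reader:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A labelled bare block: it runs in place like top-level code,
# but variables declared inside it go out of scope at the closing brace.
MAIN: {
    my $name = @ARGV ? $ARGV[0] : 'World';
    say_hello($name);
}

sub say_hello {
    my ($name) = @_;
    print "Hello, $name!\n";
}
```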
Here, I thought, was the One True Main Routine Style. My main routine is in a lexical block, but is automatically executed whenever I run the script. A friend of mine pointed out the issue with this code, though: I should call exit(0) afterward, so that no stray top-level code further down the file gets run:
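The same assumed script with the explicit exit added might look like this. Subroutine definitions after the `exit` are still compiled and callable from the block, while any loose statements after it are never reached:

```perl
#!/usr/bin/perl
use strict;
use warnings;

MAIN: {
    my $name = @ARGV ? $ARGV[0] : 'World';
    say_hello($name);
}
exit(0);    # stop here so no later top-level statement runs by accident

sub say_hello {
    my ($name) = @_;
    print "Hello, $name!\n";
}
```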
OK, that works, but the extra statement in every script got me thinking. Could I create some syntactic sugar that makes it obvious where the main routine is, but means that I don't have to remember to type 'exit' in every script?
The answer was inspired by a brian_d_foy article, "Five Ways To Improve Your Perl Programming", in which he describes modulinos, which are programs declared as modules. The key here is that he uses the 'sub main' method, but combines it with a function call that only happens when the file is run as a script. That lets you still use the package as a library, or write unit tests for it. Still, the code was a little more complicated than I wanted to write every day:
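The article's own listing is not reproduced here. The general modulino pattern can be sketched as follows, assuming a hypothetical package name `Local::Hello` and the same greeting logic as before. The essential line is the `unless caller` check, which is true only when the file is the program being run:

```perl
package Local::Hello;    # hypothetical package name, for illustration only
use strict;
use warnings;

# caller() returns nothing at the top level of the program being run,
# and returns the loading package when the file is pulled in by require/use.
__PACKAGE__->main(@ARGV) unless caller;

sub main {
    my ($class, @args) = @_;
    my $name = @args ? $args[0] : 'World';
    say_hello($name);
    return 0;
}

sub say_hello {
    my ($name) = @_;
    print "Hello, $name!\n";
}

1;    # a true value so that require/use succeeds
```

A test script can then `require` the file and call `Local::Hello->main('Perl')` directly, without the main routine having fired on load.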
Today, it occurred to me that I could write a module that provides some syntactic sugar for the perfect main routine. It would be simple and obvious, provide a lexical block, and exit afterward. In short, like this:
It might seem odd to make syntactic sugar that does so little, but this is the best solution I've come up with. The 'main' block will run if this file is run as a script. If it's brought into another script via require() or use(), the main routine won't run; instead, the file gets a function called run_main() that calls the main routine with @ARGV set to its arguments. That way, I can write a test script like so:
Which would print:
So here's the code that provides the syntactic sugar. Honestly, I wouldn't be surprised if someone has already done this on CPAN, but a cursory search didn't show me anything.
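The author's listing is not included here, so what follows is not that code. It is a minimal sketch of one way such a module could be written, under my own assumptions: the name `Local::MainBlock` is hypothetical, and the "run as a script" test uses `caller(1)`, the same idea as the modulino's `unless caller` moved one stack frame down.

```perl
package Local::MainBlock;    # hypothetical name, not a CPAN module
use strict;
use warnings;

# Install main() into whichever file says "use Local::MainBlock;".
# The (&) prototype travels with the code ref, so "main { ... };" parses there.
sub import {
    my $target = caller;
    no strict 'refs';
    *{"${target}::main"} = \&main;
}

sub main (&) {
    my ($code) = @_;
    my $target = caller;    # package of the file that wrote main { ... }

    # caller(1) is the frame that loaded that file. It is undefined only
    # when the file itself is the program being run.
    if ( !defined caller(1) ) {
        $code->();
        exit(0);
    }

    # Loaded via require/use: do not run the block, expose run_main() instead.
    no strict 'refs';
    *{"${target}::run_main"} = sub {
        local @ARGV = @_;    # the block sees the test's arguments as @ARGV
        return $code->();
    };
    return;
}

1;
```

With this sketch, a script would start with `use Local::MainBlock;` followed by `main { ... };` (the trailing semicolon is needed because `main` is an ordinary function call). A test that loads the script with `require` could then call `run_main('Perl')` in the script's package.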