Parsing your script's command line

by grinder (Bishop)
on Jun 14, 2001 at 02:04 UTC (#88222, perltutorial)

This tutorial describes how to deal with information passed in on the command line. Not the switches passed to the Perl interpreter, but the switches and file names that are passed to your script.

The first steps

Let's assume that you have written your first hello world script, which looks something like this:

    #! /usr/local/bin/perl -w
    print "Hello, world\n";

Instead of greeting the world, you would like to greet whatever is passed in on the command line. To do that, you should know that the command-line arguments are stored in the @ARGV array. To get at the first element, we could index it using $ARGV[0]. The usual idiom, however, is to remove the element from the array and then deal with it and/or throw it away. shift removes and returns the first element of an array, thus we would write something like:

    #! /usr/local/bin/perl -w
    use strict;

    my $thing = shift(@ARGV);
    print "Hello, $thing\n";

As it turns out, Perl offers a convenient short-cut. If you use shift outside a subroutine context with no parameters, it will implicitly use @ARGV, so the above can be rewritten as:

    #! /usr/local/bin/perl -w
    use strict;

    my $thing = shift;
    print "Hello, $thing\n";

This will be familiar to people used to Unix shell programming. This is all well and good; the script's behaviour is controlled by the parameters appearing on the command line. There is a problem, however: a parameter must now be supplied, for if it is omitted, the script will cough up a Use of uninitialized value in concatenation warning, and all that will be printed is "Hello, ".

Using default values

It would be nice to be able to provide the script with a sensible default value, so that should no parameter be supplied, it will be able to continue and do something reasonable. For this we can use the || operator:

    #! /usr/local/bin/perl -w
    use strict;

    my $thing = shift || 'world';
    print "Hello, $thing\n";

What this does is assign $thing the value of the first parameter on the command line or 'world', should the command line be empty. Of course, sometimes an empty command line is not reasonable, in which case the best thing is to stop the script and print out a message so that the user can take corrective action:

    #! /usr/local/bin/perl -w
    use strict;

    my $thing = shift or die "Nothing specified on the command line.\n";
    print "Hello, $thing\n";

Note that the correct idioms are to say shift || $value but shift or die. Read up on Perl's precedence rules to understand why.
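The precedence point can be seen in a small sketch. The helper names with_double_pipe and with_or are hypothetical, and each works on a private copy of its arguments rather than on @ARGV so that both forms can be tried side by side. It assumes nothing beyond core Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: "||" binds tighter than "=", so the default is
# part of the value being assigned.
sub with_double_pipe {
    my @args = @_;
    my $thing = shift(@args) || 'world';   # $thing = (shift(@args) || 'world')
    return $thing;
}

# Hypothetical helper: "or" binds looser than "=", so the assignment
# happens first and the default is evaluated and then thrown away.
sub with_or {
    my @args = @_;
    no warnings 'void';                    # silence "Useless use of a constant"
    my $thing = shift(@args) or 'world';   # (my $thing = shift(@args)) or 'world'
    return $thing;
}
```

With no arguments, with_double_pipe() returns 'world' while with_or() returns undef. That is why "or" is the right partner for die (the assignment must finish before the script gives up) and "||" is the right partner for a default value.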

There's another gotcha to be aware of, if ever 0 is a valid value to pass in on the command line:

my $thing = shift || 'default';

... will not do the right thing if you pass 0 to the script. It boils down to what Perl considers truth. It just so happens that 0 is treated as false, so the left hand side of the || operator as a whole is false, and thus $thing winds up being assigned the value of 'default'.

There is a simple two-step way around this. First assign what comes out of shift. Then, depending on whether $thing is defined (not whether it is true or false, thus side-stepping the issue), use the wonderful-but-cryptic ||= to possibly assign to $thing, based on the outcome of the conditional.

    my $thing = shift;
    $thing ||= 'default' unless defined $thing;
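On Perl 5.10 or later, the defined-or operator expresses the same test in one step. The following is a minimal sketch, not part of the original tutorial; the helper names first_or_default and first_or_default_naive are hypothetical, and both take their arguments as a list instead of reading @ARGV so they can be called directly.

```perl
use strict;
use warnings;
use 5.010;   # the // and //= operators need Perl 5.10 or later

# Hypothetical helper: keeps any defined value, including '0' and ''.
sub first_or_default {
    my @args = @_;
    my $thing = shift @args;
    $thing //= 'default';    # assigns only when $thing is undefined
    return $thing;
}

# The naive version, for comparison: every false value is replaced.
sub first_or_default_naive {
    my @args = @_;
    return shift(@args) || 'default';
}
```

Calling first_or_default('0') gives back '0', whereas first_or_default_naive('0') gives 'default'. Both return 'default' when called with no arguments.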

Introducing command line switches

Now supposing we wanted to modify the script to say "Goodbye" according to whether the -g switch was used. We would like to say things like:

  • greet
  • greet sailor
  • greet -g
  • greet -g 'cruel world'

Let us first consider The Wrong Way to do it:

    #! /usr/local/bin/perl -w
    use strict;

    my( $switch, $thing );
    $switch = shift;
    if( $switch and $switch eq "-g" ) {
        $thing = shift || 'world';
    }
    else {
        $thing = $switch || shift || 'world';
        $switch = undef if $switch;
    }
    print $switch ? 'Goodbye' : 'Hello', ", $thing\n";

The above code is difficult to understand. It does, however, work according to spec. The main problem is that it fails to recognise -t or -anything else as a switch, and so never complains that such a switch has no effect on the program. For example, consider what greet -h harry will print out: the -h is taken to be the thing to greet, so the result is "Hello, -h". Even worse, the code will become horribly obfuscated should the script have to deal with two, three or more command line switches.

Obviously, a better approach is required. Above all, it would be nice not to have to write it oneself, but rather use something that exists already. That must mean that packages exist to do what we need.

What we are looking for is something that will look for switch-like instances on the command line, set some corresponding Perl variables and above all remove them from @ARGV so that we don't have to bother with them.

What can perl offer?

Note the distinction between Perl the language and perl the interpreter. As it turns out, perl, the Perl interpreter, can do some rudimentary command line processing all by itself. Sometimes this is sufficient. All you have to do is feed the interpreter the -s switch:

    #! /usr/local/bin/perl -sw
    use strict;
    use vars qw/$g/;

    my $thing = shift || 'world';
    print $g ? 'Goodbye' : 'Hello', ", $thing\n";

<aside>Do not confuse perl's switches with your script's switches. Remember that with a shebang line of #! /usr/local/bin/perl -sw and the switches -xy, the shell actually runs /usr/local/bin/perl -sw script -xy. Perl sees -sw script -xy. It processes the -sw, sees that 'script' looks like a file name, opens it and starts interpreting. Your script only sees -xy (although to a certain extent it can detect what switches were passed to perl, such as by reading the value of $^W).</aside>

Now we have a much smaller script that should be easier to understand. There is, however, a small problem due to interactions with the use strict pragma. The -s functionality harks back to before the age of lexical variables. It refers to package variables that have to be explicitly declared in a use vars pragma when strict is in use. This is not really a problem, except that if the script is run with the -h switch and warnings are switched on, the program will complete but it will spit out a Name "main::h" used only once: possible typo. warning message.

Before turning away from -s as a viable solution, consider the other feature that Perl provides. If the above script is run with -g, the package variable $g is set to 1. Alternatively, the script could be run with -g=foo, in which case instead of being set to 1, $g would contain 'foo'. Sometimes this limited functionality is enough to get the job done, and the fact that you don't have to drag around an external package file can be a win in certain circumstances.

<update date="2001/11/15"> It appears that -s has some rather nasty side effects, which means that scripts that use it should only be used in safely controlled environments (if such a thing exists). For more information, read the thread "perl -s is evil?".</update>

getopt: the heavy artillery

More Unix culture: the traditional way to parse command line arguments in C was through a library call named getopt or getopts, short for get options. This has been carried over to Perl in the form of Getopt::Std and Getopt::Long which are bundled in the core distribution.

Getopt::Std

Getopt::Std performs command line processing and pulls out anything that resembles a -letter switch and its value, leaving the remaining values in @ARGV. It offers two interfaces, getopt and getopts. You almost always want to use the second variant. Let's see why:

    #! /usr/local/bin/perl -w
    use strict;
    use Getopt::Std;
    use vars qw/$opt_g/;

    getopt('g');
    my $thing = shift || 'world';
    print $opt_g ? 'Goodbye' : 'Hello', ", $thing\n";

Before going any further, the first thing to point out is that Getopt::Std has been retrofitted to get around the uncomfortable use of package variables. If you pass a reference to a hash as the second parameter to the getopt call, it will populate the hash, instead of using package variables, which allows the script to be rewritten as:

    #! /usr/local/bin/perl -w
    use strict;
    use Getopt::Std;

    my %args;
    getopt('g', \%args);
    my $thing = shift || 'world';
    print $args{g} ? 'Goodbye' : 'Hello', ", $thing\n";

This script will silently ignore a non-specified switch, which is usually A Good Thing. There is, however, a serious bug lurking in this code. Try to get the script to print "Goodbye, foo". It's rather difficult to do because getopt is greedy. When it sees a specified switch, it tries hard to assign that switch a meaningful value, which means either the characters following the switch (as in -gparam) or the next parameter on the command line (as in -g param). This means that if you run the above script as script -g foo, $args{g} will contain 'foo', but there will be nothing left on the command line, so $thing will be assigned the default value of 'world'.

In order to get around this "feature", the second interface, via getopts should be used instead. In this case, the specification string ('g' in the above) is interpreted differently. By default, all letters specify boolean parameters. To force a parameter to pick up a value (i.e. to get the behaviour we so much wanted to avoid above), a ':' (colon) is appended. Therefore, to make -g greedy, it should be specified as 'g:'.

This means that all we have to do in the above script is to call getopts instead of getopt and the job is done.
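As a minimal sketch of that change, the greeting logic is wrapped here in a hypothetical sub, greet_std, which localises @ARGV so it can be called with any argument list; a real script would simply call getopts on the genuine @ARGV. The -n switch is an illustrative addition, not part of the original script, included only to show a colon-suffixed letter picking up a value.

```perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical wrapper around the tutorial's script body.
sub greet_std {
    local @ARGV = @_;          # getopts always works on @ARGV
    my %args;
    # 'g' is a plain boolean; 'n:' (illustrative) takes a value
    getopts('gn:', \%args)
        or die "usage: greet [-g] [-n count] [thing]\n";
    # inside a sub, shift defaults to @_, so @ARGV is named explicitly
    my $thing = shift(@ARGV) || 'world';
    my $count = $args{n} || 1;
    my $line  = ($args{g} ? 'Goodbye' : 'Hello') . ", $thing\n";
    return $line x $count;
}
```

Here greet_std('-g', 'foo') yields "Goodbye, foo\n", because -g no longer swallows the following word, while greet_std('-n', '2', 'sailor') repeats the greeting twice.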

If you want to look at a real-life example of code that uses Getopt::Std, you can look at a script I uploaded here named pinger, a little tool designed to scan a range of IP addresses via ping.

Getopt::Long

That is all well and good, but what happens when you reimplement tar in Perl? How do you remember what all those pesky single character switches do in the string -cznTfoo? It's much easier to understand what's going on with --create --gzip --norecurse --files-from foo instead. Enter Getopt::Long.

This module lets you build up a specification that adheres to the POSIX syntax for command line options, which generally introduces switches with the double-dash notation. Unfortunately, this precludes the use of single-dash switches (bikeNomad points out that this is not true. My bad for not paying closer attention to the documentation). Even worse, you cannot include both Getopt::Std and Getopt::Long in the same program, as they will fight over @ARGV and the results will be... undefined.

Since I originally wrote this tutorial, I have used Getopt::Long a bit more (figured that I had to since I wrote this). Once you understand Getopt::Simple, Getopt::Long is pretty easy to pick up, and has much sophistication to offer, once you scratch below the surface.
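For comparison with the earlier scripts, here is a minimal sketch of the greeting example using Getopt::Long. The sub name greet_long and the --times option are hypothetical, chosen only for illustration. It uses GetOptionsFromArray, which is available in reasonably recent versions of the module, so the parsing can be tried on any list; a script would normally call GetOptions, which works on @ARGV.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical wrapper: parses a private copy of the arguments.
sub greet_long {
    my @args    = @_;
    my $goodbye = 0;
    my $times   = 1;
    GetOptionsFromArray(
        \@args,
        'goodbye|g' => \$goodbye,   # boolean flag: --goodbye or -g
        'times|t=i' => \$times,     # illustrative: mandatory integer value
    ) or die "usage: greet [--goodbye] [--times N] [thing]\n";
    # whatever was not recognised as an option is left in @args
    my $thing = shift(@args) || 'world';
    my $line  = ($goodbye ? 'Goodbye' : 'Hello') . ", $thing\n";
    return $line x $times;
}
```

Both greet_long('--goodbye', 'cruel world') and greet_long('-g', 'cruel world') return "Goodbye, cruel world\n", which also illustrates the correction credited to bikeNomad: single-dash switches are accepted.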

That said, all of the processing goes on behind the scenes. You can attach a callback to deal with the processing of individual options, but this can become unwieldy. Sometimes you need more fine-grained control of the parsing of the switches, as they come in one by one.

While the following module is no longer being actively developed, it is just what you need in some instances, because it deals with parsing options only, and lets you deal with the rest. It turns the parsing inside out, lets you act on options on the fly, and therefore just feels more cooperative. Try it, you might like it.

Getopt::Mixed

This module should cover all your command line processing needs. It's quite simple to set up. First of all you need to call init with a format string (akin to pack and unpack). This sets up what command line switches are defined, and what values they can take on. Here's a real life example hoisted from some code I have lying around:

    Getopt::Mixed::init( 'j=s l:s p=i s=s t=s logfile>l d>p date>p period>p project>j type>t');

This encodes the following information:

  • -j, -s and -t take a mandatory string argument.
  • -l takes an optional string argument
  • -p takes a mandatory integer argument
  • --logfile is an alias for -l
  • -d, --date and --period are all aliases for -p
  • and so on.

Pretty straightforward stuff. The next step is to call nextOption repeatedly until it fails. Once that is done, you have processed all the switches. Unlike Getopt::Std you set your defaults beforehand. If the switch isn't specified, the value isn't touched. Also note that just because a switch has a mandatory argument doesn't mean that the script will abort if the switch doesn't appear on the command line... it's not the switch itself that is mandatory. If this is required then you test the corresponding variable after the loop and if its value is undefined then you yank the rug out from under the script.

The processing loop looks something like this:

    while( my( $option, $value, $pretty ) = Getopt::Mixed::nextOption() ) {
        OPTION: {
            $option eq 'j' and do {
                $Project = $value;
                last OPTION;
            };
            $option eq 'l' and do {
                $Logfile = $value if $value;
                last OPTION;
            };
            # ...
        }
    }
    Getopt::Mixed::cleanup();
    die "No project specified via -j.\n" unless $Project;

The module is smart enough to recognise

  • -j foo
  • -jfoo
  • -j=foo

as all being valid syntaxes for assigning foo to the -j switch. Remember the last variant. It's the easiest way of passing in a negative number on the command line. After all, how should --offset -30 be interpreted?

Another real-life example of code, this time using Getopt::Mixed can be found at nugid, a script I wrote to manage large scale modifications of uids and gids of Unix filesystems.

Where to from here

This should be enough for 95% of your basic command line processing needs. But everyone has a different itch to scratch, and you should be aware that there is a boatload of getoptish packages hanging out on CPAN, as a search will reveal. Once you have the hang of a couple it's pretty simple to pick up another.

The most sophisticated of all, Getopt::Declare comes, naturally enough, from the Damian. This module has an advanced method for specifying exactly what are the legal values that a switch may take, as well as providing poddish descriptions so that you don't have to write sub usage { ... } that explains how to use the program correctly.

Switch name idioms

Over the years, a number of conventions have arisen over the best letters to assign to common operations that crop up again and again in program design. This list attempts to codify existing practices (updates welcomed). Use these conventions and people will find your programs easy to learn.

-a  Process everything (all).
-d  Debug mode. Print out lots of stuff.
-h  Help. Print out a brief summary of what the script does and what it expects.
-i  Input file, or include file.
-l  Name of logfile.
-o  Name of output file.
-q  Quiet. Print out nothing.
-v  Verbose. Print out lots of stuff.
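As a rough sketch of how these conventional letters might be wired up with Getopt::Std, the hypothetical sub parse_common below treats -a, -d, -h, -q and -v as booleans and lets -i, -l and -o take a value. Which letters carry a value is an assumption based on the descriptions in the list, not something the list itself states.

```perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical helper: returns the parsed switches and the leftover words.
sub parse_common {
    local @ARGV = @_;          # getopts always works on @ARGV
    my %opt;
    # booleans: a d h q v; switches that take a value: i l o
    getopts('adhqvi:l:o:', \%opt) or return;
    return ( \%opt, [@ARGV] );
}
```

For example, parse_common('-v', '-o', 'out.txt', 'input.dat') sets v to 1 and o to 'out.txt', and leaves 'input.dat' among the remaining arguments.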

And now you know all you need to know about command line processing. Have fun!


update: Tip o' the hat to petral for pointing out the node on Getopt::Declare, -h and a better Damian link. Tip o' the hat to Albannach for reminding me about the "passing 0 on the command line" bugaboo, and to OeufMayo regarding passing negative numbers.

Re: Parsing your script's command line
by bikeNomad (Priest) on Jun 14, 2001 at 06:35 UTC
    Nice tutorial. One minor disagreement, though: Getopt::Long is quite happy with single-character switches. If you want to bundle them (as in, '-vax' means the same as '-v -a -x'), you have to call:
    Getopt::Long::Configure ("bundling");
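A minimal sketch of the bundling behaviour described in this reply follows. The sub name parse_flags is hypothetical, and GetOptionsFromArray is used so that the example does not touch @ARGV. Note that Configure changes the module's settings for the whole program.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# With bundling on, single-letter switches may be clustered behind one dash.
Getopt::Long::Configure("bundling");

# Hypothetical helper: reports which of -v, -a and -x were seen.
sub parse_flags {
    my @args = @_;
    my ( $v, $a_flag, $x ) = ( 0, 0, 0 );
    GetOptionsFromArray(
        \@args,
        'v' => \$v,
        'a' => \$a_flag,
        'x' => \$x,
    ) or die "bad options\n";
    return { v => $v, a => $a_flag, x => $x };
}
```

With this configuration, parse_flags('-vax') and parse_flags('-v', '-a', '-x') report the same three switches as set.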
Re: Parsing your script's command line
by mugwumpjism (Hermit) on Jun 14, 2001 at 14:46 UTC

    You should also check to see if your option is listed in the GNU Coding Standards Option Table. Don't forget to support --help and --version!

    Other than that, nice intro.

      That statement only applies to GNU/Linux systems, or at least systems that use the GNU Binutils. Most commercial UNIX vendors don't have long option support in their Binutils. Sun and HP's versions of tar don't support long arguments, but FreeBSD and Linux support both forms of arguments.

      While the choice of which arguments your programs accept is yours, many people choose to follow the standard their OS vendor uses. I think that the table showing only the short arguments is good because short arguments are more of a standard than GNU-style arguments, but I think that including a link to the GNU Coding Standards Option Table, with an explanation that these are used mainly on GNU/Linux or GNU Binutils systems, would be useful.

      -xPhase

        Ah, but this is Perl, you're supposed to be thinking cross platform. And what is more cross platform than GNU?

        I'd recommend looking at the table, and using what's on there unless you have a good reason not to.

Re: Parsing your script's command line
by Not_a_Number (Parson) on Aug 14, 2003 at 18:07 UTC

    Very nice, ++

    However, I have a small question on this snippet:

    my $thing = shift;
    $thing ||= 'default' unless defined $thing;

    Surely the || is redundant here?

    my $thing = shift;
    $thing = 'default' unless defined $thing;

    This seems to do the job just as well?

    Or am I missing something?

    thx

    dave

      This is because $thing could be defined but zero and we don't want to overwrite a perfectly valid zero from the command line.

        Er...

        I'm afraid I don't understand. Under what circumstances does:

        $thing = 'default' unless defined $thing;

        not work for $thing == 0? I'm confused. Could you show me some code where the behaviour of the above differs from:

        $thing ||= 'default' unless defined $thing;

        Thanks in advance,

        dave

        This is because $thing could be defined but zero and we don't want to overwrite a perfectly valid zero from the command line.
        As Not_a_Number points out, the syntax EXPR unless defined $thing will do nothing at all * if $thing is defined, whether it's true or false. That is, if we are executing EXPR, then $thing is guaranteed undefined, hence false; so $thing ||= 'default' is guaranteed to be the same as $thing = 'default'.

        It seems reasonable to guess that what happened is that the coder originally had $thing ||= 'default' in some old code, discovered (as you mention) that it doesn't work when $thing is false-but-defined, and added the defined check without realising that it made ||= redundant.

        Of course, a mere five years later, we have the wonderful //= (C style Logical Defined Or) instead to save us this pain.

        UPDATE (the *'d statement above—sorry, I don't know how to do footnotes): On further thought, it's not quite true that EXPR unless defined $thing will do nothing if $thing is defined. Among what I suppose are many other subtle cases, if the unless is the last line in a subroutine, it'll make the subroutine return 1 when $thing is defined. For example, after

        our $a = 1;
        sub b { 0 unless defined $a }
        sub c {}
        my $b = b;
        my $c = c;
        we have that $b = 1 but $c is undefined.

Re: Parsing your script's command line
by Anonymous Monk on Mar 04, 2005 at 14:42 UTC
    Very nice tutorial. Helped me a lot. Especially since I didn't see a note in the CPAN Getopt::Std docs that said "getopts strips all parameters from @ARGV and leaves the rest". But that's really important if you want to use <> to process some files specified on the command line. So thanks for the work =) Greetings from Munich, Grand Apeiron
