PerlMonks |
This tutorial describes how to deal with information passed in on the command line: not the switches passed to the Perl interpreter, but the switches and file names that are passed to your script.

The first steps

Let's assume that you have written your first hello world script, which looks something like this:
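A minimal version of such a script might look like the sketch below. The interpreter path on the first line is an assumption; use whatever path perl has on your system.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The message is kept in a variable so that it is easy to change later.
my $message = "Hello, world\n";
print $message;
```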
Instead of greeting the world, you would like to greet whatever is passed in on the command line. To do that, you should know that the command-line arguments are stored in the @ARGV array. To get at the first element, we could index it using $ARGV[0]. The usual idiom, however, is to remove the element from the array and then deal with it and/or throw it away. shift removes the first element of an array and returns it, thus we would write something like:
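As a rough sketch, with the shift on @ARGV spelled out in full. The extra $before counter and the second print are not part of the idiom; they are only there to make it visible that shift really removes the argument.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $before = @ARGV;           # how many arguments we were given
my $thing  = shift(@ARGV);    # remove the first argument and return it

print "Hello, $thing\n";
print "Got $before argument(s), ", scalar(@ARGV), " left after the shift\n";
```

Saved under a hypothetical name such as greet, it would be run as perl greet Perl.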
As it turns out, Perl offers a convenient short-cut. If you use shift outside a subroutine context with no parameters, it will implicitly use @ARGV, so the above can be rewritten as:
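A sketch of the shorter form follows. The first_param sub is a hypothetical helper added only for contrast: inside a subroutine the same bare shift reads @_ rather than @ARGV.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# At file level, a bare shift is the same as shift(@ARGV).
my $thing = shift;
print "Hello, $thing\n";

# Inside a subroutine, a bare shift is the same as shift(@_).
sub first_param { return shift }
print "first_param returned: ", first_param('a', 'b'), "\n";
```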
This will be familiar to people used to Unix shell programming. This is all well and good; the script's behaviour is controlled by the parameters appearing on the command line. There is a problem, however, in that now a parameter must be supplied, for if it is omitted, the script will cough up a Use of uninitialized value in concatenation warning (provided warnings are enabled), and all that will be printed is "Hello, ".

Using default values

It would be nice to be able to provide the script with a sensible default value, so that should no parameter be supplied, it will be able to continue and do something reasonable. For this we can use the || operator:
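In its simplest form this could be sketched as:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Take the first argument, or fall back to 'world' if there is none
# (or if the argument is false, such as an empty string).
my $thing = shift || 'world';
print "Hello, $thing\n";
```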
What this does is assign $thing the value of the first parameter on the command line or 'world', should the command line be empty. Of course, sometimes an empty command line is not reasonable, in which case the best thing is to stop the script and print out a message so that the user can take corrective action:
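In a script the whole thing is a single line at file level: my $thing = shift or die "...". The sketch below wraps that line in a hypothetical required_thing sub and traps the die with eval, purely so that both outcomes can be shown in one run; the wording of the message is also an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the first parameter, or stop with a message if there is none.
sub required_thing {
    my $thing = shift or die "No thing specified on the command line.\n";
    return $thing;
}

print "Hello, ", required_thing('Perl'), "\n";

# eval is used here only to show the failure without ending the sketch.
my $survived = eval { required_thing(); 1 };
print "Without an argument the script would stop with: $@" unless $survived;
```

The or operator binds more loosely than =, so the assignment happens first and die is reached only when the assigned value is false.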
Note that the correct idioms are to say shift || $value but shift or die. Read up on Perl's precedence rules to understand why. There's another gotcha to be aware of, if ever 0 is a valid value to pass in on the command line:
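The troublesome line can be sketched like this. The naive_default sub is a hypothetical wrapper so that the line can be tried on two different inputs; in a script it would sit at file level and read @ARGV.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub naive_default {
    my $thing = shift || 'default';   # any false value triggers the default
    return $thing;
}

print naive_default('foo'), "\n";   # prints "foo"
print naive_default(0), "\n";       # prints "default": the 0 was lost
```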
... will not do the right thing if you pass 0 to the script. It boils down to what Perl considers truth. It just so happens that 0 is treated as false, so the left hand side of the || operator as a whole is false, and thus $thing winds up being assigned the value of 'default'. There is a simple two-step way around this. First assign what comes out of shift. Then, depending on whether $thing is defined (not whether it is true or false, thus side-stepping the issue), use the wonderful-but-cryptic ||= to possibly assign to $thing, based on the outcome of the conditional.
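One way to write the two steps, as a sketch rather than the only possible spelling. The safe_default sub is again a hypothetical wrapper used to try several inputs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub safe_default {
    my $thing = shift;                            # step 1: take what is there
    $thing ||= 'default' unless defined $thing;   # step 2: default only if undef
    return $thing;
}

print safe_default(0), "\n";    # prints "0"
print safe_default(), "\n";     # prints "default"
```

From perl 5.10 onwards the defined-or assignment operator does both steps at once: $thing //= 'default'.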
Introducing command line switches

Now suppose we wanted to modify the script to say "Goodbye" according to whether the -g switch was used. We would like to say things like:
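That is, greet, greet harry, greet -g and greet -g harry should all do something sensible. A hand-rolled attempt might look like the sketch below. It is an illustration of the approach, not a recommended design, and the greet sub and its variable names are assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the greeting from a list of command line arguments.
sub greet {
    my @args     = @_;
    my $greeting = 'Hello';
    my $thing    = 'world';

    # Is the first argument the -g switch?
    if (@args and $args[0] eq '-g') {
        shift @args;              # throw the switch away
        $greeting = 'Goodbye';
    }

    # Whatever comes next is the thing to greet.
    $thing = shift @args if @args;

    return "$greeting, $thing\n";
}

print greet(@ARGV);
```

With this sketch, greet -g harry prints "Goodbye, harry", but greet -h harry prints "Hello, -h".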
The above code is difficult to understand. It does, however, work according to spec. The main problem is that it will neither recognise -t (or -anything else) as a switch, nor complain that the switch has no effect on the program. For example, consider what greet -h harry will print out. Even worse, the code will become horribly obfuscated should the script have to deal with two, three or more command line switches.

Obviously, a better approach is required. Above all, it would be nice not to have to write it oneself, but rather use something that exists already. That must mean that packages exist to do what we need. What we are looking for is something that will look for switch-like instances on the command line, set some corresponding Perl variables and above all remove them from @ARGV so that we don't have to bother with them.

What can perl offer?

Note the distinction between Perl the language and perl the interpreter. As it turns out, perl, the Perl interpreter, can do some rudimentary command line processing all by itself. Sometimes this is sufficient. All you have to do is feed the interpreter the -s switch:
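A sketch of the greeting script using -s. The interpreter path is an assumption, and the package variable $g is declared with use vars so that the script compiles under strict.

```perl
#!/usr/bin/perl -s
use strict;
use warnings;

# perl itself sets the package variable $g when the script is run with -g.
# "our $g;" does the same job on perl 5.6 and later.
use vars qw($g);

my $thing    = shift || 'world';
my $greeting = $g ? 'Goodbye' : 'Hello';
print "$greeting, $thing\n";
```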
<aside>Do not confuse perl's switches with your script's switches. Remember that with a shebang line of #! /usr/local/bin/perl -sw and the switches -xy, the system actually runs /usr/local/bin/perl -sw script -xy. Perl sees -sw script -xy. It processes the -sw, sees that 'script' looks like a file name, opens it and starts interpreting. Your script only sees -xy (although to a certain extent it can detect what switches were passed to perl, such as by reading the value of $^W).</aside>

Now we have a much smaller script that should be easier to understand. There is, however, a small problem due to interactions with the use strict pragma. The -s functionality harks back to before the age of lexical variables. It refers to package variables that have to be explicitly declared in a use vars pragma when strict is in use. This is not really a problem, except that if the script is run with the -h switch and warnings are switched on, the program will complete but it will spit out a Name "main::h" used only once: possible typo warning message.

Before turning away from -s as a viable solution, consider the other feature that Perl provides. If the above script is run with -g, the package variable $g is set to 1. Alternatively, the script could be run with -g=foo, in which case instead of being set to 1, $g would contain 'foo'. Sometimes this limited functionality is enough to get the job done, and the fact that you don't have to drag around an external package file can be a win in certain circumstances.

<update date="2001/11/15"> It appears that -s has some rather nasty side effects, which means that scripts that use it should only be used in safely controlled environments (if such a thing exists). For more information, read the thread "perl -s is evil?".</update>

getopt: the heavy artillery

More Unix culture: the traditional way to parse command line arguments in C was through a library call named getopt or getopts, short for get options.
This has been carried over to Perl in the form of Getopt::Std and Getopt::Long, which are bundled in the core distribution.

Getopt::Std

Getopt::Std performs command line processing and pulls out anything that resembles a -letter switch and its value, leaving the remaining values in @ARGV. It offers two interfaces, getopt and getopts. You almost always want to use the second variant. Let's see why:
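A sketch of the greeting script using the first interface, getopt, in its package-variable form. Under strict the variable $opt_g has to be declared, here with our.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;

our $opt_g;      # getopt() reports the -g switch through this package variable
getopt('g');     # 'g' names the switch we are interested in

my $thing    = shift || 'world';
my $greeting = $opt_g ? 'Goodbye' : 'Hello';
print "$greeting, $thing\n";
```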
Before going any further, the first thing to point out is that Getopt::Std has been retrofitted to get around the uncomfortable use of package variables. If you pass a reference to a hash as the second parameter to the getopt call, it will populate the hash, instead of using package variables, which allows the script to be rewritten as:
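A sketch of the hash-based version. The greeting_for sub and its use of local @ARGV are assumptions made for illustration: getopt always works on @ARGV, so localising it lets the same code be tried on several command lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;

# Build the greeting for a given list of command line arguments.
sub greeting_for {
    local @ARGV = @_;          # getopt() reads and modifies @ARGV
    my %arg;
    getopt('g', \%arg);        # fills %arg instead of setting $opt_g

    my $thing    = shift(@ARGV) || 'world';
    my $greeting = $arg{g} ? 'Goodbye' : 'Hello';
    return "$greeting, $thing\n";
}

print greeting_for(@ARGV);
```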
This script will silently ignore a non-specified switch, which is usually A Good Thing. There is, however, a serious bug lurking in this code. Try to get the script to print "Goodbye, foo". It's rather difficult to do because getopt is greedy. When it sees a specified switch, it tries hard to assign that switch a meaningful value, which means either the characters following the switch (as in -gparam) or the next parameter on the command line (as in -g param). This means that if you run the above script as script -g foo, $arg{g} will contain 'foo', but there will be nothing left on the command line, so $thing will be assigned the default value of 'world'.

In order to get around this "feature", the second interface, getopts, should be used instead. In this case, the specification string ('g' in the above) is interpreted differently. By default, all letters specify boolean parameters. To force a parameter to pick up a value (i.e. to get the behaviour we so much wanted to avoid above), a ':' (colon) is appended. Therefore, to make -g greedy, it should be specified as 'g:'. This means that all we have to do in the above script is to call getopts instead of getopt and the job is done.

If you want to look at a real-life example of code that uses Getopt::Std, you can look at a script I uploaded here named pinger, a little tool designed to scan a range of IP addresses via ping.

Getopt::Long

That is all well and good, but what happens when you reimplement tar in Perl? How do you remember what all those pesky single character switches do in the string -cznTfoo? It's much easier to understand what's going on with --create --gzip --norecurse --files-from foo instead. Enter Getopt::Long. This module lets you build up a specification that adheres to the POSIX syntax for command line options with GNU extensions, which introduce long switches with the double-dash notation. Unfortunately, this precludes the use of single-dash switches (bikeNomad points out that this is not true.
My bad for not paying closer attention to the documentation). Even worse, you cannot include both Getopt::Std and Getopt::Long in the same program, as they will fight over @ARGV and the results will be... undefined.

Since I originally wrote this tutorial, I have used Getopt::Long a bit more (figured that I had to since I wrote this). Once you understand Getopt::Simple, Getopt::Long is pretty easy to pick up, and has much sophistication to offer once you scratch below the surface.

That said, all of the processing goes on behind the scenes. You can attach a callback to deal with the processing of individual options, but this can become unwieldy. Sometimes you need more fine-grained control of the parsing of the switches, as they come in one by one. While the following module is no longer being actively developed, it is just what you need in some instances, because it deals with parsing options only, and lets you deal with the rest. It turns the parsing inside out, lets you act on options on the fly, and therefore just feels more cooperative. Try it, you might like it.

Getopt::Mixed

This module should cover all your command line processing needs. It's quite simple to set up. First of all you need to call init with a format string (akin to pack and unpack). This sets up which command line switches are defined, and what values they can take on. Here's a real life example hoisted from some code I have lying around:
This encodes the following information:
Pretty straightforward stuff. The next step is to call nextOption repeatedly until it fails. Once that is done, you have processed all the switches. Unlike Getopt::Std you set your defaults beforehand. If the switch isn't specified, the value isn't touched. Also note that just because a switch has a mandatory argument doesn't mean that the script will abort if the switch doesn't appear on the command line... it's not the switch itself that is mandatory. If this is required then you test the corresponding variable after the loop and if its value is undefined then you yank the rug out from under the script. The processing loop looks something like this:
The module is smart enough to recognise
as all being valid syntaxes for assigning foo to the -j switch. Remember the last variant. It's the easiest way of passing in a negative number on the command line. After all, how should --offset -30 be interpreted?

Another real-life example of code, this time using Getopt::Mixed, can be found at nugid, a script I wrote to manage large scale modifications of uids and gids of Unix filesystems.

Where to from here

This should be enough for 95% of your basic command line processing needs. But everyone has a different itch to scratch, and you should be aware that there is a boatload of getoptish packages hanging out on CPAN, as a search will reveal. Once you have the hang of a couple it's pretty simple to pick up another.

The most sophisticated of all, Getopt::Declare, comes, naturally enough, from the Damian. This module has an advanced method for specifying exactly what the legal values are that a switch may take, as well as providing poddish descriptions so that you don't have to write a sub usage { ... } that explains how to use the program correctly.

Switch name idioms

Over the years, a number of conventions have arisen over the best letters to assign to common operations that crop up again and again in program design. This list attempts to codify existing practices (updates welcomed). Use these conventions and people will find your programs easy to learn.
And now you know all you need to know about command line processing. Have fun!

update: Tip o' the hat to petral for pointing out the node on Getopt::Declare, -h and a better Damian link. Tip o' the hat to Albannach for reminding me about the "passing 0 on the command line" bugaboo, and to OeufMayo regarding passing negative numbers.

In reply to Parsing your script's command line by grinder