|Perl: the Markov chain saw|
Unix shell versus Perlby eyepopslikeamosquito (Chancellor)
|on Feb 18, 2008 at 06:39 UTC||Need Help??|
At work, we currently run a motley mix of Perl, Unix shell and Windows .BAT scripts. I'm putting together a case that Perl should almost always be preferred to Unix shell (and DOS batch). Here's why.
Writing portable shell scripts is hard. Very hard. After all, you must run a variety of external commands to get the job done. Is each external command available on each of our platforms? And does it support the same command line options? Is the behaviour the same on each platform? (echo, awk/nawk/mawk/gawk, grep/egrep, find/xargs, and ps are some example commands that spring to mind whose behaviour varies between different Unix flavours). Many of these portability concerns disappear when you write a Perl script because most of the work is done, not by external commands, but by Perl internal functions and modules. It's easier to port a shell than a shell script. :-) Note that we port the same Perl version to all our Unix and Windows boxes, and so don't experience any nasty Perl version portability issues.
Shell scripts tend to be fragile. Running commands and scraping their output is an inherently fragile technique, breaking when the format of the command output changes. Moreover, unlike Perl, shell scripts are not compiled when the script loads, so syntax errors may well be lurking down unexercised code paths: Perl catches syntax errors at compile time; shell catches them at run time. Notice that syntax errors can appear later when the script is presented with different data: test $fred -eq 42, for example, fails with a syntax error at run time if $fred contains whitespace or is empty (BTW, to avoid that, this code should be written as test "$fred" -eq 42). Finally, shell scripts can easily break if the global environment changes, especially the PATH environment variable. Even if the value of PATH itself doesn't change, a change to any file on the PATH has the potential to break a shell script (I remember one such case where GNU tar was mistakenly installed in a directory ahead of the system tar on the PATH on one of our AIX boxes).
Shell scripts, being interpreted and often having to create new processes to run external commands to do work, tend to be slow.
Shell scripts tend to be insecure. Running an external command in a global environment is inherently less secure than calling an internal function. Which is why most Unices do not allow a shell script to be setuid. And shell doesn't have an equivalent of Perl's taint mode, to handle untrusted, potentially malicious, script data.
Error handling and reporting tends to be more robust in Perl. For example, $! set by a failing Perl built-in function provides more reliable and specific error detail than $? set by a failing external command (especially when the external command is undisciplined in setting $? and in writing useful, detailed and regular error messages to stderr). Moreover, the ease of using the common Perl "or die" idiom and the newer autodie pragma tends to make Perl scripts more robust in practice; for example, chdir($mydir) or die "error: chdir '$mydir': $!" is commonly seen in Perl scripts, yet I've rarely seen shell scripts similarly check cd somedir and fail fast if the cd fails.
As a programming language, shell is primitive. Shell does not have namespaces, modules, objects, inheritance, exception handling, complex data structures and other language facilities and tools (e.g. Perl::Critic, Perl::Tidy, Devel::Cover, Devel::NYTProf, Pod::Coverage, ...) needed to support programming in the large.
Finally, shell has fewer reusable libraries available. Shell has nothing comparable to Perl's CPAN.
Even if the script is small, don't write it in shell unless you're really sure that it will remain small. Small scripts have a way of growing into larger ones, and you don't want the unproductive chore of converting thousands of lines of working shell script to Perl, risking breaking a functioning system in the process. Avoid that future pain by writing it in Perl to begin with.
Notice that the above is arguing against Unix shell scripts, not specifically for Perl. Indeed, the same essential arguments apply equally to Python and Ruby as they do to Perl. However, I view Perl, Python and Ruby as essentially equivalent and, given our existing significant investment in Perl, don't see a strong business case for switching languages. I see an even weaker business case for using more than one of Perl/Python/Ruby because that dilutes the company's already overstretched skill base.
I'm further trying to come up with a checklist of when it is ok for work folks to write their scripts in Unix shell:
I might add that I personally enjoy Unix shell scripting and have written a lot of Unix shell scripts over the years. It's just that I feel the argument for Perl over shell is overwhelming. Feedback on the above is welcome.
Updated 23-feb-2008: Improved wording. Mentioned Perl's taint mode. Clarified Perl (compiled) v shell (interpreted). 26-feb-2008: Added References section. 30-sep-2008: Added libraries (CPAN) paragraph. 4-oct-2010: Added error handling paragraph.