RFC: A Primer on Writing Portable Perl Programs

NOTE: I felt the need to start a meditation on this topic to collect ideas from the community, because:

A) I see a strong need for specific guidelines on how to write Perl for portability, as an alternative to hoping that "Perl will just do the right thing" (which ain't always so), and

B) Although I've been writing Perl programs for Unix (Linux, etc.) systems for nearly a decade, my non-Unix experience with Perl is very limited--so I'm eager to tap into the Wisdom of the Monastery for the benefit of improving this Meditation to the point where it becomes a Tutorial.

Patches welcome! 8-}

A Primer on Writing Portable Perl Programs

Tim Maher, Consultix

tim@TeachMePerl.com

Perl is rightfully famous for being Operating System (OS) “portable”. This means that the Perl language itself can run on a wide variety of OSs, which is certainly a nice feature. But more importantly, it also means that Perl programs written with OS-independence in mind can generally be transported from one OS to another, and successfully run there, without any changes—and that's a fantastic feature.

In this tutorial, you'll learn the basic principles for converting Perl scripts and one-liner commands written for Unix[1] systems into forms usable on other systems. Although the general portability issues we'll discuss apply to all OSs, we'll concentrate on Windows as our example of a non-Unix OS, because of its widespread availability.

On Windows, individual Perl scripts can be run by clicking their associated icons on the graphical display, but Perl one-liners need to be submitted to the Windows “shell” (cmd.exe). For simplicity, we'll restrict our focus to this shell environment for running both scripts and commands. It's accessed by clicking Start, then Run, and then typing cmd, which opens a "DOS"-style terminal window with a typical prompt of C:\>.

Specific instructions for running Perl programs on systems other than Unix and Windows (VMS, Mac, etc.) may be found in the reference documents cited in section 3, “Additional resources”.

Next we'll discuss the major techniques used to make programs OS portable.

1. Programming for OS independence

In order to reap the benefits of Perl's capacity for OS portability, you must avoid writing code that depends on OS-specific resources or conventions. We'll consider the two most important cases here, having to do with OS-specific commands and pathnames. Additional problem areas are discussed in the resources listed in section 3.

1.1 Avoiding OS-specific commands

Although Perl is a very clever language, it's understandably incapable of executing a Unix-specific statement like this one on non-Unix OSs:

! system "who | grep '$ENV{LOGNAME}' > /dev/null" or
    warn "HELP! I'm not here!\n";

What's non-portable about this statement? It requires its host OS[2] to have Unix-like who and grep commands, a LOGNAME environment variable, the > redirection symbol, and the /dev/null special device, not to mention command exit codes whose True/False values are opposite to Perl's (which accounts for the !).

But the above example is a rather extreme case; in practice, most of the statements in an average Perl program would likely run on other OSs without change, and many of the remainder could easily be rewritten for OS portability.

Consider this call to system, which is certainly OS-specific, yet still amenable to rehabilitation:[3]

# Sample output from "date": Wed Mar 22 16:59:38 PST 2006
# Print date stripped of trailing "<SP>TIMEZONE<SP>YEAR"

! system "date | sed 's/ [A-Z][A-Z]* [0-9][0-9]*\$//'" or
    warn "$0: command failed!";

There are two major strategies for avoiding OS-specific statements like this one in order to make your code OS portable. The first is to use OS-independent resources, such as features built into Perl or available from modules, in preference to OS-specific ones.

Considering the above example from this perspective, Perl's built-in localtime function could be used to generate a string that's very similar to date's output. The only difference is that it lacks the timezone information, which isn't an issue because it's being discarded anyway. In addition, all of sed's functionality (and more) can be obtained from the use of Perl's built-in substitution operator. These observations allow us to replace the original Unix pipeline with this native Perl code:

my $date = localtime;    # scalar context, e.g. "Wed Mar 22 16:59:38 2006"
$date =~ s/ \d{4}$//;    # delete "<SP>year"
print $date;             # newline provided by -l invocation option

This produces output identical to that of the date | sed pipeline, but in an OS-portable manner. As an added bonus, the elimination of the Shell-level command pipeline obviates the need for handling its exit code in an OS-independent manner.
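When a command can't be eliminated, its success can still be checked without depending on any shell's conventions. The ! in the earlier examples is needed because system returns the command's wait status, which is 0 on success. The following sketch checks that status explicitly; it assumes only core Perl, and it launches a throwaway child via $^X (the path of the running perl) that simply exits with code 3, so the example itself runs wherever Perl does.

```perl
use strict;
use warnings;

# Run a child perl that exits with code 3, so the failure branch is exercised.
my $status = system $^X, '-e', 'exit 3';

if ($status == 0) {
    print "command succeeded\n";
}
elsif ($status == -1) {
    warn "$0: could not run command: $!\n";    # the program couldn't be started
}
else {
    my $code = $? >> 8;    # the high byte of $? holds the exit code
    print "command failed with exit code $code\n";
}
```

Run as shown, it prints "command failed with exit code 3".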

The second major strategy for writing OS-independent code is one you should strive to avoid using. This technique involves writing separate chunks of OS-specific code in different branches to handle the OS differences, using Perl's special host-OS-reporting variable, $^O, to select the appropriate branch for execution.

This approach leads to code that takes this form:[4]

if ($^O eq 'MSWin32') {
    # Do Windows-OS specific stuff
}
elsif ($^O eq 'darwin') {
    # Do Mac OS X specific stuff
}
elsif ($^O eq 'linux' or $^O eq 'solaris') {
    # Do Linux/Solaris specific stuff
}
else {
    warn "$0: WARNING: Program might not work on '$^O'\n";
}

For example, the following branches use OS-specific commands to display a long listing of the file named in the first argument:

if ($^O eq 'MSWin32') {
    system "dir $ARGV[0]";
}
elsif ($^O =~ /ix$/) { # matches our Unix-like OSs: AIX & IRIX
    system "ls -l $ARGV[0]";
}
else {
    warn "$0: WARNING: Program might not work on '$^O'\n";
}

Techniques based on use of the built-in stat function could provide an OS-portable alternative to the use of these OS-specific commands. But even more conveniently, you could probably find an OS-portable CPAN module that has already solved this problem, and use its resources instead of your own (see http://search.cpan.org).
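A minimal sketch of the stat-based approach follows. The helper name long_listing and its output layout are illustrative choices, not a faithful imitation of ls -l. Note also that on Windows the permission bits returned by stat are only emulated by Perl, so treat them as approximate there.

```perl
use strict;
use warnings;

# Return one ls -l-style line for $path, built only from Perl built-ins,
# or an empty list (undef in scalar context) if the file can't be stat'ed.
sub long_listing {
    my ($path) = @_;
    my @st = stat $path or return;
    my ($mode, $size, $mtime) = @st[2, 7, 9];   # mode bits, size in bytes, modification time
    my $type = -d _ ? 'd' : '-';                # "_" reuses the stat buffer just filled
    return sprintf '%s %04o %10d %s %s',
        $type, $mode & 07777, $size, scalar localtime($mtime), $path;
}

for my $file (@ARGV) {
    my $line = long_listing($file);
    if (defined $line) { print "$line\n" }
    else               { warn "$0: can't stat '$file': $!\n" }
}
```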

Another major source of portability problems is the invalid assumptions programmers make about other OSs. We'll see next how to avoid unnecessary assumptions about file-system differences.

1.2 Avoiding OS-specific pathnames

There are contexts where Perl itself hands pathnames to the OS, such as when opening the files named in @ARGV in programs using the -n or -p options, and in calls to functions like open and stat. In such contexts, slash-separated pathnames generally work even on Windows, because the underlying Windows system calls accept either / or \ as the separator (see perldoc perlport). This means you don't have to code separate branches of execution for different OSs (as shown earlier) just to handle that chore.

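But when a pathname is passed to another program, such as within a command run by the system function, Perl can't know that a pathname is present, so it's your responsibility to arrange for the slash-to-backslash conversion, along with any other OS-required changes. (Many Windows command-line utilities treat / as the start of an option, so a path like data/file.txt can confuse them.)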

For programmer convenience, Perl provides a standard module called File::Spec::Functions to help you perform OS-specific pathname conversions.
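Here is a small sketch of two File::Spec::Functions routines: catdir and catfile assemble pathnames from components using the host OS's separator, and canonpath tidies an existing path into native form (on Windows, that includes turning / into \). The directory and file names are hypothetical.

```perl
use strict;
use warnings;
use File::Spec::Functions qw(catdir catfile canonpath);

# Build pathnames from components instead of hard-coding "/" or "\".
my $dir  = catdir('data', 'reports');
my $file = catfile($dir, 'q1.txt');
print "Built path:  $file\n";

# Clean up a sloppily written path; on Unix, doubled slashes and "/./"
# are removed, while on Windows the separators also become backslashes.
my $native = canonpath('data//reports/./q1.txt');
print "Native path: $native\n";
```

On Unix both lines show data/reports/q1.txt; on Windows the same code yields data\reports\q1.txt, which is the form a command run through system will expect there.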

You also must avoid making unfounded assumptions about other OSs, such as whether a particular directory (e.g., /tmp) will necessarily exist there. File::Spec::Functions helps with this task too, by providing a tmpdir function that returns the name of the counterpart for /tmp on the host OS.
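A sketch of using tmpdir to place a scratch file without assuming /tmp exists. The file name is hypothetical, and for anything beyond a quick script, the standard File::Temp module is the more robust way to create uniquely named temporary files.

```perl
use strict;
use warnings;
use File::Spec::Functions qw(tmpdir catfile);

# tmpdir() returns the first writable directory from an OS-specific list of
# candidates (e.g. $ENV{TMPDIR} or /tmp on Unix; TEMP or TMP on Windows).
my $tmp = tmpdir();

# Hypothetical scratch-file name; $$ (the process ID) keeps it fairly unique.
my $file = catfile($tmp, "myscript-$$.tmp");

open my $fh, '>', $file or die "$0: can't create '$file': $!\n";
print $fh "scratch data\n";
close $fh or die "$0: can't close '$file': $!\n";
unlink $file or warn "$0: can't remove '$file': $!\n";

print "Used scratch directory: $tmp\n";
```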

The additional resources cited in section 3 discuss many other important portability issues, along with specific recommendations for dealing with them.

Having covered some important theoretical concerns in "programming for OS portability", we'll now discuss some specific recommendations for making Unix-bred Perl programs portable to Windows systems.

2. Running Perl programs on Windows

We'll begin by discussing the basic techniques for running Perl one-liner commands and Perl scripts on Windows systems.

2.1 Running Perl one-liners on Windows

Many programmers find that the use of Perl one-liners increases their productivity greatly, and Windows users are no exception. However, the vast majority of the one-liners shown in most Perl books will not work if typed, for example, to a Windows cmd.exe shell. That's because the single quotes they use to convey the program code to the perl command are not recognized as quoting characters by that shell.

There are two ways to address this problem:

· convert the quoting techniques used in the one-liner for compatibility with the target OS's shell[5]

· convert the one-liner to a script that runs on the target OS

The first solution requires modifying the command's quoting in an OS-dependent manner, while the second avoids the code-quoting issue altogether by enclosing the program code in a file, which will only be read by Perl.

Let's look at a specific example of reworking a command's quoting for compatibility with Windows 2000 or XP (which share the same shell). Consider the following one-liner:

perl -wl -e 'print "Crikey, what a little beauty!";'

This command won't run properly on the Windows systems mentioned; here's the error message from Perl:

Can't find string terminator "'" anywhere before EOF at -e line 1.

That message indicates that Perl did not receive the complete program as the argument for -e, which is a by-product of Windows not treating single quotes as quoting characters.

In trivial cases like this one, where either type of quotes will work around the Perl string, simply swapping the internal double quotes with the external single quotes can fix the problem. That's because double quotes, unlike single quotes, are recognized as quoting characters by Windows, permitting this reworked command to work as intended:

perl -wl -e "print 'Crikey, what a little beauty!';"

In cases where double quotes must be used within the Perl program itself, backslashing allows them to coexist with the shell-level outer double quotes:

perl -wl -e "print \"The arguments are: @ARGV\";"
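Another way to avoid inner double quotes altogether is Perl's generalized quoting: qq{...} interpolates exactly like "...", so it can stand in for the backslashed quotes. The program text is shown here as a small script; on cmd.exe you would wrap the same print statement in the outer double quotes.

```perl
use strict;
use warnings;

# qq{...} interpolates like "...", so @ARGV is expanded (elements joined
# with spaces), yet the text contains no double quotes for cmd.exe to see.
my $msg = qq{The arguments are: @ARGV};
print $msg, qq{\n};
```

For example, this command typed to cmd.exe:

C:\> perl -wl -e "print qq{The arguments are: @ARGV};" one two

prints "The arguments are: one two".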

Next we'll discuss techniques for running scripts on Windows.

2.2 Running Perl scripts on Windows

Assuming a script has been written with OS portability in mind as described above, it needn't take much work to get it to run on a non-Unix system.

For instance, on a Windows machine that has a working Perl installation,[6] invoking a script as:

C:\> perl myscript

should be sufficient to run it (assuming the shell is properly configured to know how to find perl).

But the more typical approach is to add a Perl-specific file extension to each Perl script, to allow Windows to invoke perl on it automatically when you type its name to the shell prompt:

C:\> myscript.pl

The association between Perl and the .pl extension (or the .plx extension) is generally created when Perl is installed, but if you need to set it up yourself, the instructions are provided in perldoc perlwin32.

When you run scripts using either technique shown above, any invocation options provided on the script's Unix-oriented shebang line will be recognized and put into effect by the perl command. For this reason, you should leave your shebang lines in place when you transfer scripts from Unix to other OSs, even though they won't be used to locate the perl command itself (as they are on Unix).

Although getting your scripts to execute on Windows should not be a problem, obtaining the benefits of certain services provided by the Unix shells, which are far more sophisticated than their Windows counterparts, may not be so easy.

For example, let's say you wanted to supply filename arguments to a script using “wildcard” characters, as in this Unix command:

$ myscript *.txt

Special techniques would have to be used to arrange for *.txt to be processed properly, as detailed in perldoc perlwin32 (search for “Wild.pm”).
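One alternative, shown here as a sketch rather than the Wild.pm approach that perlwin32 describes, is to expand the patterns inside the script itself with the bsd_glob function from the standard File::Glob module. Two behaviors worth knowing: under its default flags, an argument containing no wildcard characters comes back unchanged, and a wildcard pattern that matches nothing yields an empty list (bash, by default, would pass such a pattern through literally). The helper name expand_args is hypothetical.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);

# cmd.exe passes "*.txt" to the script unexpanded, so expand each argument here.
sub expand_args {
    my @expanded;
    for my $arg (@_) {
        # bsd_glob() does not split its argument on whitespace, so a single
        # pattern containing spaces is treated as one pattern.
        push @expanded, bsd_glob($arg);
    }
    return @expanded;
}

@ARGV = expand_args(@ARGV);
print "Processing: $_\n" for @ARGV;
```

On Unix the shell will normally have expanded the wildcards already, so this step is usually harmless there, though a filename that itself contains wildcard characters could be re-expanded.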

3. Additional resources

For additional information on writing Perl programs with OS portability in mind, and for running Perl commands on non-Unix OSs, you may wish to consult these resources:[7]

perldoc perlport # General portability issues
perldoc perlwin32 # Windows-specific portability issues
perldoc perlos2 # OS/2-specific portability issues
perldoc perlmac # Mac-specific portability issues
perldoc perlvms # VMS-specific portability issues
perldoc perlmacosx # Mac OS X-specific portability issues
perldoc File::Spec::Functions # Useful portability functions
perldoc File::Spec # Class-method interface behind File::Spec::Functions
perldoc File::Spec::Unix # Unix-specific pathname info
perldoc File::Spec::Win32 # Windows-specific pathname info
perldoc File::Spec::Mac # Mac-specific pathname information
perldoc File::Spec::OS2 # OS/2-specific pathname information
perldoc File::Spec::VMS # VMS-specific pathname information
perldoc perlrun # Invocation options and shebang lines

[1] For our purposes, the term “Unix” refers to actual UNIX systems as well as functionally similar OSs such as Linux and Mac OS X (whose Darwin core is derived in part from FreeBSD).

[2] The "host" OS is the one that the program is running on.

[3] Because a $ followed by / within double quotes would be taken as a request to interpolate the variable $/ (the input record separator), the $ must be backslashed for Perl to treat it as a literal character.

[4] See man perlport for the name strings that Perl uses for other OSs, such as “MSWin32” for Microsoft Windows systems (64-bit Windows builds also report “MSWin32”).

[5] A "target" OS is one on which the program is intended to run.

[6] At the time of this writing, the ActiveState corporation (see http://activestate.com) was still the undisputed vendor of choice for high-quality, freely available versions of Perl for Windows.

[7] When you're on a Unix system, you could also use the man command to access these documents; however, on another OS, which you're likely to be visiting when you refer to this page, perldoc is the appropriate command to use.

* Tim Maher, CEO, Consultix | tim@consultix-inc.com *
