PerlMonks
The problem of "the" default shell

by afoken (Abbot)
on Dec 09, 2017 at 13:17 UTC

I've grown a little tired of searching for my "avoid the default shell" postings over and over again, so I wrote this meditation to sum them up.

What is wrong with the default shell?

In an ideal world, nothing. The default shell /bin/sh would have a consistent, well-defined behaviour across all platforms, including quoting and escaping rules. It would be quite easy and unproblematic to use.

But this is the real world. Different platforms have different default shells, and they change the default shell over time. Also, shell behaviour changed over time. Remember that the Unix family of operating systems has evolved since the 1970s, and of course, this includes the shells. Have a look at "Various system shells" to get a first impression. Don't even assume that operating systems keep using the same shell as default shell.

And yes, there is more than just the huge Unix family. MS-DOS copied concepts from CP/M and also a very little bit of Unix. OS/2 and the Windows NT family (including 2000, XP, Vista, 7, 10) copied from MS-DOS. Windows 1-3, 9x, ME still ran on top of DOS. From this tree of operating systems, we got command.com and cmd.exe.

By the way: Modern MacOS variants (since MacOS X) are part of the Unix family, and so is Android (after all, it's just a heavily customized Linux).

Some ugly details:

And when it comes to Windows (and DOS, OS/2), legacy becomes really ugly.

So, to sum it up, there is no such thing as "the" default shell. There are a lot of default shells, all with more or less different behaviour. You can't even hope that the default shell resembles a well-known family of shells, like the Bourne family. So there is much potential for nasty surprises.

Why and how does that affect Perl?

Perl has several ways to execute external commands, some more obvious, some less. In the very basic form, you pass perl a string that roughly resembles what you would type into your favorite shell:

  • system('echo hello');
  • exec('echo hello');
  • open my $pipe,'echo hello |' or die "Can't open pipe: $!"; my $hello=do { local $/; <$pipe> }; close $pipe;
  • my $hello=qx(echo hello);
  • my $hello=`echo hello`;

Looks pretty innocent, doesn't it? And it is, until you start doing real-world things, like passing arguments containing quotes, dollar signs, or backslashes to an external program. Then you need to know the quoting rules of whatever shell happens to be the default shell.

For those cases, perl is expected to pass the string to /bin/sh for execution. Except that in this innocent case, and in several other cases, perl does not invoke the default shell at all. Buried deep in the perl sources, some heuristics are at work. If perl thinks that it can start the executable on its own, because the command does not contain what the documentation calls "shell metacharacters", perl splits the command itself and avoids invoking the default shell.

Why? Because perl can easily figure out what the shell would do, and do it by itself instead. This avoids a lot of overhead and so is faster and does not use as much memory as invoking the shell would.

Unfortunately, the documentation is a little bit short on details. See "Perl guessing" in Re^2: Improve pipe open? (redirect hook): From the code of Perl_do_exec3() in doio.c (perl 5.24.1), it seems that the word "exec" inside the command string triggers a different handling, and some of the logic also depends on how perl was compiled (preprocessor symbol CSH).
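A rough illustration of that guessing, assuming a Unix-like system with echo on the PATH (which exact strings avoid the shell can vary with the perl build, as noted above):

```perl
#!/usr/bin/perl
use strict;
use warnings;

system('echo', 'hello');   # list form: the default shell is never involved
system('echo hello');      # string, no metacharacters: perl splits and execs it itself
system('echo $HOME');      # '$' is a shell metacharacter: /bin/sh is invoked to expand it
```

The first two calls print a literal "hello"; in the last one, the shell expands $HOME before echo ever sees it.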

If you don't need support from the default shell, you can help perl by passing system(), exec(), and open() a list of arguments instead of a string. This "multi-argument" or "list form" of the commands always avoids the shell, and it completely avoids any need to quote.

(Well, at least on Unix. Windows is a completely different beast. See Re^3: Perl Rename and Re^3: Having to manually escape quote character in args to "system"?. It should be safe to pretend that you are on Unix even if you are on Windows. Perl should do the right thing with the "list form".)

So our examples now look like this:

  • system('echo','hello','here','is','a','dollar:','$');
  • exec('echo','hello','here','is','a','dollar:','$');
  • open my $pipe,'-|','echo','hello','here','is','a','dollar:','$' or die "Can't open pipe: $!"; my $hello=do { local $/; <$pipe> }; close $pipe;

Did you notice that qx() and its shorter alias `` don't support a list form? That sucks, but we can work around that by using open instead. Writing a small function that wraps open is quite easy. See "Safe pipe opens" in perlipc.
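Such a wrapper might look like this (a minimal sketch; the name capture is made up here, and error handling is kept deliberately simple):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# List-form replacement for qx()/backticks: the multi-argument pipe
# open never involves the default shell.
sub capture {
    my @command = @_;
    open my $pipe, '-|', @command
        or die "Can't run '$command[0]': $!";
    my $output = do { local $/; <$pipe> };   # slurp the command's STDOUT
    close $pipe or die "Command '$command[0]' failed";
    return $output;
}

# The dollar sign reaches echo verbatim -- no shell expansion happens.
print capture('echo', 'a dollar:', '$');
```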

Edge cases

OK, let's assume I've convinced you to use the list forms of system, exec, and open. You want to start a program named "foo bar", and it needs an argument "baz". Yes, the program has a space in its name. This is unusual but legal in the Unix family, and quite common on Windows.

  • system('foo bar','baz');
  • exec('foo bar','baz');
  • open my $pipe,'-|','foo bar','baz' or die ...

or even:

my @command=('foo bar','baz'); and one of:

  • system @command;
  • exec @command;
  • open my $pipe,'-|',@command or die ...

All is well. Perl does what you expect, no default shell is ever involved.

Now, "foo bar" gets an update, and you no longer have to pass the "baz" argument. In fact, you must not pass the "baz" argument at all. Should be easy, right?

  • system 'foo bar';
  • exec 'foo bar';
  • open my $pipe,'-|','foo bar' or die ...

or:

my @command=('foo bar'); and one of:

  • system @command;
  • exec @command;
  • open my $pipe,'-|',@command or die ...

Wrong! system, exec, and even open in the three-argument form now see a single scalar value as the command, and start once again guessing what you want. And they will wrongly guess that you want to start "foo" with an argument of "bar".

The solution for system and exec is hidden in the documentation of exec: pass the executable name using indirect object syntax to system or exec, and perl will treat the single-argument list as a list, not as a single command string.

  • system { 'foo bar' } 'foo bar';
  • exec { 'foo bar' } 'foo bar';

or:

my @command=('foo bar'); and one of:

  • system { $command[0] } @command;
  • exec { $command[0] } @command;

If the command list is not guaranteed to contain at least two elements (e.g. because arguments come from the user or the network), you should always use the indirect object notation to avoid this trap.
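For example (assuming echo is on the PATH; with a single-element list, only the block keeps perl from guessing):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @command = ('echo');   # may legitimately shrink to a single element

# The block names the program to execute; the list becomes its argument
# vector. No shell, no guessing -- even with just one element.
system { $command[0] } @command;
```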

Did you notice that we lost another way of invoking external commands here? There is (currently) no way in perl to use pipe open with a single-element command list without triggering the default shell heuristics. That's why I wrote Improve pipe open?. Yes, you can work around this by using the code shown in "Safe pipe opens" in perlipc and calling exec with indirect object notation in the child process. But that takes 10 to 20 lines of code just because perl tries to be smart instead of being secure.
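A condensed sketch of that workaround, assuming a Unix-like system with fork():

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @command = ('echo');   # single-element list; a plain pipe open would guess

# "Safe pipe open": open '-|' with no command forks; the child's STDOUT
# is connected to $pipe in the parent.
my $pid = open(my $pipe, '-|');
defined $pid or die "Can't fork: $!";
if ($pid == 0) {
    # Child: replace ourselves with the command. Indirect object syntax
    # keeps perl from treating the single element as a command string.
    exec { $command[0] } @command
        or die "Can't exec '$command[0]': $!";
}
my $output = do { local $/; <$pipe> };   # parent reads the child's output
close $pipe;
print $output;
```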

Avoiding external programs

Why do you want to run external programs? Perl can easily replace most of the basic Unix utilities, by using internal functions or existing modules. And as an additional extra, you don't depend on the external programs. This makes your code more portable. For example, Windows does not have ls, grep, awk, sed, test, cat, head, or tail out of the box, and find is not find, but a poor excuse for grep. If you use perl functions and modules, that does not matter at all. Likewise, not all members of the Unix family have the GNU variant of those utilities. Again, if you use perl functions and modules, it does not matter.

Tool             Perl replacement
echo             print, say
rm               unlink
rm -r            File::Path
mkdir            mkdir
mkdir -p         File::Path
rmdir            rmdir
grep             grep (note: you need to open and read files manually)
awk              a2p
sed              s2p
ls, find         File::Find, glob, stat, lstat, opendir, readdir, closedir
test, [, [[      stat, lstat, -X, File::stat
cat, head, tail  open, readline, print, say, close, seek, tell
ln               link, symlink
chmod            chmod
chown            chown
touch            utime
curl, wget, ftp  LWP::UserAgent and friends
ftp              Net::FTP
ssh              Net::SSH2, Net::OpenSSH

Note: The table above is far from being complete.
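As one example from the table, replacing an external grep with perl's own pattern matching might look like this (a sketch; for a self-contained demo it searches the script's own DATA section instead of a real file):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pure-perl "grep shell": print every input line matching the pattern.
my $pattern = qr/shell/;
while (my $line = <DATA>) {
    print $line if $line =~ $pattern;
}

__DATA__
avoid the default shell
use list forms instead
the shell is not involved
```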

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

Replies are listed 'Best First'.
Re: The problem of "the" default shell
by haukex (Abbot) on Dec 09, 2017 at 13:56 UTC

    Excellent post, thank you very much for this! Bookmarked :-)

    Did you notice that qx() and its shorter alias `` don't support a list form?

    IPC::System::Simple provides the function capturex as a replacement for qx that always avoids the shell*.

    IPC::Run3 is another good module that will avoid the shell* if you give it an arrayref. The more advanced IPC::Run, although its documentation does not mention this, will also use the exec {...} ... form, although I haven't yet fully traced back in how many cases this is used - it's not always.

    I wrote a post in a similar vein to yours showing example code with these modules here.

    * Does not apply on Windows, but I have heard good things about Win32::ShellQuote, which recent versions of IPC::Run3 use internally. On *NIX systems and others with execvp(3), the above should always completely avoid the shell.

    Update 2: Cleaned up formatting and wording.

    (Side note: I don't normally both front-page and reply to a node, but I feel this is an important topic.)

Re: The problem of "the" default shell
by Corion (Pope) on Dec 09, 2017 at 15:28 UTC

    Also, there is ExtUtils::Command, which has lots of unixish commands, as needed by ExtUtils::MakeMaker.

    Its interface is a tiny bit ugly, as its routines expect all parameters to come in via @ARGV, but there also is Shell::Command which wraps these subroutines in a nicer way.

Re: The problem of "the" default shell
by perlancar (Monk) on Dec 10, 2017 at 10:01 UTC

    Aside from the aforementioned IPC::System::Simple, I also wrote IPC::System::Options which:

    • unlike IPC::System::Simple, provides an interface that is backward-compatible with the built-in system() and readpipe() (backtick), meaning that if you use them like the built-in, they will behave the same;
    • provides option (either on a per-call basis, or on a per-import basis) to always try to avoid the shell, or to always try to use the shell;
    • provides a bunch of other options, e.g. to die on failure, to log using Log::ger, to capture or tee output, to set environment variables, to chdir() first, and to run in dry-run mode.
Re: The problem of "the" default shell
by Laurent_R (Canon) on Dec 11, 2017 at 07:33 UTC
    Thank you very much, afoken, I learned quite a bit from your post.

    I usually tend to avoid external commands as much as I can, but when I have had to, I sort of knew that it was deemed to be better to use a list of arguments rather than a single string, but I did not really know why, and I was never clear about the best way to call an external command. Now I understand better the impact of the various solutions. Thanks.

Re: The problem of "the" default shell
by Anonymous Monk on Dec 14, 2017 at 18:35 UTC
    As a general rule, I agree with all of your advice, but there are definitely exceptions to the rule.
    Why do you want to run external programs?

    Well, I want to use GNU Grep because it's orders of magnitude faster than the Perl version.

    I think you sort of just ignore the whole sysadmin use of Perl because it's fine to rely on bashisms when you are root and know that your shell is bash. If you intend to release your code to the world, then yes, portability is a huge issue but not so much when it's a one off script to glue the server together.

      I think you sort of just ignore the whole sysadmin use of Perl [...] portability is a huge issue but not so much when it's a one off script to glue the server together.

      Not at all. Imagine you wrote a (shell or perl) script heavily relying on the default shell (/bin/sh) being bash a few years ago, for a Debian 5 (Lenny) or Ubuntu 5.x system, or maybe for Ubuntu 6.06 LTS. Running that script on the next newer version of the same(!) distribution suddenly breaks things. Ubuntu 6.10 and Debian 6.0 (Squeeze) have switched /bin/sh away from bash and run Debian's variant of the Almquist shell (dash) instead.

      No rules were broken, as explained in https://wiki.debian.org/Shell - dash is still SUSv3 / POSIX compliant, like bash. The problem was that people wrongly assumed /bin/sh == bash, and often still do. If you want bash, explicitly ask for it. Ubuntu explains that in https://wiki.ubuntu.com/DashAsBinSh. There is a tool called checkbashisms that checks for bash features used in scripts intended to be run by /bin/sh.

      [...] because it's fine to rely on bashisms when you are root and know that your shell is bash. [...]

      My shell, your shell, and root's shell from /etc/passwd don't matter. system, exec, open, qx, ``, and even system(3) invoke the default shell /bin/sh, not the user's shell.

      You may play Russian roulette with your system by assuming /bin/sh == bash, but as shown lately by Debian and Ubuntu, that assumption may break sooner or later. It wasn't the first time that /bin/sh was changed, and it won't be the last. Have a look at "Various system shells" and compare how the default shell of various Unix systems (including the *BSDs and MacOS X) changed over time.

      Anyway, you can go the painful way of embracing the default shell from perl (by using the single-string variants of system, exec, open, and by using qx/``) instead of avoiding it. You can even get your shell code run by a real bash instead of whatever default shell may be installed. It is quite trivial: stuff everything into a string quoted properly for the default shell, and invoke bash -c $quotedstring.

      There's only one catch: You don't always know the exact quoting rules for the default shell, so bash may get arguments you did not want to pass to bash. So, what can you do? Right, use the multi-argument form of system, exec, open to invoke bash:

      system('bash','-c','whatever commands shall be executed by bash');
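To go a step further (an illustration, assuming bash is installed), untrusted data can stay out of the command string entirely by passing it as a positional parameter -- bash then receives it as data and never parses it as code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Deliberately hostile input: command substitution, quotes, a backslash.
my $untrusted = q{$(date); "double" and 'single' and \backslash};

# bash -c 'script' name args...: $0 is set to name, $1 to the first arg.
# The script treats $1 as plain data, so nothing is expanded or executed.
system('bash', '-c', 'printf "%s\n" "$1"', 'bash', $untrusted);
```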

      Or, do it the DOS way:

      1. Write a bash script from your perl script to a temporary file, including all arguments that you wanted to pass to the bash, and including output redirection to some more temporary files. Note that you can (and must) use bash quoting rules here. Yes, you successfully avoided guessing the default shell's quoting rules.
      2. Write a temporary file containing input to the bash script, if needed.
      3. Invoke that bash script without arguments using the single-string version of system or open. You can't use exec here, because you have to clean up the temporary files.
      4. Read the output files
      5. Remove the temporary files.

      Note that safely creating and removing temporary files, especially as root, is a non-trivial problem of its own that needs its own meditation.

      All of this mess just to avoid using the multi-argument forms of system, exec, and open. I won't stop you from doing that. After all, TIMTOWTDI. But I prefer the way of less risk and less work.

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

Node Type: perlmeditation [id://1205217]
Front-paged by haukex