PerlMonks  

Re: curl without backticks and system() (updated x2)

by haukex (Abbot)
on Jan 29, 2017 at 22:59 UTC (#1180578)


in reply to curl without backticks and system()

Hi urbansumo,

Is using backticks and/or system() really that bad, ie is it ever appropriate to use backticks and what security issues does it pose if you do, or is it just considered sloppy (ie not purist) programming?

Questions about running external commands are asked often, and I've answered a few of them, but I'll take this opportunity to write on this topic more comprehensively.

Update 2017-01-31: I've added the section on "Streaming Output" at the bottom, and the TL;DR link. Update 2017-02-12: Added discussion of piped opens in appropriate places. The final IPC::Run example previously did not check for successful execution, it does now. A few other minor fixups in the text.

TL;DR: There are working code examples beginning in the section "The (Better) Alternatives". But you really should take the time to read about the potential security issues!

The Problems with system/exec, Backticks (`...`/qx), and Piped open

system/exec with only one string argument, backticks (`...` / qx/.../), and piped opens with one string argument (e.g. open my $fh, '-|', $cmd; and all its variants) all suffer from a security problem if you interpolate user input or anything derived from user input into them, because the command string is executed with sh -c STRING or its equivalent. Consider this somewhat contrived/silly example, but stuff like this does happen in the wild:

    # WARNING: THIS CODE HAS A SECURITY HOLE
    print "Which file to show? ";
    chomp( my $f = <STDIN> );
    die "bad filename" if $f=~/\.\./; # don't allow updirs
    system("cat /sharedfiles/$f")==0
        or die "command failed, \$?=$?";

Now one might think that disallowing .. prevents someone from entering "../etc/passwd". But what if the user enters "; cat /etc/passwd", "; rm somefile", or worse? The shell will happily execute those commands - that's the security hole, and it's possible because the string is run by the shell, so the full syntax and power of the shell is available. Aside from security holes like this, it can also in some cases be very tricky to get quoting and escaping of whitespace right.
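To make the mechanism concrete, here is a minimal and harmless sketch of the same injection, assuming a *NIX system with a POSIX shell. The "hostile" input only runs echo, and the marker string INJECTED is made up for illustration.

```perl
use strict;
use warnings;

# Harmless stand-in for hostile input: "echo" replaces "cat /etc/passwd".
my $f = '; echo INJECTED';

# The interpolated string is "echo /sharedfiles/; echo INJECTED", which
# the shell parses as TWO commands because of the semicolon.
my $out = `echo /sharedfiles/$f`;
print $out;
```

The captured output has two lines, "/sharedfiles/" and "INJECTED": the second echo was run as a command of its own instead of being treated as part of the file name.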

Another disadvantage of both system and backticks is that system captures neither the command's STDOUT nor STDERR, and backticks don't capture the command's STDERR. So if you run a command like e.g. find in backticks, it may issue warnings to STDERR that your program never finds out about, and that will instead end up somewhere else - the user's terminal, possibly confusing them, a log file somewhere where you'll never see it if you don't check, or perhaps in the emails that cron sends, etc. Also, people tend to forget to check $? for errors when using system and backticks (Note: see the system docs for an example of how to interpret $?). So those are things I'd consider "sloppy".
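As a rough sketch of what checking $? can look like, the following follows the pattern shown in the system documentation. The helper name describe_status is hypothetical, and $^X (the running perl) is used only as a portable stand-in for an external command.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a wait status ($?) into a readable message.
sub describe_status {
    my ($status, $errno) = @_;
    return "failed to execute: $errno" if $status == -1;
    # The low 7 bits hold the signal number, if the child was killed by one.
    return sprintf("died with signal %d", $status & 127) if $status & 127;
    # Otherwise the exit code is in the high byte.
    return sprintf("exited with value %d", $status >> 8);
}

# $^X is the running perl binary, used here as a stand-in command.
system($^X, '-e', 'exit 3');
my $msg = describe_status($?, $!);
print "child $msg\n";
```

This prints "child exited with value 3". The same decoding applies to $? after backticks or after closing a piped open.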

Another way that running external commands could be considered "sloppy" is that many of the things that one sees people do with external commands could just as well be done in pure Perl, or with an appropriate module, and it's of course less efficient to call an external command, plus the programmer is left with parsing the command's output manually. Note this also applies in your case: even if it were necessary to call curl, the grep could just as easily be done in Perl itself.
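For instance, filtering lines needs no external grep. This is only a sketch: the sample text and the pattern are made up, standing in for whatever output is being searched.

```perl
use strict;
use warnings;

# Sample text standing in for a command's output.
my $output = "Server: nginx\nContent-Type: text/html\nContent-Length: 42\n";

# Equivalent of piping the text through grep -i '^content-', done in Perl.
my @matches = grep { /^content-/i } split /\n/, $output;
print "$_\n" for @matches;
```

The result is already a Perl list, so there is no second round of parsing the output of yet another external command.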

Avoiding the Shell

There are several ways to avoid going through the shell, meaning the program is invoked directly, and the arguments that you give to e.g. system are passed through to the program's argv directly, without any interpolation. On *NIX systems, it's possible to completely avoid the shell like this, as I'll describe below, and on Windows AFAIK it's not quite as easy to prevent shell interpolation, but there are still ways to avoid problems, which I'll also touch on below.

Backticks always suffer from the shell interpolation problem, but system and the related function exec allow you to avoid the shell, if you call system(LIST) where LIST must have two or more elements, or if you call system PROGRAM LIST. A similar thing is possible with piped opens, see the section on them near the end of this article. It's also worth mentioning that code like system(@array) can be dangerous unless you've made sure that @array has more than one element (e.g. die unless @array>1). So, if we rewrite the code from above in one of the following two ways:

    system("cat", "/sharedfiles/$f")==0
        or die "command failed, \$?=$?";
    # OR
    system({"cat"} "cat", "/sharedfiles/$f")==0
        or die "command failed, \$?=$?";

Then, if the user attempts to enter "; cat /etc/passwd" or "; rm somefile", the cat program will be called with those strings as its first command line argument without any interpolation by the shell, and it will attempt to open files named literally "/sharedfiles/; cat /etc/passwd" or "/sharedfiles/; rm somefile", which foils the attack.
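The one-element-list pitfall mentioned above can also be sidestepped with the indirect-object form (system PROGRAM LIST), which the exec documentation describes as safe even for a one-argument list. A minimal sketch, assuming a *NIX system; run_noshell is a hypothetical helper name, and $^X (the running perl) stands in for a real command.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command without a shell, whatever the list length.
sub run_noshell {
    my @cmd = @_;
    die "no command given" unless @cmd;
    no warnings 'exec';   # exec failures are reported via the return value
    # Indirect-object form: the block names the program, @cmd becomes its argv.
    my $rc = system { $cmd[0] } @cmd;
    return $rc;
}

# The hostile-looking string arrives in the child as a plain argument.
my $rc = run_noshell($^X, '-e',
    'exit($ARGV[0] eq "; echo INJECTED" ? 0 : 1)', '; echo INJECTED');
print "exit status: ", $rc >> 8, "\n";

# A one-element list is not handed to the shell either: this tries to run a
# program literally named "echo hi; echo there", which fails with -1.
my $rc_single = run_noshell('echo hi; echo there');
print "single-element result: $rc_single\n";
```

The first call exits with status 0 because the child saw the string verbatim in its argv. The second returns -1 because no program with that literal name exists, which is a much safer failure mode than having the shell run two commands.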

Update 2017-12-09: Fellow monk afoken wrote a very nice post that complements this one here: The problem of "the" default shell.

When are system and Backticks OK?

As for when it's appropriate to use backticks, I'd say anytime when you know that you have to use an external program because there isn't a Perl module available, and you know the command won't output anything on STDERR, and most importantly, you don't interpolate any user input into the backticks! Even though it's possible to do some stringent checks on the user input, like $input=~tr/A-Za-z0-9_//cd; or die unless $input=~/\A[A-Za-z0-9_]+\z/;, it's easy to make a mistake in such a regex and/or miss a shell metacharacter, so I would still recommend one of the modules below that avoid the shell.
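One well-known way such a check goes subtly wrong is anchoring with ^ and $ instead of \A and \z. The sketch below uses a made-up input value to show the difference.

```perl
use strict;
use warnings;

my $input = "report_2017\n";   # e.g. a line from STDIN that was never chomped

# Pitfall: "$" also matches just before a trailing newline.
my $loose  = $input =~ /^[A-Za-z0-9_]+$/   ? 1 : 0;
# "\A" and "\z" match only at the very start and very end of the string.
my $strict = $input =~ /\A[A-Za-z0-9_]+\z/ ? 1 : 0;

print "loose check passes:  $loose\n";
print "strict check passes: $strict\n";
```

The loose pattern accepts the string even though it contains a newline, while the strict one rejects it. A newline is a command separator to the shell, so this is exactly the kind of character such a check is supposed to keep out.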

I'd say system is appropriate when you use system LIST with two or more items in LIST (!) or system PROGRAM LIST, and you either know for sure that the command won't output anything, or you don't mind its output being passed through - for example, your Perl script is outputting stuff to STDOUT and you specifically want the external command's STDOUT and STDERR to become part of your output.

Sometimes, there are scripts that aren't meant for production, the programmer just wants to write a one-off script off the top of their head as quickly as they can, there is no user input involved, etc. In these cases it may be acceptable to do something like, for example, my @files=`find ...`; instead of mucking around with File::Find or installing a module that makes listing files easier (e.g. File::Find::Rule). But the point here is that one should make an informed choice, instead of taking the easiest path and remaining ignorant of the issues :-)
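For comparison, here is roughly what the File::Find version of a simple find call looks like. This is a sketch with made-up file names; the temporary directory is created only so the example is self-contained.

```perl
use strict;
use warnings;
use File::Find qw(find);
use File::Temp qw(tempdir);

# Build a small throwaway tree so the sketch is self-contained.
my $dir = tempdir(CLEANUP => 1);
mkdir "$dir/sub" or die "mkdir: $!";
for my $name ("a.txt", "sub/b.txt", "sub/c.log") {
    open my $fh, '>', "$dir/$name" or die "$name: $!";
    close $fh or die "$name: $!";
}

# Rough equivalent of: find $dir -type f -name '*.txt'
my @files;
find(sub { push @files, $File::Find::name if -f $_ && /\.txt\z/ }, $dir);
@files = sort @files;
print "$_\n" for @files;
```

Inside the callback, $_ is the bare file name and $File::Find::name is the full path. No output parsing is needed, and file names containing whitespace or newlines are not a problem.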

The (Better) Alternatives

There are lots of modules that allow you to run external commands (see for example the "See Also" section of Capture::Tiny), and a few years ago, in the course of writing my module IPC::Run3::Shell, I did some research on all of them and came up with a few favorites. Note that the shell interpolation issue I described above can still show up even when using a module, so make sure to read the module's documentation on how to avoid it.

  • IPC::Run3: My personal favorite, it runs on *NIX and Windows, has excellent test scores, very few dependencies, it allows redirection of STDIN, STDOUT, and STDERR, and it allows you to avoid the shell (one must give it an arrayref as the command!) - on Linux, it can avoid the shell completely, and on Windows, it automatically uses Win32::ShellQuote (as of version 0.047). Two things it can't do are timeouts and interactive communication with the subprocess, but I've rarely needed those. Also, it doesn't do much error handling and leaves the inspection of $? up to you. One more thing that some people see as a disadvantage is that it generally uses temporary files for its redirections, but I personally don't usually mind this since I usually try to limit how often I call external commands and how much data I exchange with them.

    use IPC::Run3 0.047 'run3';
    my $cmd = ['cat','-nE'];
    my $stdin = "Hello,\nWorld!\n";
    run3 $cmd, \$stdin, \my $stdout, \my $stderr
        or die "run3 failed";
    die "run3 failed, \$?=$?" unless $?==0;
    print "# STDOUT:\n$stdout";
    print "# STDERR:\n$stderr";

  • Capture::Tiny: This module captures everything written to STDOUT and/or STDERR, including stuff from the Perl process itself. It itself does not run external commands for you, so if that's the main goal, then one of the other modules listed here is probably better. I've still used this module often to capture output from Perl. You can also put a system call into its code block, and then it will capture the output of the external command (see its doc for details).

    use Capture::Tiny 'capture';
    my ($stdout, $stderr, $exit) = capture {
        print "I am Perl!\n";
        system "echo", "Hello,", "World!";
    };
    die "system failed, \$?=$exit" unless $exit==0;
    print "# STDOUT:\n$stdout";
    print "# STDERR:\n$stderr";

  • IPC::System::Simple: Provides replacements for system and backticks with excellent error reporting (it's what autodie uses when you say use autodie qw/system/;), and it offers functions that always avoid the shell, systemx and capturex. Its disadvantages are what I discussed earlier: its system replacement doesn't capture STDOUT or STDERR, and its backtick replacement doesn't capture STDERR.

    use IPC::System::Simple qw/systemx capturex/;
    print "# systemx:\n";
    systemx 'echo', 'Hello,', 'World!';
    my $stdout = capturex 'echo', 'Hello,', 'World!';
    print "# capturex: $stdout";

  • IPC::Run: Even though I've used this module rarely, it's the only one of these modules that has advanced features like communicating with subprocesses interactively (similar to Expect), or timeouts. Unfortunately, it has somewhat spotty test results on Windows, but I'd still recommend it as an alternative to IPC::Run3 if the advanced features it supports are needed.

Addendum: Streaming Output

Sometimes, reading the output of an external command in one big chunk, like most of the solutions above do, is not acceptable because either it's being streamed by the external program with pauses in between lines/records, or it's outputting a large amount of line/record-based data and you want to operate on the data one line/record at a time without slurping everything into memory. In such cases, there are several solutions that also avoid some of the problems I described above.

1. Stream Output to Perl

Instead of having your Perl script open an external command and read from it, have your Perl script read from its standard input, and stream the output to the Perl process using the shell. This has the advantage that you can use the same Perl script to also read from one or more files by specifying them on the command line.

The disadvantage is that, since in this case the shell runs the external command, anything you want to do with that command's exit code or STDERR stream has to be done in the shell instead of in Perl.

    $ cat lengthsum.pl
    #!/usr/bin/env perl
    use warnings;
    use strict;
    use 5.022; # for double-diamond operator
    my $sum = 0;
    while (<<>>) {
        chomp;
        $sum+=length;
    }
    print "$sum\n";
    $ cat /usr/share/dict/words | ./lengthsum.pl
    839677

The code above uses the double-diamond operator <<>>, which was added in Perl v5.22. This operator uses the three-argument form of open, so that command line arguments such as "|foo" won't be treated as a piped open. If you don't have Perl v5.22 yet, then in the above script it's possible to change <<>> to <> and remove the use 5.022;, however, you should be aware of the aforementioned potential problem.
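On older Perls, another option is to skip the magic of <> altogether and open each file named on the command line with three-argument open. The following is a minimal sketch of that idea; length_sum is a hypothetical helper name, and the fallback to STDIN when no files are given is left out for brevity.

```perl
use strict;
use warnings;

# Hypothetical helper: sum line lengths over files, opening each one with
# three-argument open so that a name like "|foo" is only ever a file name.
sub length_sum {
    my @files = @_;
    my $sum = 0;
    for my $file (@files) {
        open my $fh, '<', $file or die "Can't open $file: $!";
        while (my $line = <$fh>) {
            chomp $line;
            $sum += length $line;
        }
        close $fh or die "Can't close $file: $!";
    }
    return $sum;
}

print length_sum(@ARGV), "\n" if @ARGV;
```

With this version an argument such as "|foo" simply fails with "No such file or directory" (unless a file by that name really exists) instead of starting a command.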

2. Use a Piped open

As of Perl v5.8, it is possible to use piped opens that avoid the shell, by giving open a list of arguments, and similar to system LIST, that list must have more than one item. One limitation of this approach is that the external command's STDERR is not captured. (See also Safe Pipe Opens.) Also, note how in the following we check the return value of close, which is necessary to see if the command exited successfully.

    use 5.008;
    my @cmd = ('cat', '/usr/share/dict/words');
    die '@cmd must have more than one element' unless @cmd>1;
    open my $fh, '-|', @cmd or die $!;
    my $sum = 0;
    while (<$fh>) {
        chomp;
        $sum+=length;
    }
    close $fh or die $! ? $! : $?;
    print "$sum\n";
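To illustrate why the return value of close matters, here is a small sketch, assuming a *NIX system, in which the command produces output and then fails. $^X (the running perl) stands in for the external command. According to the close documentation, $! is set to 0 when the only problem was a non-zero exit status.

```perl
use strict;
use warnings;

# $^X (the running perl) stands in for an external command that prints
# one line and then exits with status 2.
my @cmd = ($^X, '-e', 'print "partial output\n"; exit 2');

open my $fh, '-|', @cmd or die "Can't start $cmd[0]: $!";
my @lines = <$fh>;

my $result;
if (close $fh) {
    $result = "ok";
}
elsif ($! != 0) {
    $result = "close failed: $!";           # a system call went wrong
}
else {
    $result = "exit status " . ($? >> 8);  # the command itself reported failure
}
print "$result\n";
```

Reading from the handle works without any error, so without the check on close the script would happily process the partial output and never notice that the command exited with status 2.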

3. Use IPC::Run

IPC::Run is an advanced module that can, among other things, read the external command's output in chunks and communicate with subprocesses interactively. Because it's such an extensive module, it's hard to provide a summary and you should refer to its documentation.

Unfortunately, the IPC::Run documentation does not explicitly state that giving an array reference as the command will always avoid the shell. As far as I can tell from the source, it does attempt to avoid the shell (e.g. it uses exec { $cmd[0] } @cmd;), so I will give a quick example here.

The following is one way to read records (in this case, lines) from an external command. Note that the external process's STDERR is not captured in this example, although it is possible to do so with this module.

    use IPC::Run qw/ run new_chunker /;
    my @cmd = ('cat', '/usr/share/dict/words');
    my $sum = 0;
    run \@cmd, '>', new_chunker("\n"), sub {
            my $line = shift;
            chomp($line);
            $sum+=length $line;
        } or die $?;
    print "$sum\n";

Hope this helps,
-- Hauke D

Re^2: curl without backticks and system()
by Anonymous Monk on Jan 31, 2017 at 10:17 UTC
    This is a fantastic answer! Thank you so much!
