http://www.perlmonks.org?node_id=212482

superpete has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks. Here is my situation. Current code:
system("$program1 $file1 | $program2 -x -y | $program3 > $file2 2>/dev/null");

I need to handle the possibility that the variables contain arbitrary weird characters (ANYTHING except / and nul).

The problem is that the shell is invoked, which messes things up. A previous posting here suggested that I use quotemeta (\Q, \E), which appeared to work at first. Since then, however, I have done more rigorous testing, and for really weird filenames it fails.

Is there a standard way to deal with this, while keeping the convenience of the above line of code?

Is the right thing to cook up a function which takes a list of command lines, makes a bunch of pipes, and then forks off and exec's each "piece" of the pipeline (thereby avoiding the shell) ?? This seems like it would work but would be difficult to do in a "bulletproof" manner. Is there a standard module for this?

What should I do?

-Pete W

Re: system, pipes, shell, quoting
by IlyaM (Parson) on Nov 13, 2002 at 08:06 UTC

    Simplest solution: break the dependency on the shell. Just don't use it: replace system with IPC::Run and you no longer need to call the shell to pipe output from one program to another.

    use IPC::Run qw(run);

    run([$program1, $file1], '|',
        [$program2, '-x', '-y'], '|',
        [$program3], ">$file2", '2>/dev/null')
        or die "Error: $?";

    --
    Ilya Martynov, ilya@iponweb.net
    CTO IPonWEB (UK) Ltd
    Quality Perl Programming and Unix Support UK managed @ offshore prices - http://www.iponweb.net
    Personal website - http://martynov.org

      ooooh :-)

      This is exactly what I was looking for, thanks very much!

      Does IPC::Run "come with perl" ?
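
IPC::Run is distributed on CPAN rather than bundled with perl, as far as I know. Where installing it is not an option, the fork-and-exec idea from the original question can be sketched with core Perl alone. This is only a minimal sketch under stated assumptions: `run_pipeline` is a hypothetical helper (not part of any module), the demo commands `echo` and `tr` stand in for $program1 and $program2, and error handling is reduced to exit codes.

```perl
use strict;
use warnings;
use POSIX ();
use File::Temp qw(tempdir);

# Hypothetical helper: run a pipeline without involving a shell.
# $out is the output file; each stage is an array ref [program, args...].
sub run_pipeline {
    my ($out, @stages) = @_;
    my (@pids, $prev_read);

    for my $i (0 .. $#stages) {
        my $is_last = ($i == $#stages);
        my ($r, $w);
        if (!$is_last) {
            pipe($r, $w) or die "pipe failed: $!";
        }

        my $pid = fork();
        die "fork failed: $!" unless defined $pid;

        if ($pid == 0) {
            # Child: connect stdin/stdout, then exec. _exit is used on
            # failure so the child never runs the parent's cleanup code.
            if ($prev_read) {
                open(STDIN, '<&', $prev_read) or POSIX::_exit(126);
            }
            if ($is_last) {
                # Three-argument open: the filename is never parsed.
                open(STDOUT, '>', $out)        or POSIX::_exit(126);
                open(STDERR, '>', '/dev/null') or POSIX::_exit(126);
            }
            else {
                close $r;
                open(STDOUT, '>&', $w) or POSIX::_exit(126);
            }
            my @cmd = @{ $stages[$i] };
            # Indirect-object form of exec: the shell is never used.
            exec { $cmd[0] } @cmd or POSIX::_exit(127);
        }

        # Parent: close the pipe ends that now belong to the children.
        push @pids, $pid;
        close $prev_read if $prev_read;
        close $w         if $w;
        $prev_read = $r;
    }

    # Wait for every stage; report success based on the last one.
    my $status = 0;
    for my $pid (@pids) {
        waitpid($pid, 0);
        $status = $?;
    }
    return $status == 0;
}

# Demo with an awkward output filename (assumed example data).
my $dir = tempdir(CLEANUP => 1);
my $out = "$dir/odd name; \$x 'y'.txt";
my $ok  = run_pipeline($out, ['echo', 'hello world'], ['tr', 'a-z', 'A-Z']);
print $ok ? "pipeline succeeded\n" : "pipeline failed\n";
```

Because every argument reaches exec as a separate list element, no quoting is needed at all, whatever bytes the names contain.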

Re: system, pipes, shell, quoting
by dws (Chancellor) on Nov 13, 2002 at 03:42 UTC
    I need to handle the possibility that the variables contain arbitrary weird characters ... Is there a standard way to deal with this, while keeping the convenience of the above line of code?

    Yes. Untaint your data. Filter the variables through whatever regexes guarantee that the variables contain only "safe" strings. Don't invoke system() if any string doesn't pass. That's the easiest, safest way to be bulletproof, and it's the standard way, too.
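
A minimal sketch of the filtering described above, assuming a deliberately conservative policy (letters, digits, underscore, dash and dot, with no leading dash or dot). The helper name `safe_filename` and the sample inputs are hypothetical; the right pattern depends on what your application actually needs to accept.

```perl
use strict;
use warnings;

# Hypothetical helper: return the name if it matches the assumed
# safe pattern, otherwise undef. Returning the capture ($1) is also
# what untaints the value when running under -T.
sub safe_filename {
    my ($name) = @_;
    return undef unless defined $name;
    return $name =~ /\A([A-Za-z0-9_][A-Za-z0-9_.-]*)\z/ ? $1 : undef;
}

for my $candidate ('report_2002.txt', '/dev/null;rm -rf *', "foo\nbar") {
    my $ok = safe_filename($candidate);
    print defined $ok ? "accepted: $ok\n" : "rejected\n";
}
```

Anything that comes back undef should never reach system() at all.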

      you seem to have lots of experience :-)

      Do you have any specific implementation examples of how to quote arbitrary characters to the shell in a system or open ?

        Do you have any specific implementation examples of how to quote arbitrary characters to the shell in a system or open ?

        I avoid situations where arbitrary characters have to be quoted. In the rare case where I have a CGI form that accepts something like a filename, I'll reject the name if it contains any funny characters. Trying to escape funny characters is a losing battle.

        I also keep my filenames simple (using only alphanumerics, underscores, dashes, and periods).

      Maybe I am missing something, but it seems like you just want to be able to pass arguments to your programs without having the shell interpret any special characters. Most good shells don't interpret anything in single quotes, so you might want to do something like this:
      ...
      my $argument  = 'something';
      my $argument2 = 'something else';
      $argument  = "'" . $argument  . "'";   # surround in quotes
      $argument2 = "'" . $argument2 . "'";   # surround in quotes
      my $exec_string = "$program $argument $argument2";
      system($exec_string);
      ...
      This way, all of your arguments are passed to the shell within single quotes, nothing is escaped or interpreted in any special way, and each argument will be a single word even if there are spaces in the argument. Of course, you have to be using a shell that works this way or similar, and if there is a possibility that the $argument will contain single quotes, you have to account for that with something like this:
      ...
      $argument =~ s/'/'"'"'/g;
      ...
      before further processing. This all works for the bash shell and should be easily modifiable for most others.
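
The two steps above (replace embedded single quotes, then wrap the whole thing in single quotes) can be folded into one small function. This is a sketch only; the name `shell_single_quote` is made up for illustration, and it assumes a Bourne-style shell such as sh or bash.

```perl
use strict;
use warnings;

# Hypothetical helper: quote one argument for a Bourne-style shell.
sub shell_single_quote {
    my ($arg) = @_;
    # Close the quoted string, emit a double-quoted ', reopen the string.
    $arg =~ s/'/'"'"'/g;
    return "'$arg'";
}

print shell_single_quote("it's a file"), "\n";   # prints 'it'"'"'s a file'
```

The order matters: the substitution has to run before the outer quotes are added, otherwise the wrapping quotes themselves would be rewritten.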
Re: system, pipes, shell, quoting
by graff (Chancellor) on Nov 13, 2002 at 04:26 UTC
    To expound a bit on dws's remarks: you would not want to run that system call if the value of $file1 was something like "/dev/null;rm -rf *"... The point would be to make sure that the variables being used are not so "weird" as to wreak havoc when they are passed "successfully" to the shell.

    To be safe, the values of $program1, $program2, etc. would be "known", drawn from some limited set of alternatives, and the values of $file1, $file2 would be pre-conditioned to eliminate any characters that would cause problems in the shell (esp. semicolons, pipe symbols, ampersands and such) -- either remove them outright or replace them with safe punctuation characters, underscores, or whatever.

    But if you need to handle some oddball file names, you might try the following alternative to the system call:

    $file1 =~ s/(\W)/\\$1/g;   # and similarly for other filenames
    open( SH, "| /bin/sh" ) or die "can't start /bin/sh: $!";
    print SH "$program1 $file1 | $program2 | $program3 > $file2\n";
    close SH;
    I'm not saying this is guaranteed to work for you, but it's something to try (especially if you were intending to use the system call inside a loop: open the shell before going into the loop, then just print a command line to the shell on each iteration -- see a sample of this in a utility I posted a while ago).
      $file1 =~ s/(\W)/\\$1/g;   # and similarly for other filenames

      unfortunately, this doesn't seem to work. Here is an example :-(

      foreach $d ( @dirs ) {
          -e $d or die;   # works
          -d $d or die;   # works
          $d =~ s/(\W)/\\$1/g;
          system("touch $d");   # touch can't find it, it prints an error!!!
      }

      "open" will most likely do the same thing. The problem is the basic act of quoting arbitrary unprintable characters to the shell (nevermind security for the time being). FYI, the shell is /bin/sh on FreeBSD 4.1, which I assume is pretty solid.

        "open" will most likely do the same thing

        (ahem) You mean you won't even try it? Pull down the "shloop" utility I referred to earlier, and try this (I just did):

        echo '?*;&'
        (just to convince you that using single-quotes around a nasty string will help for some cases -- so long as the string doesn't include any single-quote characters) And then try this:
        mkdir /tmp/junk
        echo '?*;&' | shloop -e "touch '/tmp/junk/\i'"
        ls -l /tmp/junk
        If that doesn't work for you, please let me know. I don't have access to a FreeBSD system, and I'd be interested to learn how different it could be from linux & solaris (where this works).

        BTW, if your list of potential file names does happen to involve cases that include one or more quote-like characters, then you're right -- the above will not work. In such cases, let me suggest that you should only be facing this issue with existing file names that are created by other processes and are to be used as input to a given pipeline -- don't ever create such file names yourself for output. To handle these, open the nasty-named file for input in perl (get the file name via "readdir" so you don't need to present such a name as a command-line arg -- who knows, maybe a file name could include a "\n"!), then open your pipeline as a file handle, and pass file data to the pipeline that way. E.g.:

        opendir( DIR, "some_dir" );
        @files = grep /[^.]/, readdir( DIR );   # skips "." and ".."
        # (assumes you never need to look for files named "...")
        foreach $file ( @files ) {
            # readdir returns bare names, so prefix the directory
            next unless ( -f "some_dir/$file" );
            $outfile = "something_sane";
            open( PIPE, "| $prog1 | $prog2 -x -y | $prog3 > $outfile" );
            open( IN, "<", "some_dir/$file" );
            while (<IN>) { print PIPE; }
            close PIPE;
            close IN;
            # and/or unlink or rename that input file so it's less of a bother henceforth
        }
        I haven't tried that yet (but I think you should try it yourself before you say it won't work... ;^).

      I just thought of this, so correct me if I'm doing something naive.

      If you have to get the values of $program1 and $program2 from an outside source, you can put allowed values in a hash like this:

      my %allowed;
      $allowed{'ls'}   = '/bin/ls';
      $allowed{'grep'} = '/bin/grep';
      $allowed{'gzip'} = '/bin/gzip';
      . . .

      When it comes time to execute the program:

      my $good_prog = $allowed{$program1};
      system($good_prog) if defined $good_prog;   # "!= undef" is a numeric comparison and does not test definedness

      This won't help you with arguments, of course. Also, you'll be limiting the actual programs that can be run (which is probably a good thing). If you want to allow everything in /bin and /usr/bin, try this (untested):

      use File::Find;

      my %allowed;
      find(\&add, '/bin', '/usr/bin');

      sub add {
          $allowed{$_} = $File::Find::name;
      }
Re: system, pipes, shell, quoting
by BrowserUk (Patriarch) on Nov 13, 2002 at 08:39 UTC

    Could you give an example of a filename or better a complete command string, where quotemeta fails?

    I'm not sure I have anything to contribute, but it would be interesting to see where it is failing for you.


    Okay you lot, get your wings on the left, halos on the right. It's one size fits all, and "No!", you can't have a different color.
    Pick up your cloud down the end and "Yes" if you get allocated a grey one they are a bit damp under foot, but someone has to get them.
    Get used to the wings fast cos its an 8 hour day...unless the Govenor calls for a cyclone or hurricane, in which case 16 hour shifts are mandatory.
    Just be grateful that you arrived just as the tornado season finished. Them buggers are real work.

      #!/usr/bin/perl
      $fname = "foo\nbar";
      #-- standard idiot-proof open
      $fname = "./$fname" if $fname =~ /^\s/;
      open F, "> $fname\0";
      print F "testing 1 2 3\n";
      close F;
      system("cat \Q$fname\E");   # produces error
      unlink $fname;

        And do you have any way of accessing a file whose name contains an embedded newline from your shell?

        Ie. Can you type the cat command at the CLI and have it work?
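
For comparison, here is a sketch of the same experiment with the shell taken out of the picture. When system is given a list of more than one element, perl hands the arguments straight to the program, so the embedded newline needs no quoting. The helper name `cat_odd_file` is hypothetical, and the file is created in a temporary directory (via File::Temp) rather than the current one.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: create a file with an awkward name, cat it
# without a shell, clean up, and return system's status value.
sub cat_odd_file {
    my ($dir, $name) = @_;
    my $path = "$dir/$name";

    # Three-argument open takes the name literally.
    open(my $fh, '>', $path) or die "cannot create file: $!";
    print {$fh} "testing 1 2 3\n";
    close $fh or die "close failed: $!";

    # List form of system: no shell, so no quotemeta needed.
    my $rc = system('cat', $path);
    unlink $path;
    return $rc;
}

my $rc = cat_odd_file(tempdir(CLEANUP => 1), "foo\nbar");
print "cat exit status: $rc\n";
```

This only covers a single command; a pipeline or redirection still needs either the shell or explicit plumbing.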



Re: system, pipes, shell, quoting
by rinceWind (Monsignor) on Nov 13, 2002 at 10:21 UTC
    Hi superpete,

    Your question brings to mind Ovid's splendid CGI course, where he deals with the issue of escaping shell characters in detail.

      ++ for rinceWind, because you should *SERIOUSLY* read what Ovid says there. If I gave examples of what you want codewise, I'd basically be regurgitating Ovid's CGI course, except he also goes into great detail to explain the how's and why's and more importantly, the *why nots*.
        ok I'll bite. I quickly read Ovid's tutorial. Anyhow, I made a solution that combines some of the suggestions in this discussion, and SAFELY uses the shell for truly arbitrary filenames.

        Let me know if you can break it, I actually think I got it right (famous last words)

        # each filename is wrapped in quotes
        # so we only need to escape characters which have
        # special meaning to the shell - when it interpolates
        # in double-quotes. there are only 4 such characters,
        # namely " ` $ and \
        # interestingly, newline is NOT one of these...
        sub quote_for_shell {
            my ($x) = @_;
            $x =~ s/([\"\`\$\\])/\\$1/g;
            return "\"" . $x . "\"";
        }

        @command_line = (
            quote_for_shell( $prog1 ), quote_for_shell( $file1 ),
            "|",
            quote_for_shell( $prog2 ), "-x -y",
            "|",
            quote_for_shell( $prog3 ),
            ">", quote_for_shell( $file3 )
        );
        system join " ", @command_line;

        I tested this successfully on a randomly generated directory tree full of randomly-generated filenames made up of chr(rand(128)) except "/" and "\0". Actually, $prog1, $file1, etc can have "/" in them because they might be pathnames.
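
A rough sketch of what such a randomized test might look like, for anyone who wants to try to break it. It is not the test actually used above: the helper names `random_name` and `count_failures` are invented here, and instead of building a directory tree it passes each quoted name through /bin/sh with printf and checks that the same bytes come back.

```perl
use strict;
use warnings;

# Same quoting function as in the post above.
sub quote_for_shell {
    my ($x) = @_;
    $x =~ s/([\"\`\$\\])/\\$1/g;
    return "\"" . $x . "\"";
}

# Hypothetical generator: $len random characters in 1..127, never "/".
sub random_name {
    my ($len) = @_;
    my $s = '';
    while (length($s) < $len) {
        my $c = chr(1 + int rand 127);   # starts at 1, so never NUL
        $s .= $c unless $c eq '/';
    }
    return $s;
}

# Hypothetical checker: round-trip each name through the shell.
sub count_failures {
    my ($trials) = @_;
    my $failures = 0;
    for (1 .. $trials) {
        my $name   = random_name(1 + int rand 20);
        my $quoted = quote_for_shell($name);
        # The quotes in the command force perl to run it via /bin/sh.
        my $echoed = `printf '%s' $quoted`;
        $failures++ unless defined $echoed && $echoed eq $name;
    }
    return $failures;
}

print "failures: ", count_failures(200), "\n";
```

A nonzero count would point at a character the four-character escape list misses for the shell in use.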