http://www.perlmonks.org?node_id=734802

jaldhar has asked for the wisdom of the Perl Monks concerning the following question:

Here's another problem in the module I wrote, this time when it is built under Windows. My tests run in taint mode. They share some common code; the relevant bits look like this...

use English qw/-no_match_vars/;
...
our $filespec;
if ($OSNAME =~ /win/i) {
    $filespec = qr{ (\A(?:[[:alpha:]]:)?[ \\ \. \- [:space:] [:word:] ]+)\z }msx;
}
else {
    $filespec = qr{ (\A[- + @ [:word:] . / ]+)\z }msx;
}
...
our $perl = untaint_path( $EXECUTABLE_NAME, '$perl' );
...
sub untaint_path {
    my ( $path, $description ) = @_;
    if ( !( $path =~ $filespec ) ) {
        die "$description is tainted.\n";
    }
    return $1;
}

A tester using Strawberry Perl 5.10.0 on Windows XP is reporting this...

t\01-load.................1/1 # Testing Module::Starter::Plugin::CGIApp 0.07, Perl 5.010000, C:\STRAWB~1\perl\bin\perl.exe
t\01-load.................ok
t\extutils-makemaker......$perl is tainted.
Compilation failed in require at t\extutils-makemaker.t line 12.
BEGIN failed--compilation aborted at t\extutils-makemaker.t line 12.
t\extutils-makemaker...... Dubious, test returned 2 (wstat 512, 0x200)
 No subtests run

The reason it is failing, I believe, is the ~ in STRAWB~1, which I didn't realize was legal in a Windows file name.
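
A minimal sketch of that diagnosis, for anyone who wants to reproduce it without a Windows box. The first pattern is the Windows $filespec from above, unchanged. The second is a hypothetical variant with ~ added to the character class, and the helper name matches() and the long-form path are made up for illustration. It only shows why the match fails; it says nothing about whether either pattern is a sensible safety check.

```perl
use strict;
use warnings;

# The Windows pattern from the question, unchanged.
my $original = qr{ (\A(?:[[:alpha:]]:)?[ \\ \. \- [:space:] [:word:] ]+)\z }msx;

# Hypothetical variant: the same character class with '~' added.
my $with_tilde = qr{ (\A(?:[[:alpha:]]:)?[ \\ \. \- ~ [:space:] [:word:] ]+)\z }msx;

# Returns 1 if $path matches $re, 0 otherwise.
sub matches {
    my ( $re, $path ) = @_;
    return $path =~ $re ? 1 : 0;
}

# Single quotes keep the backslashes literal.
my $short_path = 'C:\STRAWB~1\perl\bin\perl.exe';      # 8.3 short name from the test report
my $long_path  = 'C:\strawberry\perl\bin\perl.exe';    # assumed long form, for comparison

for my $path ( $short_path, $long_path ) {
    printf "%-34s original=%d with_tilde=%d\n",
        $path, matches( $original, $path ), matches( $with_tilde, $path );
}
```

The short 8.3 name is the only one the original pattern rejects, which is consistent with the "$perl is tainted" message in the report.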

So all this background leads me to my question. What is the canonical regexp for untainting Windows file names? Based on the code shown above, are there any other situations where untaint_path could fail?

--
જલધર

Replies are listed 'Best First'.
Re: One true regexp for untainting windows filenames?
by ikegami (Patriarch) on Jan 08, 2009 at 05:10 UTC

    Checking if $^X is a valid file name doesn't make it safe. You might as well use /(.*)/s.

    On the plus side, there doesn't appear to be any reason for $^X to be tainted in Windows.

    use Win32::Process;

    sub ErrorReport {
        print Win32::FormatMessage( Win32::GetLastError() );
    }

    Win32::Process::Create(
        my $child,
        'c:\\progs\\perl5100\\bin\\perl.exe',
        'evil -le"print $^X"',
        0,
        NORMAL_PRIORITY_CLASS,
        "."
    ) or die ErrorReport();

    $child->Wait(INFINITE);

    c:\progs\perl5100\bin\perl.exe

    If you trust the perl you are running, then it looks like $^X is safe.
    If you don't trust the perl you are running, then it doesn't matter whether $^X is safe or not.

    By the way,
    everything that matches qr{^[^/\0]+\z} is a valid file name in unix,
    and everything that matches qr{^[^\0]+\z} is a valid file path in unix.
    I don't know where you got qr{ (\A[- + @ [:word:] . / ]+)\z }x from.
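
A small sketch of the two Unix patterns just mentioned, in case it helps to see them side by side. The helper names is_unix_name() and is_unix_path() and the sample strings are made up for illustration; the patterns themselves are the ones quoted above.

```perl
use strict;
use warnings;

# Patterns as given in the reply above.
my $unix_name = qr{^[^/\0]+\z};    # a single file name: no '/' and no NUL
my $unix_path = qr{^[^\0]+\z};     # a path: anything non-empty without NUL

# Hypothetical helpers returning 1 or 0.
sub is_unix_name { my ($s) = @_; return $s =~ $unix_name ? 1 : 0 }
sub is_unix_path { my ($s) = @_; return $s =~ $unix_path ? 1 : 0 }

for my $candidate ( 'foo.txt', 'dir/foo.txt', "odd ~ name\n", "nul\0inside", '' ) {
    # Escape control characters for display only.
    ( my $shown = $candidate ) =~ s/([^[:print:]])/sprintf '\\x%02x', ord $1/ge;
    printf "%-20s name=%d path=%d\n",
        "'$shown'", is_unix_name($candidate), is_unix_path($candidate);
}
```

Note how permissive both are: spaces, tildes and even newlines pass, which is the point being made about validity versus safety.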

      It seems that despite the length of my post I still managed to leave out some pertinent information. Sorry! I use untaint_path() to check several filenames not just $^X. It just happens that this is the first test that encountered a weird path. So the question still stands even if $^X is safe.

      On that topic, I am using the value of $^X in a qx// call. On Linux at least, if I don't untaint it, I get a nastygram about "insecure dependency." Should perl be a little smarter here?

      As for the regexps themselves, I am embarrassed to say I just copied them from existing code. Now I will use the ones from File::Basename as cdarke suggested.

      Thank you for your help.

      --
      જલધર

        I use untaint_path() to check several filenames not just $^X.

        To make them safe for what? For most applications, untaint_path might remove the taint flag, but it doesn't make sure the paths are safe first.
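
To make the distinction concrete, here is a minimal sketch. The helper name untaint_anything() and the sample strings are invented for illustration. Under -T, it is the regex capture that clears the taint flag; the pattern itself decides nothing about whether the value is safe for a particular use.

```perl
use strict;
use warnings;

# Hypothetical helper: "untaints" by capturing the whole string with /(.*)/s.
# Under taint mode the captured copy is considered clean, whatever it contains.
sub untaint_anything {
    my ($value) = @_;
    my ($captured) = $value =~ /(.*)/s;
    return $captured;
}

for my $input ( 'C:\STRAWB~1\perl\bin\perl.exe', 'perl; echo unexpected' ) {
    my $output = untaint_anything($input);
    printf "%-32s returned unchanged: %s\n",
        $input, ( $output eq $input ? 'yes' : 'no' );
}
```

A stricter pattern narrows what gets through, but whether the result is safe still depends on what the value is used for afterwards (a shell command, a file to open, and so on).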

        On that topic, I am using the value of $^X in a qx// call. On Linux at least, if I don't untaint it, I get a nastygram about "insecure dependency." Should perl be a little smarter here?

        In unix systems, it's possible to execute a binary at one path while making it think it's at a different path.

        $ cat > a.c
        #include <stdio.h>
        int main(int argc, char** argv) {
            printf("%s\n", argv[0]);
            return 0;
        }
        $ gcc -o a a.c
        $ perl -e'exec { "a" } "evil"'
        evil

        Based on a comment in the documentation for $^X, it looks like there's a way for processes to find out which binary is actually being executed on some systems, and Perl uses it.

        If the following doesn't print "evil" on your system, $^X can probably be trusted on your system.

        $ perl -e'system { "perl" } "evil", "-le", "print \$^X"'
        /usr/bin/perl
Re: One true regexp for untainting windows filenames?
by cdarke (Prior) on Jan 08, 2009 at 08:47 UTC
    Strictly speaking it is the filesystem which defines which characters are legal, not the operating system. This means that a drive shared between *nix and Windows (using, say, Samba) can have an interesting effect on file naming. Also remember that NTFS filenames can contain Unicode characters.

    Take a look at the .pm for File::Basename (which should be in your base release).

      Thanks for the tip. I found slightly more understandable code in File::Spec which has resulted in the following regexps: for Unix...

      qr{(\A (?: .* / (?: \.\.?\z )? )? [^/]* )}msx;
      ...and Windows (includes UNC paths)...
      qr{(\A (?: [a-zA-Z]: | (?:\\\\|//)[^\\/]+[\\/][^\\/]+ )? (?:.*[\\/](?:\.\.?\Z(?!\n))?)? .* )}msx;

      --
      જલધર

        There is no string that

        qr{(\A (?: .* / (?: \.\.?\z )? )? [^/]* )}msx

        won't match.

        It's wrong for two reasons.

        • "foo" gets "untainted" as "".
        • "x/xx\0xx" is believed to be a valid file name, but it isn't.

        Valid unix paths and only valid unix paths match

        qr{^([^\0]+)\z}

        (Although that doesn't mean there can ever be a file referenced by that path.)
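
A quick sketch to check these claims. The first pattern is the File::Spec-derived Unix one quoted above; the second uses the negated NUL class, as in the qr{^[^\0]+\z} path pattern given earlier in the thread, with a capture added. The helper name accepted_by() and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Unix pattern derived from File::Spec, as posted above.
my $file_spec_derived = qr{(\A (?: .* / (?: \.\.?\z )? )? [^/]* )}msx;

# Capturing form of the "anything without NUL" path pattern.
my $no_nul = qr{^([^\0]+)\z};

# Hypothetical helper: 1 if the pattern matches the string, else 0.
sub accepted_by {
    my ( $re, $string ) = @_;
    return $string =~ $re ? 1 : 0;
}

for my $candidate ( 'x/xx', "x/xx\0xx", '' ) {
    # Show NUL bytes visibly in the output.
    ( my $shown = $candidate ) =~ s/\0/\\0/g;
    printf "%-12s file_spec_derived=%d no_nul=%d\n",
        "'$shown'",
        accepted_by( $file_spec_derived, $candidate ),
        accepted_by( $no_nul,            $candidate );
}
```

Because every part of the first pattern is optional or starred and nothing anchors the end, it accepts the empty string and the NUL-containing string as well, whereas the second rejects both.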