http://www.perlmonks.org?node_id=1001997

alain_desilets has asked for the wisdom of the Perl Monks concerning the following question:

I'm a total newb when it comes to security, and I am trying to figure out how to use taint mode to locate and patch vulnerabilities in a Perl CGI application.

I have read a bit about taint mode, and I am puzzled by two things that don't seem to make sense. So I am wondering if I am just misunderstanding something.

Firstly, it seems that when I apply a regexp match to a tainted variable, Perl considers the matched groups to NOT be tainted, the assumption being that I executed this regexp match specifically to remove potential threats from the tainted string.

This seems to be a very naive assumption. For example, it is very common for me to remove leading and trailing space from a text input before proceeding with it. From then on, Perl would consider the input untainted, even though the regexp match I applied to it has nothing to do with removing potential threats from it.

When I first read about taint mode, I assumed that there would be some kind of function untainted(), which would label a particular variable as having been untainted. But there doesn't seem to be a way to do that.

The other thing I notice is that Perl only considers tainted variables dangerous when they are used in system calls and the like. But there are many other situations where using a tainted variable is dangerous. For example, if I write some JavaScript code to my CGI script's STDOUT, and that code is composed from the content of a tainted variable, then I might be opening my application to a cross-site scripting attack. Yet Perl considers that printing a tainted value to STDOUT is safe.

Again, when I first started reading about taint mode, I expected that it would identify every single instance of tainted variable and force me to look at it explicitly. But instead, it only flags the use of tainted variables in contexts where Perl thinks it might be unsafe to use, and obviously, Perl's idea of what is safe is too permissive. Is there a way to tell taint mode to identify all tainted variables, whether they are used in an unsafe fashion or not?

Thanks. Alain

Re: Taint mode limitations
by BrowserUk (Patriarch) on Nov 02, 2012 at 17:11 UTC
    when I first started reading about taint mode, I expected that it would identify every single instance of tainted variable and force me to look at it explicitly.

    It does! What it cannot do -- which you seem to be expecting -- is decide whether you looked closely enough.

    Think of it like front desk security issuing outsiders with a visitor's pass. If you then choose to leave them alone in the vault for a while, that is down to you.

    • When data comes into your program from a (potentially) unsafe source, it gets marked.
    • If you attempt to use that data before you've taken some action to validate it, Perl will yell at you.
    • But if Perl never allowed you to remove the mark, you would never be able to use that data.
    • And there is no way for Perl to determine whether what you have done has rendered that data "safe". Only you can make that decision.

    So Perl gives you the tools; if you choose to use them incorrectly, that is down to you.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    RIP Neil Armstrong

      when I first started reading about taint mode, I expected that it would identify every single instance of tainted variable and force me to look at it explicitly.
      It does! What it cannot do -- which you seem to be expecting -- is decide whether you looked closely enough.

      I understand that it's my responsibility to make sure I have looked at the input closely enough. My issue is that Perl tries to "guess" when I have looked at the input ("gee, the programmer captured some match groups from a regexp match on that input, so it MUST mean that he sanitized it"), instead of letting me tell it when I think I have looked at it closely enough (for example, by invoking a method untainted() on a variable).

      Using your front desk metaphor, suppose I am a security guard patrolling the corridors of a building. As I go through the front gate in the morning, I notice my front desk colleague making eye contact with a visitor. Later on, I see this visitor wandering the corridors without a pass. Can I assume that this visitor is authorized just because my colleague made eye contact with him? No, of course not!

        My issue is that Perl tries to "guess" when I have looked at the input ("gee, the programmer captured some match groups from a regexp match on that input, so it MUST mean that he sanitized it"), instead of letting me tell it when I think I have looked at it closely enough (for example, by invoking a method untainted() on a variable).

        Perl isn't "guessing". It is following the clearly laid out rule for 'detainting'. That is:

        Perl presumes that if you reference a substring using $1, $2, etc., that you knew what you were doing when you wrote the pattern.

        And it goes on to say:

        That means using a bit of thought--don't just blindly untaint anything, or you defeat the entire mechanism.

        That may not be how you think it should work; but it is the way it does work. For better or worse.

        You can try putting forward your arguments for a different (and, presumably, in your eyes better) way of working. But the current mechanism has been in place for a long time, it is necessarily embedded deep within the Perl core, and by historic convention Perl does not break backward compatibility. The net result is that you will have to learn to live with what is, because it is very unlikely to change at this point.



        ... [let] me tell it when I think I have looked at it closely enough (for example, [by] invoking a method untainted() on a variable) ...

        But how would you "look at it" in the first place? Almost always by a regex match of some kind. So one would wind up with a statement like
            untaint($hinky) if my @safe = $hinky =~ m{ \A now (get) some (stuff) here \z }xms;
            then_do_safe_stuff_with($hinky, @safe);  # $hinky now safe, too

        But what is gained by explicitly requiring an action that is already implicit in a successful regex match? Everything still depends on crafting an effective validation regex.

Re: Taint mode limitations
by davido (Cardinal) on Nov 02, 2012 at 16:26 UTC

    Taint::Util is a nice little module that provides the ability to test the taintedness of a variable at runtime in a simple and sane way. It also allows you to explicitly taint or untaint. It could, therefore, be used to re-taint a variable that has had whitespace stripped with a regex. But remember that only captures are untainted. A simple substitution (s///) leaves the variable tainted.

    The core module Scalar::Util also has a tainted function that can be used to test taintedness, but it doesn't provide functions to explicitly manipulate the state.

    My suggestion, however, is to pass your inputs through "the Prussian stance" style of sanitization before you even deal with cosmetic cleanup (stripping whitespace). If the very first thing you do to your data is to retrieve the safe portions you want to work with, then the infiltration of tainted data through the rest of the program is minimized. It's far easier to sanitize close to the source of input than later on after the input may have been transformed and passed around to various other components of the application.
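A minimal sketch of that validate-first approach, under stated assumptions: the helper name extract_username and the allowlist pattern are hypothetical, chosen only for illustration. The whole input must match the pattern, and only the captured portion is handed on; under -T that capture is the value Perl treats as untainted.

```perl
use strict;
use warnings;

# Hypothetical helper: return the validated value, or nothing (undef in
# scalar context) if the raw input does not match the allowlist in full.
sub extract_username {
    my ($raw) = @_;
    # Allowlist: optional surrounding whitespace, then 1 to 32 ASCII
    # letters, digits, underscores, or hyphens. Anything else is rejected.
    return unless defined $raw
        && $raw =~ /\A\s*([A-Za-z0-9_-]{1,32})\s*\z/;
    return $1;    # only the captured, validated portion leaves this sub
}

for my $raw ('  alain_desilets ', 'bob; rm -rf /', '<script>') {
    my $name = extract_username($raw);
    printf "%-20s => %s\n", "'$raw'",
        defined $name ? "accepted '$name'" : 'rejected';
}
```

Whitespace trimming happens as a side effect of the same match, so there is no separate cosmetic step that could untaint by accident.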

    Test::Taint is also worth mentioning as it helps your test suite to test assumptions about taintedness.


    Dave

      My suggestion, however, is to pass your inputs through "the Prussian stance" style of sanitization before you even deal with cosmetic cleanup (stripping whitespace). If the very first thing you do to your data is to retrieve the safe portions you want to work with, then the infiltration of tainted data through the rest of the program is minimized. It's far easier to sanitize close to the source of input than later on after the input may have been transformed and passed around to various other components of the application.

      Excellent suggestion, and my plan is to do exactly that.

      My point is that I was hoping that 'taint' would help me locate all the places where I get user input, and force me to deal with them explicitly. But it seems taint mode is not that useful for that, because it makes too many false assumptions about (a) what is considered a "cleaning action" (i.e. capturing groups after a regexp match on a tainted variable), and (b) what uses of a tainted variable are "unsafe" (e.g. it doesn't consider printing to the CGI script's STDOUT to be unsafe, even though that might result in printing malicious JS to the script's STDOUT).

      Maybe I'm completely arrogant here, but it seems that taint mode is in fact counterproductive, because it lulls you into thinking that it spots all the places where you forgot to explicitly address (as in "at least think about") malicious inputs. But it doesn't, again because of its poor assumptions (a) and (b).

      This seems like a design flaw in taint mode, which could easily be improved if the assumptions were changed to:

      • a) A tainted variable is considered to be untainted when the programmer explicitly invokes a function untaint() on it.
      • b) Any use of a tainted variable in a known dangerous situation raises an error (and that should include writing it to STDOUT).
      • c) You have to call untaint() on all tainted variables before the end of the process, otherwise an error will be raised (this forces you to take notice when you failed to explicitly address the tainting of that variable).

      The main differences from taint mode's current assumptions are that in condition (a), the programmer has to take a much more explicit action to signal that a variable is untainted. In (b), we expand the list of disallowed operations to include printing to STDOUT, which should catch a lot of cross-site scripting vulnerabilities. But I think it's dangerous to assume that we can catch all potentially dangerous uses of a tainted variable, which is why I think we need to add (c), to flag all tainted variables, whether they are used in a way that is known to be dangerous or not.

      Note that even with these modified assumptions, there is still a risk that you will use a tainted variable in a way that is dangerous but does not correspond to a known dangerous use, and that you use that variable before you have explicitly cleaned it and untainted() it.

      However, I suspect that (c) would strongly encourage people to clean and untaint() their user inputs as soon as they acquire them, to prevent the tainting from spreading to other variables on which you will also have to invoke untaint().

      Another idea I have is that maybe the "proper" way to deal with security in a Perl CGI context would be to have a class SafeCGI, which would allow the programmer to "declare" what CGI arguments are admissible, and what types of cleanup to carry out on them (e.g. HTML entity escaping, shell character escaping, etc.). If you declare an input without specifying cleanup options, then the default is to apply everything but the kitchen sink.

      Then, all you have to do is make sure that you always go through SafeCGI instead of CGI to acquire user inputs. In my case, that's easy enough, because all my application's dialogs go through a single dialog factory, so there is only one place where CGI arguments are acquired.

      The nice things about this approach are:

      • If you add a new CGI argument in a form and forget to declare it, you will know about it.
      • If you forget to specify what cleanup to do on a CGI argument, you will end up cleaning it to the max.
      • Malicious users cannot feed non-defined arguments to your application

      The only way you can mess up is if you acquire your CGI arguments through the CGI class instead of SafeCGI. But if, like me, you centralize acquisition of CGI arguments in one place, that's a non-issue.
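A rough sketch of how such a class might look, under stated assumptions: the package SafeCGI and its accept_params method are hypothetical (nothing like this is claimed to exist on CPAN), it takes a plain hash ref of raw parameters rather than talking to CGI directly, and it substitutes a per-parameter validation pattern for the cleanup options described above. Undeclared parameters and values that fail their pattern are both refused.

```perl
use strict;
use warnings;

# Hypothetical SafeCGI sketch: every accepted parameter must be declared
# with a validation pattern; anything undeclared is refused outright.
package SafeCGI;

sub new {
    my ($class, %declared) = @_;
    return bless { declared => {%declared} }, $class;
}

# Takes a hash ref of raw parameters and returns a hash ref of validated
# values. Dies on the first undeclared or invalid parameter.
sub accept_params {
    my ($self, $raw) = @_;
    my %clean;
    for my $name (sort keys %$raw) {
        my $pattern = $self->{declared}{$name}
            or die "Undeclared parameter '$name'\n";
        # Each declared pattern must capture the accepted value in $1.
        defined $raw->{$name} && $raw->{$name} =~ $pattern
            or die "Invalid value for parameter '$name'\n";
        $clean{$name} = $1;
    }
    return \%clean;
}

package main;

my $cgi = SafeCGI->new(
    user => qr/\A\s*([A-Za-z0-9_-]{1,32})\s*\z/,
    page => qr/\A([0-9]{1,4})\z/,
);

my $ok = $cgi->accept_params({ user => ' alain ', page => '12' });
print "user=$ok->{user} page=$ok->{page}\n";

# An undeclared argument and a malformed value are both rejected.
for my $bad ({ user => 'x', debug => '1' }, { page => '12; ls' }) {
    eval { $cgi->accept_params($bad); 1 } or print "rejected: $@";
}
```

Because each value is taken from a capture group, the same step would also untaint it when the script runs under -T.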

      I looked around for something like this on CPAN, but couldn't find one. Maybe I'll implement it myself...

      I am curious to hear what people think about those ideas. Am I completely off the wall here?

        Maybe I'm completely arrogant here, but it seems that taint mode is in fact counterproductive, because it lulls you into thinking that it spots all the places where you forgot to explicitly address (as in "at least think about") malicious inputs. But it doesn't, again because of its poor assumptions (a) and (b).

        Within their respective problem domains, the same could be said for strict, warnings, Perl::Critic, Safe, prove, and so on. The tool is not the problem unless it's inaccurately or inadequately documented. The problem is the misunderstanding, which is hard to eliminate, because people tend not to read the documentation past the point of "getting it to work."

        I think the challenge with respect to user input is one of education. With a constant deluge of new programmers in the field, the near total lack of discussion of security issues in many university CS programs, the low barrier to entry into scripting languages, and the high stakes involved that make malicious attacks profitable for the hacker, we've been fighting it for decades and will probably continue to do so. It's hard enough to get people to use the tools that are available. It's even harder to get them to take the time to understand them. There are plenty of posts on the net where the mantra is reiterated: don't cargo-cult code without understanding it. This is especially true in any type of programming where security is an issue.

        Tools can never guarantee security. They can simply encourage good behavior and good practices. You're correct though; the tools can lull one into a false sense of security. But making them more effective without taking options away from the programmer is quite difficult. There's a fine line between encouraging good behavior, and hampering creativity. I remember an old saying that I'll paraphrase: "If you make it so easy even a fool could use it, only fools will use it."

        This is a healthy discussion, and I'm glad you brought it up. I'm glad that you make the observation that taint mode isn't completely effective. The more attention the subject gets, hopefully the more people will become aware that the programmer has to assume some responsibility for educating herself on safe practices.


        Dave

        However, I suspect that (c) would strongly encourage people to clean and untaint() their user inputs as soon as they acquire them...

        Regardless of whether taint mode exists, secure applications do this already.

        I understand your argument (reusing capture groups for untainting was a mistake, a premature reuse of an existing feature), but I don't see the current situation as an onerous burden. Even without taint mode I would still write my code to perform input validation at the edges of the program, just as I handle encoding concerns at IO boundaries.

Re: Taint mode limitations
by MidLifeXis (Monsignor) on Nov 02, 2012 at 16:04 UTC

    Note the difference between the following (assuming windows, adjust system as necessary).

        perl -T -e "$ENV{PATH}='c:\\windows'; $string='original'; $string=$1 if $ARGV[0] =~ /([a-zA-Z]+)/; system qq(notepad.exe $string) and die $!"
        perl -T -e "$ENV{PATH}='c:\\windows'; $string='original'; $string=$1 if $ARGV[0] =~ /([a-zA-Z]+)/; system qq(notepad.exe $string) and die $!" foo
        perl -T -e "$ENV{PATH}='c:\\windows'; $string=$ARGV[0]; $string =~ s/ //g; system qq(notepad.exe $string) and die $!"
        perl -T -e "$ENV{PATH}='c:\\windows'; $string=$ARGV[0]; $string =~ s/ //g; system qq(notepad.exe $string) and die $!" foo

    Only the last command generates the error message: Insecure dependency in system while running with -T switch at -e line 1. The first two commands illustrate how to untaint a parameter; the last two commands do what I think you describe.
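The same contrast can be sketched as a script that reports taint status instead of calling system. This is an illustration under stated assumptions: it uses the core Scalar::Util tainted function, and it builds its tainted input from %ENV, which is only tainted when the script is run as perl -T. Without -T nothing is ever tainted and every check reports 0.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Under `perl -T`, environment values are tainted, and taint spreads to
# any expression that includes them, even a zero-length substring.
my $taint = substr(join('', values %ENV), 0, 0);
my $input = '  hello  ' . $taint;

# Capture: $1 comes out of a successful match untainted.
my $captured = $input =~ /\A\s*([a-z]+)\s*\z/ ? $1 : undef;

# Substitution: the copy is edited in place and keeps its taint.
(my $stripped = $input) =~ s/\A\s+|\s+\z//g;

printf "taint mode: %s\n", ${^TAINT} ? 'on' : 'off';
printf "input    tainted? %d\n", tainted($input)    ? 1 : 0;
printf "captured tainted? %d\n", tainted($captured) ? 1 : 0;
printf "stripped tainted? %d\n", tainted($stripped) ? 1 : 0;
```

Both $captured and $stripped hold the same text; only the way they were produced decides whether Perl still considers them tainted.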

    --MidLifeXis

Re: Taint mode limitations
by Anonymous Monk on Nov 04, 2012 at 15:07 UTC

    ... very naive assumption ...

    To call a feature/tool which you don't understand a "very naive assumption" is, I think, very naive, ridiculously so :)

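    Anyway, see the 'taint' mode section of the re pragma documentation (use re 'taint'), which keeps captures from a tainted string tainted: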

    $ perl -MTaint::Util -Mre=taint -Tle " $_ = $ENV{PATH}; sub f{printf qq/\$_ tainted? %d\n/, tainted $_;} f; $_ = $1 if / (.*)/; f; untaint $_; f;"
    $_ tainted? 1
    $_ tainted? 1
    $_ tainted? 0

Re: Taint mode limitations
by Anonymous Monk on Apr 18, 2013 at 04:11 UTC