PerlMonks  

Re^2: Taint mode limitations

by alain_desilets (Beadle)
on Nov 03, 2012 at 15:19 UTC (#1002107)


in reply to Re: Taint mode limitations
in thread Taint mode limitations

My suggestion, however, is to pass your inputs through "the Prussian stance" style of sanitization before you even deal with cosmetic cleanup (stripping whitespace). If the very first thing you do to your data is to retrieve the safe portions you want to work with, then the infiltration of tainted data through the rest of the program is minimized. It's far easier to sanitize close to the source of input than later on after the input may have been transformed and passed around to various other components of the application.

Excellent suggestion, and my plan is to do exactly that.

My point is that I was hoping that 'taint' would help me locate all the places where I get user input, and force me to deal with them explicitly. But it seems taint mode is not that useful for that, because it makes too many false assumptions about (a) what is considered a "cleaning action" (i.e. capturing groups after a regexp match on a tainted variable), and (b) what uses of a tainted variable are "unsafe" (e.g. it doesn't consider printing to the CGI script's STDOUT to be unsafe, even though that might result in printing malicious JS to the script's STDOUT).
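Both behaviours described above can be seen in a few lines. This is only a sketch: `report` is a made-up helper, the "user input" is simulated by appending a zero-length tainted string (a common testing trick), and the script is assumed to be run as `perl -T`. Without `-T` nothing is tainted and every line reports "not tainted".

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Zero-length string that is tainted under -T (plain '' otherwise).
# Appending it to a literal simulates data that arrived from the user.
my $TAINT = substr("$0$^X", 0, 0);

# Hypothetical helper: describe whether a value currently carries taint.
sub report {
    my ($label, $value) = @_;
    return sprintf '%-16s %s', $label, tainted($value) ? 'tainted' : 'not tainted';
}

my $input = 'name=<script>alert(1)</script>' . $TAINT;

# A match written for parsing, not for validation.
my ($value) = $input =~ /^name=(.*)$/;

print report('raw input:', $input), "\n";   # tainted under -T
print report('captured:',  $value), "\n";   # not tainted: the capture laundered it
print "$input\n";                           # allowed: print to STDOUT is not a checked operation
```

The capture was never meant as a security check, yet `$value` comes out untainted, and the raw input reaches STDOUT without complaint.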

Maybe I'm completely arrogant here, but it seems that taint mode is in fact counterproductive, because it lulls you into thinking that it spots all the places where you forgot to explicitly address (as in "at least think about") malicious inputs. But it doesn't, again because of its poor assumptions (a) and (b).

This seems like a design flaw in taint mode, which could easily be improved if the assumptions were changed to:

  • a) A tainted variable is considered to be untainted when the programmer explicitly invokes a function untaint() on it.
  • b) Any use of a tainted variable in a known dangerous situation raises an error (and that should include writing it to STDOUT).
  • c) You have to call untaint() on all tainted variables before the end of the process, otherwise an error will be raised (this forces you to take notice when you failed to explicitly address the tainting of that variable).

The main differences from taint mode's current assumptions are as follows. In (a), the programmer has to take a much more explicit action to signal that a variable is untainted. In (b), we expand the list of disallowed operations to include printing to STDOUT, which should catch a lot of cross-site scripting vulnerabilities. But I think it's dangerous to assume that we can catch all potentially dangerous uses of a tainted variable, which is why I think we need to add (c), to flag all tainted variables, whether or not they are used in a way that is known to be dangerous.

Note that even with these modified assumptions, there is still a risk that you will use a tainted variable in a way that is dangerous but does not correspond to a known dangerous use, and that you will do so before you have explicitly cleaned and untaint()ed it.

However, I suspect that (c) would strongly encourage people to clean and untaint() their user inputs as soon as they acquire them, to prevent the tainting from spreading to other variables on which they would also have to invoke untaint().
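Proposals (a) and (b) can be roughly approximated in current Perl. The sketch below assumes the `re` pragma's `taint` option, which makes captures from tainted strings stay tainted within its lexical scope, so that only one function is allowed to launder data. The names `untaint` and `checked_print` are hypothetical, and nothing here enforces (c), which does not seem doable from pure Perl code.

```perl
use strict;
use warnings;
use Carp qw(croak);
use Scalar::Util qw(tainted);
use re 'taint';    # from here on, captures from tainted strings stay tainted

# Hypothetical untaint(): the one place where laundering is allowed.
# The whole value must match the supplied pattern, or the call dies.
sub untaint {
    my ($value, $pattern) = @_;
    croak 'untaint() needs a defined value and a qr// pattern'
        unless defined($value) && ref($pattern) eq 'Regexp';
    no re 'taint';    # lexically scoped: restores the default only in this sub
    $value =~ /\A($pattern)\z/
        or croak "untaint(): value does not match $pattern";
    return $1;
}

# Hypothetical checked_print(): proposal (b) applied to STDOUT.
sub checked_print {
    croak 'checked_print(): refusing to print tainted data'
        if grep { tainted($_) } @_;
    return print @_;
}

my $id = untaint('42', qr/\d+/);
checked_print("id=$id\n");
```

This only helps if the code base consistently calls `checked_print` instead of `print`, so it is a convention rather than a guarantee.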

Another idea I have is that maybe the "proper" way to deal with security in a Perl CGI context would be to have a class SafeCGI, which would allow the programmer to "declare" what CGI arguments are admissible, and what types of cleanup to carry out on them (e.g. HTML entity escaping, shell character escaping, etc.). If you declare an input without specifying cleanup options, then the default is to apply everything but the kitchen sink.

Then, all you have to do is to make sure that you always go through SafeCGI instead of CGI to acquire user inputs. In my case, that's easy enough, because all my application's dialogs go through a single dialog factory, so there is only one place where CGI arguments are acquired.

The nice things about this approach are:

  • If you add a new CGI argument in a form and forget to declare it, you will know about it.
  • If you forget to specify what cleanup to do on a CGI argument, you will end up cleaning it to the max.
  • Malicious users cannot feed non-defined arguments to your application.

The only way you can mess up is if you acquire your CGI arguments through the CGI class instead of SafeCGI. But if, like me, you centralize acquisition of CGI arguments in a single place, that's a non-issue.
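To make the idea concrete, here is a minimal sketch of what such a class might look like. Everything in it is an assumption made for illustration: the constructor arguments `raw` and `declared`, the cleanup names `html` and `shell`, and the crude cleanups themselves (the `shell` step strips characters rather than escaping them). The raw arguments are passed in as a plain hash reference so the sketch does not depend on CGI.pm being installed.

```perl
package SafeCGI;    # hypothetical class, not an existing CPAN module
use strict;
use warnings;
use Carp qw(croak);

# Named cleanup steps (illustrative only).
my %CLEANUP = (
    # Replace HTML-significant characters with numeric entities.
    html  => sub { my $s = shift; $s =~ s/([&<>"'])/sprintf '&#%d;', ord $1/ge; $s },
    # Keep only word characters, dots and hyphens.
    shell => sub { my $s = shift; $s =~ s/[^\w.-]//g; $s },
);
my @ALL_STEPS = qw(shell html);    # default: apply every cleanup

sub new {
    my ($class, %args) = @_;
    my $raw      = $args{raw}      || {};
    my $declared = $args{declared} || {};
    # Arguments that were never declared are rejected outright.
    for my $name (sort keys %$raw) {
        croak "undeclared CGI argument '$name'" unless exists $declared->{$name};
    }
    return bless { raw => $raw, declared => $declared }, $class;
}

sub param {
    my ($self, $name) = @_;
    croak "undeclared CGI argument '$name'" unless exists $self->{declared}{$name};
    my $value = $self->{raw}{$name};
    return unless defined $value;
    my $steps = $self->{declared}{$name};
    $steps = \@ALL_STEPS unless $steps && @$steps;    # no cleanup declared: clean to the max
    for my $step (@$steps) {
        my $code = $CLEANUP{$step} or croak "unknown cleanup '$step'";
        $value = $code->($value);
    }
    return $value;
}

package main;

my $cgi = SafeCGI->new(
    raw      => { name => '<b>Al</b>', file => 'notes.txt; rm -rf /' },
    declared => { name => ['html'], file => [] },
);
print $cgi->param('name'), "\n";    # prints &#60;b&#62;Al&#60;/b&#62;
print $cgi->param('file'), "\n";    # prints notes.txtrm-rf
```

In a real CGI script the `raw` hash would presumably come from the CGI object (for example via its `Vars` method), at the single place where arguments are acquired.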

I looked around for something like this on CPAN, but couldn't find one. Maybe I'll implement it myself...

I am curious to hear what people think about those ideas. Am I completely off the wall here?


Re^3: Taint mode limitations
by davido (Archbishop) on Nov 03, 2012 at 16:25 UTC

    Maybe I'm completely arrogant here, but it seems that taint mode is in fact counterproductive, because it lulls you into thinking that it spots all the places where you forgot to explicitly address (as in "at least think about") malicious inputs. But it doesn't, again because of its poor assumptions (a) and (b).

    Within their respective problem domains, the same could be said for strict, warnings, Perl::Critic, Safe, prove, and so on. The tool is not the problem unless it's inaccurately or inadequately documented. The problem is the misunderstanding, which is hard to eliminate, because people tend not to read the documentation past the point of "getting it to work."

    I think the challenge with respect to user input is one of education. With a constant deluge of new programmers in the field, the near total lack of discussion of security issues in many university CS programs, the low barrier to entry into scripting languages, and the high stakes involved that make malicious attacks profitable for the hacker, we've been fighting it for decades and will probably continue to do so. It's hard enough to get people to use the tools that are available. It's even harder to get them to take the time to understand them. There are plenty of posts on the net where the mantra is reiterated: don't cargo-cult code without understanding it. This is especially true in any type of programming where security is an issue.

    Tools can never guarantee security. They can simply encourage good behavior and good practices. You're correct though; the tools can lull one into a false sense of security. But making them more effective without taking options away from the programmer is quite difficult. There's a fine line between encouraging good behavior, and hampering creativity. I remember an old saying that I'll paraphrase: "If you make it so easy even a fool could use it, only fools will use it."

    This is a healthy discussion, and I'm glad you brought it up. I'm glad that you make the observation that taint mode isn't completely effective. The more attention the subject gets, hopefully the more people will become aware that the programmer has to assume some responsibility for educating herself on safe practices.


    Dave

      Tools can never guarantee security. They can simply encourage good behavior and good practices. You're correct though; the tools can lull one into a false sense of security. But making them more effective without taking options away from the programmer is quite difficult. There's a fine line between encouraging good behavior, and hampering creativity.

      I agree. No tool or safeguard can guarantee 100% safety, and the safer you try to make it, the more you may hamper the programmer's creativity.

      I guess the point I am trying to make is that taint mode doesn't seem to be hitting the right sweet spot on that continuum. For example, I don't see how forcing programmers to explicitly untaint a variable by calling a method called, say, untaint(), would take options from them. Yet, it would surely be much safer than assuming that a regexp group matched from a tainted variable is untainted.

      Similarly, I don't see how reporting all tainted variables that have not been explicitly untaint()ed by the end of the process would hamper creativity. And that too would be safer than assuming that a tainted variable doesn't have to be untainted unless it's going to be used in a context that we know to be dangerous.

      It seems to me that the current taint mode is really optimized for situations where you are using a large code base that was developed without security in mind. In that situation, what I proposed earlier would probably set off a lot of alarms. Most of those might be false positives where either (a) the tainted variable IS being cleaned up through the use of a regexp match or (b) the tainted variable is never actually being used in a dangerous context. My proposed taint mode would force you to explicitly add a call to untaint() on all those false-positive tainted variables, and this may not be palatable for some developers.

      In a situation like this, the current taint mode implementation may be more palatable to some developers, because it automatically deduces that many of those user inputs are in fact OK. But it also lets a lot of false negatives through: for example, inputs that have been derived from a tainted variable through a regexp group match, where that match was never intended to clean out security threats, or inputs that are being used in situations that, while not recognized as dangerous by Perl, are indeed dangerous (e.g. writing JS code to STDOUT).

      Personally, when dealing with security, I would rather have to deal with lots of false positives and manually label them as being OK, than have lots of false negatives slip through the cracks. I understand that not everyone may have that bias, so maybe the ideal would be for taint mode to be configurable. Those who are bothered by false positives can choose lenient options, while those who, like me, are paranoid and want to let as few false negatives through as possible can choose a more restrictive option.

      I'm surprised that this is not a possibility.

Re^3: Taint mode limitations
by chromatic (Archbishop) on Nov 03, 2012 at 18:24 UTC
    However, I suspect that (c) would strongly encourage people to clean and untaint() their user inputs as soon as they acquire them...

    Regardless of the existence or presence of taint mode, secure applications do this already.

    I understand your argument (reusing capture groups for untainting was a mistake of the premature reuse of a feature), but I don't see the current situation as an onerous burden. Even without taint mode I would still write my code to perform input validation at the edges of the program, just as I handle encoding concerns at IO boundaries.

      I understand your argument (reusing capture groups for untainting was a mistake of the premature reuse of a feature), but I don't see the current situation as an onerous burden.

      That's interesting and is getting me to wonder if I am not worrying about scenarios that don't happen in practice.

      What do you make of the various situations that I outline on this page (http://www.perlmonks.org/?node_id=1002207)? Are these things that typically don't happen in practice? And if so, can you explain why that is? Or maybe you find that checking for taintedness before every print to STDOUT, every regexp match and every call to a third party library is not onerous, and you feel confident that you never forget to do it?

      BTW: I am not saying this to be sarcastic. I am more than open to the possibility that I am imagining nightmare scenarios that don't happen in practice. I am also more than open to the possibility that some programmers are able to think about checking for taintedness before every print, regexp match or third party library call (I'm just not one of them ;-)).

        Well, you're missing the obvious point that taint was designed to keep DIRTY data from messing up system calls, and tainted data doesn't break print -- a browser/website vulnerable to XSRF is about 11 domains removed from the domain of taint -- not taint's job.
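If output safety is not taint's job, it has to be handled where the data is written out. A minimal sketch of escaping at the output boundary follows; `escape_html` is a hypothetical helper written for this example, and a real application would more likely use an established escaping module or a templating system that escapes by default.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the five characters that are significant in HTML.
sub escape_html {
    my ($text) = @_;
    my %entity = (
        '&' => '&amp;',
        '<' => '&lt;',
        '>' => '&gt;',
        '"' => '&quot;',
        "'" => '&#39;',
    );
    $text =~ s/([&<>"'])/$entity{$1}/g;
    return $text;
}

my $comment = '<script>alert("hi")</script>';
print '<p>', escape_html($comment), "</p>\n";
# prints <p>&lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;</p>
```

Whether the string is tainted makes no difference here: the escaping depends on where the data is going, not on where it came from.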
