PerlMonks  

Re^3: Taint mode limitations

by AnomalousMonk (Archbishop)
on Nov 03, 2012 at 15:50 UTC


in reply to Re^2: Taint mode limitations
in thread Taint mode limitations

... [let] me tell it when I think I have looked at it closely enough (for example, [by] invoking a method untainted() on a variable) ...

But how would you "look at it" in the first place? Almost always by a regex match of some kind. So one would wind up with a statement like
    untaint($hinky) if my @safe = $hinky =~ m{ \A now (get) some (stuff) here \z }xms;
    then_do_safe_stuff_with($hinky, @safe);  # $hinky now safe, too

But what is to be gained by making explicitly required an action that is already implicit in the successful regex match? Everything still depends on crafting an effective validation regex.
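For reference, here is a minimal sketch of the implicit untainting that a successful match already performs in stock Perl. The sub name checked_username and its pattern are illustrative assumptions, not anything proposed in this thread. Run it as `perl -T script.pl some_name` to see the taint flag change; without -T nothing is ever tainted.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Hypothetical validator: return the captured name if $input looks like a
# short lowercase identifier, or undef otherwise.
sub checked_username {
    my ($input) = @_;
    return undef unless defined $input;
    my ($name) = $input =~ m{ \A ([a-z][a-z0-9_]{0,15}) \z }xms;
    return $name;    # undef when the match fails
}

# Command-line arguments are tainted under -T; the literal fallback is not.
my $raw  = @ARGV ? $ARGV[0] : 'alice';
my $safe = checked_username($raw);

printf "raw tainted:  %s\n", tainted($raw) ? 'yes' : 'no';
if ( defined $safe ) {
    printf "safe tainted: %s\n", tainted($safe) ? 'yes' : 'no';
}
else {
    print "input rejected\n";
}
```

The capture is what gets laundered, so the security of the result rests entirely on how strict the pattern is.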

Replies are listed 'Best First'.
Re^4: Taint mode limitations
by alain_desilets (Beadle) on Nov 03, 2012 at 17:29 UTC
    But what is to be gained by making explicitly required an action that is already implicit in the successful regex match? Everything still depends on crafting an effective validation regex.

    The problem is that regexp matches are typically used to do a lot of different things, and removing malicious characters is only one of them. So assuming that a variable derived from a tainted variable through a regexp match is "clean" is dangerous.

    For example, I have a fairly large code base that I wrote before I became concerned about security issues. In this code base, there are plenty of places where I capture regexp groups on user inputs for reasons that have nothing to do whatsoever with removing malicious characters. For example, there are many places where I use regexps to strip out the leading and trailing characters of a user input. As a result, all those strings will be considered kosher by taint mode. In contrast, if taint mode forced me to explicitly label a variable as being untainted, those cases would be correctly identified as being currently tainted.

    I'm not clutching at straws here. This is a real situation, and I am sure there are plenty of folks who have examples of this problem in their code (and I bet this includes a lot of folks who run taint mode).
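The whitespace-stripping case described above can be reproduced with a short sketch. The sub names trim and trim_keep_taint are hypothetical. The second sub relies on the `use re 'taint'` pragma, which makes captures from a tainted string stay tainted within its lexical scope. It is not the explicit untaint() call asked for in this thread, but it does stop incidental captures from laundering data. Run with `perl -T script.pl '  some input  '` to see the difference.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Trim surrounding whitespace with a capture group. No validation is
# intended, yet under -T the captured value comes back untainted.
sub trim {
    my ($input) = @_;
    my ($trimmed) = $input =~ m{ \A \s* (.*?) \s* \z }xms;
    return $trimmed;
}

# The same trim inside the lexical scope of "use re 'taint'": captures
# taken from a tainted string remain tainted.
sub trim_keep_taint {
    my ($input) = @_;
    use re 'taint';
    my ($trimmed) = $input =~ m{ \A \s* (.*?) \s* \z }xms;
    return $trimmed;
}

# Command-line arguments are tainted under -T; the literal fallback is not.
my $raw = @ARGV ? $ARGV[0] : '  some user input  ';

for my $case ( [ 'trim'            => trim($raw) ],
               [ 'trim_keep_taint' => trim_keep_taint($raw) ] ) {
    my ( $label, $value ) = @$case;
    printf "%-16s '%s' tainted: %s\n", $label, $value,
        tainted($value) ? 'yes' : 'no';
}
```

With the pragma in force across the general-purpose code, untainting would have to happen in a deliberately written block that turns it off with `no re 'taint'`, which comes close to an explicit opt-in.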

      ... I have a fairly large code base that I wrote before I became concerned about security issues. In this code base, there are plenty of places where I capture regexp groups on user inputs for reasons that have nothing to do whatsoever with [validation].

      You have code written without concern for security. The main body of this code operates freely on input, including using regexes. The code must now be re-written to take security into account. Given the nature of the code that I infer from your description, there is no way to avoid a major re-write of some kind.

      Speaking in the most general terms, it seems to me that some new layer of validation code must be interposed between all input and existing operations on that input. Within that layer, input must be tested (presumably with regexes), and then either implicitly or explicitly untainted. If any input is allowed to reach the existing processing code, you have a security problem. The hermeticity of the new validation layer is the main problem; it seems to make little difference if the untainting done within it is implicit or explicit.

      Update: I just went back and reviewed this thread and saw BrowserUk's reply. I seem to be repeating many of the points made therein, and I don't disagree with those I don't repeat. I sympathize with your desire for a mechanism that when activated would 'light up' the application for any input data not explicitly untainted, but that would not address the basic problem, common to both the current taint mechanism and the one you propose, of designing an effective test for each datum within a newly-designed validation layer. Caveat Programmor.

        You have code written without concern for security. The main body of this code operates freely on input, including using regexes. The code must now be re-written to take security into account. Given the nature of the code that I infer from your description, there is no way to avoid a major re-write of some kind.

        Your inference is wrong ;-). Although I have never had to worry about security up to now (I mostly work on proof of concept research demos), I do think a lot about design and modularity (obsessively so if you ask my colleagues ;-)).

        In this particular case, all CGI pages generated by my app are implemented as Perl classes that all derive from a common root class. This root class creates a CGI object and stores it in an attribute $self->{_cgi}, and all the subclasses use it to acquire user inputs (or at least, they are supposed to do it that way).

        So it would be fairly easy for me to do what you suggest, by adding a central sanitation of all cgi inputs in that root class (see the end of this page for how I plan to do this: http://www.perlmonks.org/?node_id=1002107).
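A rough sketch of what such a central sanitation point might look like follows. It is not the code from the linked node. The package names PageBase and FakeCGI and the method name input are all hypothetical, and FakeCGI is only a stand-in so the example runs without CGI.pm installed.

```perl
use strict;
use warnings;

# Stand-in for a CGI object; a real page would store CGI->new here instead.
package FakeCGI;
sub new   { my ( $class, %params ) = @_; return bless {%params}, $class }
sub param { my ( $self, $name ) = @_; return $self->{$name} }

# Hypothetical root class: subclasses call input() and never touch _cgi.
package PageBase;

sub new {
    my ( $class, $cgi ) = @_;
    return bless { _cgi => $cgi }, $class;
}

# Return the parameter only if the whole value matches $pattern. The value
# handed back is the capture, so it is also untainted under -T.
sub input {
    my ( $self, $name, $pattern ) = @_;
    my $raw = $self->{_cgi}->param($name);
    return undef unless defined $raw;
    my ($clean) = $raw =~ m{ \A ($pattern) \z }xms;
    return $clean;    # undef when the value does not match
}

package main;

my $page = PageBase->new(
    FakeCGI->new( id => '42', name => 'bob; rm -rf /' ) );

my $id   = $page->input( id   => qr/[0-9]{1,9}/ );
my $name = $page->input( name => qr/[A-Za-z]{1,32}/ );

print 'id:   ', $id   // '(rejected)', "\n";
print 'name: ', $name // '(rejected)', "\n";
```

Requiring a pattern for every parameter forces each caller to state what it considers valid. It does nothing about code that bypasses the root class and creates its own CGI object.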

        This will go a long way towards ensuring that all CGI inputs are being sanitized. But it is still possible for me or one of my colleagues to forget to use the $self->{_cgi} attribute and instead create a new CGI object inside one of the classes or methods that needs access to the user inputs. In fact, I do know that this has happened to me at least once this year.

        So I will still put taint mode on, but as I pointed out here: http://www.perlmonks.org/?node_id=1002207, this still won't catch instances where the method that bypasses $self->{_cgi} writes to STDOUT, or cases where it ends up creating a tainted variable that gets inadvertently untainted by someone else down the road (possibly in a third party library).

        Yet, it COULD be avoided if taint mode were less lenient, as I suggest in the middle of this page: http://www.perlmonks.org/?node_id=1002107.

        As several people have pointed out, this is not likely to happen anytime soon. So I guess I will just have to be content with me and my colleagues being extra careful to always acquire user input through the root class's $self->{_cgi} attribute.
