http://www.perlmonks.org?node_id=1001997

alain_desilets has asked for the wisdom of the Perl Monks concerning the following question:

I'm a total newb when it comes to security, and I am trying to figure out how to use taint mode to locate and patch vulnerabilities in a Perl CGI application.

I have read a bit about taint mode, and I am puzzled by two things that don't seem to make sense, so I am wondering whether I am just misunderstanding something.

Firstly, it seems that when I apply a regexp match to a tainted variable, Perl considers the captured groups to NOT be tainted. The assumption seems to be that I ran this regexp match specifically to remove potential threats from the tainted string.

This seems to be a very naive assumption. For example, I very often remove leading and trailing whitespace from a text input (capturing the trimmed text with a regexp) before doing anything else with it. From then on, Perl considers the input untainted, even though the regexp match I applied to it has nothing to do with removing potential threats.
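A minimal Perl sketch of the regexp behaviour described above, assuming it is run as `perl -T untaint_demo.pl "  some input  "` (the file name is only illustrative). It uses `tainted()` from the core Scalar::Util module to report each value's taint flag. Per perlsec, only text extracted through capture groups is untainted; an in-place `s///` trim leaves the variable tainted, so it is the capture-group style of trimming that raises the concern above.

```perl
#!/usr/bin/perl
# Sketch: run as  perl -T untaint_demo.pl "  some input  "
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Under -T, command-line arguments are tainted.
my $input = defined $ARGV[0] ? $ARGV[0] : '';

# Trimming in place with s/// keeps the copy tainted.
(my $trimmed = $input) =~ s/^\s+|\s+$//g;

# Pulling the trimmed text out through a capture group untaints it,
# even though this pattern accepts any text whatsoever.
my ($captured) = $input =~ /^\s*(.*?)\s*$/s;

printf "taint checks: %s\n", ${^TAINT} ? 'enabled' : 'disabled';
printf "input    tainted? %s\n", tainted($input)    ? 'yes' : 'no';
printf "trimmed  tainted? %s\n", tainted($trimmed)  ? 'yes' : 'no';
printf "captured tainted? %s\n", tainted($captured) ? 'yes' : 'no';
```

Run under -T with an argument, `$input` and `$trimmed` should report as tainted and `$captured` as clean, although the pattern screens out nothing. Without -T, `tainted()` reports every value as clean.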

When I first read about taint mode, I assumed that there would be some kind of function untainted() that would label a particular variable as having been untainted. But there doesn't seem to be a way to do that.

The other thing I notice is that Perl only considers tainted variables dangerous when they are used in system calls and the like. But there are many other situations where using a tainted variable is dangerous. For example, if I write some JavaScript code to my CGI script's STDOUT, and that code is composed from the content of a tainted variable, then I might be opening my application to a cross-site scripting attack. Yet Perl considers printing a tainted value to STDOUT to be safe.
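A minimal sketch of the asymmetry described above, assuming a Unix-like system with `/bin/echo` and a run such as `perl -T sink_demo.pl 'some input'` (the script name is illustrative). The same tainted string is printed without complaint but rejected when passed to `system()`, even in list form where no shell is involved.

```perl
#!/usr/bin/perl
# Sketch: run as  perl -T sink_demo.pl 'some input'
use strict;
use warnings;

# Taint mode also checks the environment before running commands,
# so give PATH a fixed value and drop the variables perlsec flags.
$ENV{PATH} = '/bin:/usr/bin';
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};

my $input = defined $ARGV[0] ? $ARGV[0] : 'default';

# print is not a checked operation: tainted data reaches STDOUT unchallenged.
print "printed: $input\n";

# system() is checked, even in list form with no shell involved,
# so under -T Perl dies with an "Insecure dependency" error here.
my $ran = eval { system('/bin/echo', "system: $input"); 1 };
print "system refused: $@" unless $ran;
```

Nothing in taint mode asks whether `$input` was HTML- or JavaScript-escaped before the `print`; that check is left entirely to the programmer.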

Again, when I first started reading about taint mode, I expected that it would identify every single instance of a tainted variable and force me to look at it explicitly. Instead, it only flags the use of tainted variables in contexts where Perl thinks they might be unsafe, and obviously, Perl's idea of what is safe is too permissive. Is there a way to tell taint mode to identify all tainted variables, whether they are used in an unsafe fashion or not?
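One possible workaround, sketched here as an assumption rather than an established idiom: route output through a hypothetical `checked_print` helper (not part of Perl) that uses `tainted()` from Scalar::Util to refuse tainted values. It only catches what is sent through it, so it narrows the gap rather than closing it. Assumes a run such as `perl -T checked_print_demo.pl 'some input'`.

```perl
#!/usr/bin/perl
# Sketch: run as  perl -T checked_print_demo.pl 'some input'
use strict;
use warnings;
use Carp qw(croak);
use Scalar::Util qw(tainted);

# Hypothetical helper (not part of Perl): refuses to print tainted data,
# so print behaves like a sink that taint mode checks.
sub checked_print {
    my @parts = @_;
    for my $part (@parts) {
        croak 'refusing to print tainted data' if tainted($part);
    }
    print @parts;
    return 1;
}

my $name = defined $ARGV[0] ? $ARGV[0] : 'world';

# Interpolation propagates taint, so this dies when $name came from @ARGV.
my $ok = eval { checked_print("Hello, $name\n") };
print "blocked: $@" unless $ok;

# After an explicit, deliberately narrow capture, the value passes.
if ($name =~ /^([A-Za-z]{1,40})$/) {
    checked_print("Hello, $1\n");
}
```

The helper still relies on the same capture-group untainting discussed above, so a careless pattern like `/(.*)/` would get straight past it.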

Thanks. Alain