
untainting regex input

by rastoboy (Monk)
on Aug 23, 2013 at 02:28 UTC ( #1050581=perlquestion )
rastoboy has asked for the wisdom of the Perl Monks concerning the following question:

Greetings Brothers,

I'm writing a simple text search CGI script. The user types in a search term, and I use Perl's grep to search a data structure holding the text and return the matching data.

At the moment I untaint the user input by allowing only "word-like" characters and such. I'd like to let users use regular expressions in their searches, but I am not enough of a regex master to know what to allow or disallow. I've been told that you can execute code in a regex, which makes me nervous about accepting any regex at all.
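
For reference, the "word-like characters only" approach described above might look roughly like the sketch below. The helper name untaint_term, the exact character class, and the 100-character limit are assumptions for illustration, not the script's actual code. Under taint mode (-T), a value captured through a regex match is treated as untainted.

```perl
use strict;
use warnings;

# Hypothetical helper: accept only word characters, spaces and hyphens
# (an assumed whitelist) and return the captured copy, or undef if the
# input contains anything else.
sub untaint_term {
    my ($input) = @_;
    return unless defined $input;
    return $input =~ /\A([\w \-]{1,100})\z/ ? $1 : undef;
}

my $good = untaint_term('perl monks');           # accepted as-is
my $bad  = untaint_term('(?{ system("ls") })');  # rejected: undef
```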

Is there a tool or any hints as to how I could safely allow this? Any input would be greatly appreciated!


Replies are listed 'Best First'.
Re: untainting regex input
by kennethk (Abbot) on Aug 23, 2013 at 02:38 UTC
    In my own work in this context, I far prefer wildcards to explicit regular expressions; you can handle the specific cases of interest and use quotemeta to clean the rest. It's in line with my web-context philosophy of whitelisting things rather than blacklisting them.
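
A minimal sketch of the wildcard idea, assuming that only * (any run of characters) and ? (any single character) are meant to be special. The helper name wildcard_to_regex and the sample data are hypothetical; quotemeta is the core Perl function mentioned above.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a user-supplied wildcard pattern into a
# compiled regex. Only '*' and '?' are treated as special; every other
# character is escaped with quotemeta so it matches literally.
sub wildcard_to_regex {
    my ($input) = @_;
    my $pattern = join '', map {
          $_ eq '*' ? '.*'
        : $_ eq '?' ? '.'
        :             quotemeta($_)
    } split //, $input;
    return qr/$pattern/i;    # unanchored, case-insensitive search
}

my @lines = ('Perl Monks', 'regex (magic)', 'plain text');
my $re    = wildcard_to_regex('p*l');
my @hits  = grep { $_ =~ $re } @lines;
```

Because every character other than the two wildcards goes through quotemeta, parentheses in the input arrive in the pattern as literal characters, so a code expression can never be formed.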

    With regard to your actual question, the feature in question is discussed in the perlretut section "A bit of magic: executing Perl code in a regular expression". From a practical standpoint, I would say that as long as your parentheses get escaped you should be safe; this also blocks some potential denial-of-service via infinite recursion. But given the potential security mess here, I'd be very cautious, especially if this is outward facing. What benefits are you seeking from regexes that wildcards lack?

    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Re: untainting regex input
by zork42 (Monk) on Aug 23, 2013 at 05:57 UTC
    You should open perlretut and search for 'taint'.

    For example, the section "A bit of magic: executing Perl code in a regular expression" says:
    If the $regexp variable contains a code expression, the user could then execute arbitrary Perl code. For instance, some joker could search for system('rm -rf *'); to erase your files. In this sense, the combination of interpolation and code expressions taints your regexp. So by default, using both interpolation and code expressions in the same regexp is not allowed. If you're not concerned about malicious users, it is possible to bypass this security check by invoking use re 'eval' :
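
The default behaviour described in that passage can be sketched as follows. The helper name compile_user_regex and the sample patterns are hypothetical. The sketch deliberately does not enable use re 'eval', so an interpolated pattern containing a code expression fails to compile instead of running. It does nothing about other risks, such as patterns that backtrack for a very long time.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a user-supplied pattern, returning
# (regex, undef) on success or (undef, error message) on failure.
# "use re 'eval'" is deliberately NOT enabled here.
sub compile_user_regex {
    my ($input) = @_;
    my $re = eval { qr/$input/ };
    return (undef, $@) unless defined $re;
    return ($re, undef);
}

# An ordinary pattern compiles normally.
my ($ok_re, $ok_err) = compile_user_regex('mon[kc]s?');

# A code expression is refused at runtime; the code never executes.
my ($bad_re, $bad_err) = compile_user_regex('(?{ print "code ran\n" })');

# A syntax error is caught too, instead of killing the script.
my ($syn_re, $syn_err) = compile_user_regex('(unclosed');
```
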
Re: untainting regex input
by Anonymous Monk on Aug 23, 2013 at 07:21 UTC

Node Type: perlquestion [id://1050581]
Approved by kevbot
Front-paged by kevbot