http://www.perlmonks.org?node_id=512698

steelrose has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

A quick question - I'm going to set up a search script to search a single directory on a website. It's basically going to feed the contents of the text box into a regex: m/<contents>/;

Are there any red flags or security concerns that I should check for in the regex before feeding it into the match? I'd rather ask and find out now than have something unexpected happen in the future.

Thanks.
If you give a man a fish he will eat for a day.
If you teach a man to fish he will buy an ugly hat.
If you talk about fish to a starving man, you're a consultant.

Replies are listed 'Best First'.
Re: Site Search perlscript and security
by szbalint (Friar) on Nov 29, 2005 at 16:52 UTC

    To be totally honest with you, I wouldn't allow users to use full-blown regexp search (even if they don't explicitly know about it) because it's overkill in my opinion.

    Another concern is that a malicious regexp can tie up the regexp engine for an effectively unbounded time (for example through catastrophic backtracking). It can be done, and that makes it an attack vector for a (D)DoS.

      Agree. And it's not just the malicious. Someone earnestly attempting to write a useful regular expression can unintentionally or accidentally write one that will tie up that server process as long as it's allowed to run/live.
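One way to sidestep both concerns is to treat the search box as a literal string rather than as a pattern. A minimal sketch, assuming the text to search has already been read into a list of lines; `literal_matches` is a hypothetical helper name, while `quotemeta` is Perl's built-in for escaping regex metacharacters:

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines that contain $term literally,
# ignoring case. Nothing in $term is interpreted as regex syntax.
sub literal_matches {
    my ($term, @lines) = @_;
    # quotemeta backslash-escapes every character that could be special
    # in a pattern, so ".", "+", "(" and friends match only themselves.
    my $quoted = quotemeta $term;
    return grep { /$quoted/i } @lines;
}
```

With this, a term such as `(a+)+$` is matched character for character instead of being compiled as a pattern, so users cannot write a slow regex by accident or on purpose.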

Re: Site Search perlscript and security
by hardburn (Abbot) on Nov 29, 2005 at 17:09 UTC

    Just for starters, (??{ . . . }) can be used to execute code. That would be bad. There's also nothing stopping future versions of Perl from adding some other way to execute code or do other nasty things if you let the search string go through. So even if you're safe now, you might not be safe when you upgrade Perl a few years from now.

    You absolutely need to have a deny-by-default policy. Run the string through something like this before searching:

    if( $search !~ /\A ([A-Za-z0-9 ]+) \z/x ) { print "Error, can't run search\n"; }

    You can add in as many characters as you need, but I suspect most searches don't need anything more than ASCII upper- and lower-case, numbers, and a space.
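The deny-by-default check above could be wrapped up roughly like this. It is only a sketch: `validated_term` and `search_lines` are hypothetical names, and searching a list of lines is an assumption about how the site text is held.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the term if it passes the allow-list,
# or undef if it is missing, empty, or contains any other character.
sub validated_term {
    my ($search) = @_;
    return undef unless defined $search;
    # Under /x the spaces around the group are ignored, but the space
    # inside the bracketed class is still a literal space.
    return $search =~ /\A ([A-Za-z0-9 ]+) \z/x ? $1 : undef;
}

# Hypothetical usage: only run the search with a term that passed.
sub search_lines {
    my ($search, @lines) = @_;
    my $term = validated_term($search);
    return () unless defined $term;   # refuse rather than guess
    # \Q...\E is belt and braces; the allow-list has no metacharacters.
    return grep { /\Q$term\E/ } @lines;
}
```

Anything outside the allow-list is rejected outright instead of being repaired, so a new regex feature in a later perl cannot slip through unnoticed.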

    "There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.

      This is exactly what my concern was (though admittedly, I didn't realize you could actually run code in a regex...)

      It's just a simple text search, so I should have success using a form of your if statement. I think what I'll do though is strip the string of any characters other than A-Z, a-z, 0-9, and space, then use that string to feed the regex.

      I'll play around with it (I'm still learning about regexes) and see what I can come up with. In the meantime, if anyone has a good solution and wants to post it, I'll check back later to see how my solution compares. Thanks.

      "Just for starters, (??{ . . . }) can be used to execute code. That would be bad. There's also nothing stopping future versions of Perl from adding some other way to execute code or do other nasty things if you let the search string go through. So even if you're safe now, you might not be safe when you upgrade Perl a few years from now."

      Just to note, for a pattern interpolated from a variable this is only true if you specifically enable it via use re 'eval';. As for perl suddenly allowing a new way to interpolate code into a regex and execute it, this is really rather unlikely. If they do, it will definitely only work with a new, specific switch.
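The point about use re 'eval' can be checked with a small experiment. This is a sketch only, and `pattern_is_accepted` is a hypothetical name. It relies on the fact that, without `use re 'eval'` in scope, perl dies at run time when an interpolated pattern contains a code block.

```perl
use strict;
use warnings;

# Hypothetical helper: reports whether perl agrees to compile and run a
# pattern that arrives in a variable, as user input would.
sub pattern_is_accepted {
    my ($pattern) = @_;
    # No "use re 'eval'" is in scope here, so an interpolated code block
    # makes the match die at run time; eval {} traps that error.
    my $ok = eval { my $matched = ("abc" =~ /$pattern/); 1 };
    return $ok ? 1 : 0;
}
```

An ordinary pattern such as `b` is accepted, while `(?{ 1 })` and `(??{ "b" })` are refused. A syntactically invalid pattern is refused too, since it also dies. This protects against code execution only; it does nothing about slow patterns.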
      So, my solution:

      $string =~ s/((?![\w,\s])|(?=[_,\,])).//g;

      then do the match using the string data. And of course, print a disclaimer on the page with the text box for the users that special characters will be ignored ;)

        IMHO, \w and \s are too liberal in what they accept. Chances are that your search will not need Unicode, and \w in particular is going to match Unicode letters and digits if your perl has Unicode support, while \s also matches tabs and newlines. Unless you know you need Unicode, it's probably better to use the explicit character class [A-Za-z0-9 ] (with a literal space if you want to keep spaces).
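A stripping version with the explicit class might look like the following sketch. `strip_to_ascii_alnum` is a hypothetical name, and keeping the plain space character is an assumption carried over from the earlier allow-list.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a copy of the input containing only
# ASCII letters, ASCII digits, and plain spaces.
sub strip_to_ascii_alnum {
    my ($string) = @_;
    # Delete every character that is not in the explicit allow-list.
    # Unlike \w and \s, this drops "_", tabs, newlines, and non-ASCII.
    (my $clean = $string) =~ s/[^A-Za-z0-9 ]//g;
    return $clean;
}
```

The negated class says directly what survives, which is easier to audit than a combination of lookaheads.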
