http://www.perlmonks.org?node_id=11137302


in reply to Re^2: Is it safe to use external strings for regexes?
in thread Is it safe to use external strings for regexes?

> So if the pattern is read from a file or database this isn't an issue.

As I said, "In the latter case" of general vulnerabilities, these are some of the issues to be aware of.

The OP said

> > These regexes are in the dozens, and are scattered across several scripts and libraries.

> > maintenance of these mappings is easier.

I doubt the general case can be solved with a DB of simple strings. Maintainable regexes are composed of smaller ones by interpolation and dynamic compilation, which brings us back to the start.
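A minimal sketch of what "composed of smaller ones by interpolation" usually looks like in practice. The IPv4 pattern and the variable names are hypothetical illustrations, not the OP's code. The combined pattern only exists after Perl interpolates and compiles the fragments, and that is the step a table of plain strings would have to reproduce somehow.

```perl
use strict;
use warnings;

# Hypothetical building block: a compiled qr// object for one IPv4 octet.
my $octet = qr/25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d/;

# The larger pattern is assembled by interpolating the smaller one;
# Perl compiles the combined pattern when this qr// is evaluated.
my $ipv4 = qr/\b(?:$octet)(?:\.(?:$octet)){3}\b/;

for my $candidate ('192.168.0.1', '256.1.1.1', 'no address here') {
    printf "%-16s %s\n", $candidate,
        $candidate =~ /^$ipv4$/ ? 'match' : 'no match';
}
```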

> is only allowed within the scope of use re 'eval';

With "newer" Perls, yes. I noticed that you changed this around 2013, and I'm thankful for that. *
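A small sketch of the case under discussion, assuming the pattern string arrives from a file or database. The helper name try_external_pattern and the sample strings are hypothetical. With no use re 'eval' in scope, Perl refuses to compile a run-time pattern containing a (?{ ... }) block and dies with an "Eval-group not allowed at runtime" error, which the eval turns into a return value; the code block itself never runs.

```perl
use strict;
use warnings;

# Hypothetical helper: use an arbitrary string (e.g. read from a file or DB)
# as a pattern. Returns "matched=1", "matched=0", or "refused: <error>".
sub try_external_pattern {
    my ($pattern, $subject) = @_;
    my $matched;
    my $ok = eval { $matched = ($subject =~ /$pattern/) ? 1 : 0; 1 };
    return $ok ? "matched=$matched" : "refused: $@";
}

# A harmless pattern compiles and runs normally.
print try_external_pattern('^x$', 'x'), "\n";

# A pattern smuggling in a code block is rejected when it is compiled at
# run time, because this scope has no "use re 'eval'".
print try_external_pattern('(?{ print "code block ran\n" })x', 'x');
```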

> The third one is a genuine issue, in terms of both CPU and memory usage.

Well, some regex engines sometimes optimize better than Perl's.

I remember a demo of a case with nested quantifiers where Unix grep did very well while Perl waited until the end of time.

This could be mitigated by analyzing the regex for potential traps like those listed here and warning accordingly.
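As a rough illustration of "warning accordingly", here is a crude textual check for one classic trap: a quantified group whose body ends in + or *. This is a swapped-in technique; it inspects the pattern string with a regex instead of analyzing the compiled program, and the helper name looks_nested_quantified is hypothetical. Expect both misses and false alarms.

```perl
use strict;
use warnings;

# Hypothetical helper, a crude textual heuristic (not an analysis of the
# compiled program): flags a group whose body ends in an unbounded
# quantifier (+ or *) and which is itself quantified, e.g. (a+)+ or (?:\w*)*.
# It ignores nested groups and character classes, so it both misses traps
# and can raise false alarms.
sub looks_nested_quantified {
    my ($pattern) = @_;
    return $pattern =~ m/
        \(                     # a group opens
        (?: [^()\\] | \\. )*   # body: no further parens, escapes allowed
        [+*]                   # body ends with an unbounded quantifier
        \)                     # the group closes
        [+*{]                  # and the group is quantified again
    /x ? 1 : 0;
}

for my $p ('(a+)+$', '(?:\w*)*x', '^\d{3}-\d{4}$', '(ab)+') {
    printf "%-16s %s\n", $p, looks_nested_quantified($p) ? 'WARN' : 'ok';
}
```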

This analysis could be done by parsing the compilation output of use re 'debug';

But again, this could open the door to those general vulnerabilities; that's why I prefer to point them out.

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

For completeness: TheDamian published a static parser for Perl regexes; I can't tell how closely it tracks newer features.

*) Some IDEs run perl -c by default when they open a Perl file, so a trojan script with an evil BEGIN block executes as soon as it is opened. And obfuscation with Acme::EyeDrops still allows hiding the evil logic inside a regex; with newer Perls one just needs to add use re 'eval';
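A minimal sketch of the footnote's first point, assuming a Unix-like system (the list-form pipe open used here needs fork support). perl -c only compiles the code, but compilation runs BEGIN blocks, while the main program never starts. The "untrusted" code is a harmless stand-in.

```perl
use strict;
use warnings;

# A stand-in for an untrusted file: a BEGIN block plus a normal statement.
my $untrusted = 'BEGIN { print "BEGIN block ran\n" } print "main body ran\n";';

# Run "perl -c" on it (as an IDE might) and capture its STDOUT.
# $^X is the path of the currently running perl; the list form avoids the shell.
open(my $fh, '-|', $^X, '-c', '-e', $untrusted)
    or die "cannot run $^X: $!";
my $output = do { local $/; <$fh> };
close $fh;

# "-e syntax OK" goes to STDERR; STDOUT shows that BEGIN ran and main did not.
print $output;
```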

Re^4: Is it safe to use external strings for regexes?
by dave_the_m (Monsignor) on Oct 07, 2021 at 15:26 UTC
    >> is only allowed within the scope of use re 'eval';
    > with "newer" Perls yes. I noticed that you changed it around 2013, and am thankful for that. *
    Um no, "use re 'eval'" has always been required to allow non-literal code blocks in patterns. The big "re eval" rewrite in 5.18.0 just made it smarter, so that, for example, a literal (and thus safe) code block can be used in a run-time regex without needing "use re 'eval'":
    use re 'eval'; # ** no longer needed from 5.18.0 onwards
    $r = qr/xyz/;
    /(?{ foo() })$r/;
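A runnable variant of that snippet, as a sketch: foo and its call counter are hypothetical stand-ins, and it assumes perl 5.18.0 or later, where a literal code block may sit next to an interpolated qr// object without use re 'eval'. The use 5.018 line makes that requirement explicit.

```perl
use 5.018;    # the smarter "re eval" handling arrived in 5.18.0
use warnings;

my $calls = 0;
sub foo { $calls++ }    # hypothetical stand-in for foo()

# A compiled pattern from elsewhere, interpolated next to a literal code block.
my $r = qr/xyz/;

# No "use re 'eval'" in scope: the code block is literal source text,
# so perl accepts it even though the pattern is assembled at run time.
my $matched = 'xyz' =~ /(?{ foo() })$r/ ? 1 : 0;

print "matched=$matched, foo() called $calls time(s)\n";
```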

    Dave.

      > just made it smarter ... without needing the "use re 'eval'":

      Hm, we seem to be talking about different things.

      Please compare these threads.

      They show that concatenating literal strings to form an eval group used to work out of the box, without use re 'eval'.

      Neither example contains any variables; both rely on concatenation, aka the . operator.

      In detail: compile-time constant folding converted this

      •  ''=~ ( '(?{B' . 'EGIN{print "owned"}})' )
      into this at compile time
      •  ''=~ '(?{BEGIN{print "owned"}})'
      without complaining. Now it requires use re "eval".

      Cheers Rolf

        > hm, we seem to be talking about different things
        I thought we were talking about using strings obtained from an external source (such as a file or DB) as a regex, and whether the (?{...}) feature could be exploited in that case. The example you gave, concatenating two halves of a regex, still requires the code to be literal in the source (albeit split) in order not to need 'use re eval' in the source code, even prior to 5.18.0.

        Dave.