in reply to Compiling Regular Expressions

When perl compiles your program, any constant regular expressions in the code (i.e. any that don't interpolate variables) are compiled along with it. Any that do interpolate variables are compiled when they are first used, and on later uses are recompiled only if the interpolated pattern has changed since the last use.

So in this code, the regular expression is compiled only once (at compile time):

for (@words) { print if /foo/; }

The same is true of this code, except that the compilation happens at runtime:

my $s = "foo";
for (@words) { print if /$s/; }

In this code, the two regular expressions are each compiled only once:

my($s1, $s2) = qw/ foo bar /;
for (@words) {
    print if /$s1/;
    print if /$s2/;
}

But in this code, a single regular expression is interpolated alternately with "foo" and "bar", so the pattern differs from the previous use every time and is recompiled on each pass:

my @s = qw/ foo bar /;
for (@words) {
    for my $s (@s) {
        print if /$s/;
    }
}

You can see what perl is doing with your regexps by using the -Dr switch (if perl has been compiled for debugging) or the use re 'debug'; pragma. Either produces a lot of scary-looking information about both compiling and running regular expressions, but if you ignore most of it and just look at the "Compiling ..." lines, it will show you what is happening.
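A minimal sketch of the use re 'debug' approach, applied to the alternating "foo"/"bar" case above. The word list and the $hits counter are made up for illustration; the trace goes to STDERR and its exact format varies between perl versions, so no sample output is shown.

```perl
use strict;
use warnings;

my @words = qw( food bar foobar );   # sample data, not from the post
my @s     = qw( foo bar );
my $hits  = 0;

{
    # The pragma is lexically scoped: only regexps inside this
    # block are traced. Look for the "Compiling ..." lines on STDERR.
    use re 'debug';

    for my $word (@words) {
        for my $s (@s) {
            $hits++ if $word =~ /$s/;
        }
    }
}

print "hits: $hits\n";
```

Redirecting STDERR to a file and searching it for "Compiling" is an easy way to count how often the pattern was rebuilt.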

The last case above is where qr{} comes in: if we fill @s with qr{}'d expressions instead, we can avoid the recompile:

my @s = map qr{$_}, qw/ foo bar /;
for (@words) {
    for my $s (@s) {
        print if /$s/;
    }
}

Note that this only helps if you use the compiled regexp exactly as it is: if you interpolate it into a larger pattern, even if only to add an anchor, perl goes back to recompiling each time.
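If you do need the anchor, one way around this (a sketch, not the only option) is to build the larger pattern with qr{} once, outside the loop, and then match against that object unchanged. The names @anchored and @found and the word list are invented for the example.

```perl
use strict;
use warnings;

my @words = qw( food seafood barn crowbar );   # sample data
my @s     = map { qr{$_} } qw( foo bar );

# Add the anchor once, up front; each element is a new compiled regexp.
my @anchored = map { qr{^$_} } @s;

my @found;
for my $word (@words) {
    for my $re (@anchored) {
        # The qr{} object is used exactly as compiled, with nothing
        # further interpolated around it.
        push @found, $word if $word =~ $re;
    }
}

print "@found\n";
```

The cost of compiling the anchored versions is paid once per pattern rather than once per match.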


Re^2: Compiling Regular Expressions
by johndeighan (Novice) on Mar 11, 2016 at 16:05 UTC

    I want to embed Perl code in my regex using (?{...}), so it's critical that the regex is compiled at compile time. I've gotten this to work without errors or warnings:

    my $kind;
    my $REGEX = qr/
          [A-Za-z][\w]*                         (?{$kind = 'IDENT';})
        | (?: ==? | != | <=? | >=? )            (?{$kind = 'OP';})
        | -?\d+                                 (?{$kind = 'INT';})
        | \x27 ( (?:[^\x27] | \x27{2})* ) \x27  (?{$kind = 'STRING';})
        | \S                                    (?{$kind = 'OTHER';})
    /xs;

    However, I'd like to organize the regex better by splitting it into parts and then combining those parts. The code below is what I'd like to write, but when I try it I get the error "Eval-group not allowed at runtime, use re 'eval' in regex". I think the suggested workaround is bad practice, so I won't use it, but I don't understand why my version doesn't work, since all the regexes involved use qr//. I've also tried using Readonly for the parts, but Perl still doesn't recognize that the parts will never change. Is there any other way to get $REGEX compiled at compile time while still breaking it into parts?

    my $IDENT  = qr/ [A-Za-z][\w]* /xs;
    my $STRING = qr/ \x27 ( (?:[^\x27] | \x27{2})* ) \x27 /xs;
    my $OP     = qr/ (?: ==? | != | <=? | >=? ) /xs;
    my $INT    = qr/ -?\d+ /xs;
    my $kind;
    my $REGEX = qr/
          $IDENT   (?{$kind = 'IDENT';})
        | $OP      (?{$kind = 'OP';})
        | $INT     (?{$kind = 'INT';})
        | $STRING  (?{$kind = 'STRING';})
        | \S       (?{$kind = 'OTHER';})
    /xs;

      You must be using a version of perl older than v5.18. This works for me on 5.18+, and the change is documented in the v5.18 perldelta: search for "The use re 'eval' pragma".
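For anyone who wants to check this on their own perl, here is a cut-down sketch of the split-up tokenizer regex, assuming 5.18 or later. It drops the STRING branch for brevity, and classify() is a hypothetical helper added only to exercise the pattern; neither comes from the posts above.

```perl
use strict;
use warnings;
use 5.018;   # assumption: needs the 5.18+ behaviour described in the reply above

# Reusable parts, each built with qr//.
my $IDENT = qr/ [A-Za-z]\w* /x;
my $OP    = qr/ ==? | != | <=? | >=? /x;
my $INT   = qr/ -?\d+ /x;

my $kind;

# The code blocks are literal text in this qr//; only the parts are
# interpolated, so no "use re 'eval'" is needed here.
my $REGEX = qr/
      $IDENT (?{ $kind = 'IDENT' })
    | $OP    (?{ $kind = 'OP'    })
    | $INT   (?{ $kind = 'INT'   })
    | \S     (?{ $kind = 'OTHER' })
/x;

# Hypothetical helper: report which branch matched first in $text.
sub classify {
    my ($text) = @_;
    undef $kind;
    return $text =~ $REGEX ? $kind : undef;
}

print join(' ', map { classify($_) } qw( foo <= -42 ; )), "\n";
```

On an older perl the use 5.018; line stops the script with a version error before the regex is ever compiled, which makes the cause obvious.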