http://www.perlmonks.org?node_id=910116

FreakyGreenLeaky has asked for the wisdom of the Perl Monks concerning the following question:

Greetings,

I'd like to precompile substitution regular expressions in a similar way to precompiling match regexes:
my @MATCH_RE = (
    q%(?:this|or|that)%,
);
my @MATCH_REC = map { qr/$_/ } @MATCH_RE;

sub doit {
    my($str) = @_;
    for my $re (@MATCH_REC) {
        return 0 if $str =~ m%$re%i;
    }
    return 1;
}

The above provides the performance gains as expected and works well.

Is it possible to do the same thing for substitution where the replacement string and modifiers can vary, but are hard-coded? For example (incorrect code, I know):
my @SUB_RE = (
    q@%(?:this|or|that)%bob%gsi@,
    q@%(?:One|Two|Three)%Four%gs@,   # note different modifier
);

# or, with 's' embedded
my @SUB_RE2 = (
    q@s%(?:this|or|that)%bob%gsi@,
    q@s%(?:One|Two|Three)%Four%gs@,  # note different modifier
);

my @SUB_REC = map { qr/$_/ } @SUB_RE;

sub doit {
    my($str) = @_;
    for my $re (@SUB_REC) {
        $str =~ $re;
    }
    return $str;
}
Any pointers would be appreciated.

Replies are listed 'Best First'.
Re: Precompiling substitution regex
by moritz (Cardinal) on Jun 17, 2011 at 11:44 UTC
    You can interpolate precompiled regexes into substitutions as well. The different modifiers can go into each regex:
    my @SUB_REC = (
        qr{(?:this|or|that)}si,
        qr{(?:One|Two|Three)}s,
    );
    my @SUBSTITUTION = qw/bob four/;
    for my $idx (0..$#SUB_REC) {
        $str =~ s/$SUB_REC[$idx]/$SUBSTITUTION[$idx]/;
    }
      Thanks moritz, that will work. Is there any way of achieving that without having the substitutions in a separate array?
        Well, you could of course store an array of closures that each does a substitution, and call those in turn - but it might slow down things again. Or you could store regexes and substitutions in the same array, but distinguished by different indexes (even/odd or first half/second half).

        But I don't think you can easily store a precompiled whole substitution in a single scalar.
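
        A minimal sketch of the two options mentioned above, assuming hypothetical names (@RULES, @SUBS, apply_rules, apply_subs; none of them come from the thread). The first keeps each precompiled regex next to its replacement in one array of pairs; the second stores one closure per rule, so each rule can carry its own modifiers, including /g.

```perl
use strict;
use warnings;

# Option 1: one array of [regex, replacement] pairs (hypothetical layout).
my @RULES = (
    [ qr{(?:this|or|that)}si, 'bob'  ],
    [ qr{(?:One|Two|Three)}s, 'Four' ],
);

sub apply_rules {
    my ($str) = @_;
    for my $rule (@RULES) {
        my ($re, $replacement) = @$rule;
        # /g belongs to s///, not to the regex, so it is fixed here.
        $str =~ s/$re/$replacement/g;
    }
    return $str;
}

# Option 2: one closure per rule; each takes a string and returns the result.
my @SUBS = (
    sub { (my $s = shift) =~ s/(?:this|or|that)/bob/gsi; $s },
    sub { (my $s = shift) =~ s/(?:One|Two|Three)/Four/gs; $s },
);

sub apply_subs {
    my ($str) = @_;
    $str = $_->($str) for @SUBS;
    return $str;
}

print apply_rules('this one or that other'), "\n";
print apply_subs('One two Three'), "\n";
```

        Whether the closure version is slower in practice would need to be benchmarked on real data; this sketch makes no claim either way.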

Re: Precompiling substitution regex
by wind (Priest) on Jun 17, 2011 at 22:49 UTC

    Use eval to cache the substitution in an anonymous sub:

    use strict;
    use warnings;

    # Cached Regex's: LHS, RHS, Modifiers
    my @re_rules = (
        ['(?:this|or|that)', 'bob', 'gsi'],
        ['(?:One|Two|Three)', 'Four', 'gs'],
    );

    my @re_subs = map {
        my $sub = eval "sub { s/$_->[0]/$_->[1]/$_->[2] for (\@_)}";
        die $@ if $@;
        $sub;
    } @re_rules;

    my @strings = ('this one or that other', 'One two Three');

    for my $string (@strings) {
        $_->($string) for (@re_subs);
        print "$string\n";
    }
    Or just declare the anonymous subs yourself if you don't need the special markup:
    use strict;
    use warnings;

    my @re_subs = (
        sub { s/(?:this|or|that)/bob/gsi for (@_) },
        sub { s/(?:One|Two|Three)/Four/gs for (@_) },
    );

    my @strings = ('this one or that other', 'One two Three');

    for my $string (@strings) {
        $_->($string) for (@re_subs);
        print "$string\n";
    }

      FWIW, note that in the second example here, the outer for-loop is not needed:

      c:\@Work\Perl\monks>perl -wMstrict -le
      "my @re_subs = (
         sub { s/(?:this|or|that)/bob/gsi for @_ },
         sub { s/(?:One|Two|Three)/Four/gs for @_ },
         );
       ;;
       my @strings = ('tHiS one or ThAt other', 'One two Three');
       printf qq{'$_' } for @strings; print '';
       ;;
       $_->(@strings) for @re_subs;
       printf qq{'$_' } for @strings; print '';
      "
      'tHiS one or ThAt other' 'One two Three'
      'bob one bob bob other' 'Four two Four'
      That's because of aliasing through the  @_ function argument array. Actually, the work of the outer for-loop is just being moved to the for-loop within each anonymous subroutine, which is now given something to loop over whereas before it was just being used for topicalization.


      Give a man a fish:  <%-{-{-{-<

      Thanks. Does the second sample pre-compile the regexes?

        I believe that we're mixing terminology here, as "pre-compile" isn't exactly what you're wanting to ask. Yes, the code is only compiled once and after that point it will not need to be recompiled.

        I've used this type of mechanism to program configurable filters: the user edits a config file in which they specify anonymous subs like these, which are later eval'd and used in my larger package.

      old thread but wanted to ask what
      my $sub =eval "sub { s/$_->[0]/$_->[1]/$_->[2] for (\@_)}";
      does and what would be different if eval was removed?

        In an eval environment in which  $_ held a reference to an array such as
            $_ = ['searchPattern', 'replacementString', 'regexModifiers'];
        (in the code quoted above, $_ is the map topic, so each row of @re_rules plays this role in turn), the string is interpolated before it is compiled, and the eval would return a code reference equivalent to
            sub { s/searchPattern/replacementString/regexModifiers for @_ }
        which would allow one to iterate over a list of strings and do substitutions on each (non-literal!) string:

        my $x = 'some';
        my $y = 'strings';
        my $z = 'here';
        $sub->($x, $y, $z);

        The only benefit that I can see of using the string eval is that the substitution operator modifiers  /g /e /r and perhaps a couple of others cannot be "passed" in any other way. If not for these modifiers, one could simply write something like the following (assuming the search and replacement strings are first copied into lexical variables, because inside the anonymous sub both  $_ and  @_ refer to the strings being modified, not to the rule)
            my ($search, $replace) = @_;
            my $sub = sub { s/$search/$replace/xmsg for @_ };
            ...
            $sub->($x, $y, $z);
        (note the literal modifier group) and get the same result. In this example, the modifiers other than  /g could be passed in a string.

        Update: E.g.,

        c:\@Work\Perl\monks>perl -wMstrict -le
        "S('(?xmsi) foo', 'bar', 'g');
         ;;
         sub S {
           my $sub = eval qq{ sub { s/$_[0]/$_[1]/$_[2] for \@_ } };
           ;;
           my $x = 'foo';
           my $y = 'Foo fOo foO';
           my $z = 'FOO';
           ;;
           $sub->($x, $y, $z);
           print qq{x '$x' y '$y' z '$z'};
           }
        "
        x 'bar' y 'bar bar bar' z 'bar'
        And yes, this is a trick for a completely trusted environment. But then: "Trust, but verify!"



        what do you observe happening when you remove eval?
Re: Precompiling substitution regex
by d00mvw (Initiate) on Jul 02, 2014 at 13:17 UTC

    Newbie to Perl, but just found myself in similar circumstances. For what it's worth, I'm using a two dimensional array with map, i.e.,

    my @SUB_REC = map {[qr{$_->[0]}, $_->[1]]} ( ['(robert|bobby)', 'bob'] );

    where the first element of each row is the regex and the second element is the substitution. Then simply use

    foreach (@SUB_REC) { $yourText =~ s/$_->[0]/$_->[1]/g; }

    Obviously you can add more elements to the array for flags.

      ['(robert|bobby)', 'bob']

      Your example regex has a capture group, but the subsequent substitution doesn't seem to offer any opportunity to use what was captured. Is there any reason for capturing?

        No, that's a typo. Interesting - is there a way to pass captured groups to be substituted instead of straight strings?
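
        One way to do that, sketched here under the assumption that each rule stores its replacement as a code reference instead of a plain string (the names @CAPTURE_RULES and apply_capture_rules are hypothetical): with the /e modifier the replacement side is evaluated as Perl code, so the captures can be handed to the code reference. This sketch passes at most two captures.

```perl
use strict;
use warnings;

# Each rule: [ precompiled regex, code ref that builds the replacement ].
my @CAPTURE_RULES = (
    [ qr{(robert|bobby)}i, sub { my ($name) = @_; 'bob(' . lc($name) . ')' } ],
    [ qr{(\d+)-(\d+)},     sub { my ($lo, $hi) = @_; "$lo to $hi" } ],
);

sub apply_capture_rules {
    my ($str) = @_;
    for my $rule (@CAPTURE_RULES) {
        my ($re, $code) = @$rule;
        # /e runs the replacement as code; $1 and $2 are the capture groups.
        $str =~ s/$re/$code->($1, $2)/ge;
    }
    return $str;
}

print apply_capture_rules('Robert has pages 10-20'), "\n";
```

        A rule with a single capture group simply ignores the second argument, which is undef in that case.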
      Looping misses the point of the other answers, which is to execute just one regex that matches all the patterns.
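
      A sketch of what a single-regex version could look like, assuming every replacement is a fixed string keyed by the literal text it replaces (the %REPLACEMENT table, $ALL_RE, and replace_all are hypothetical names, not taken from the thread). One alternation is compiled once and a hash lookup picks the replacement, so each string is scanned a single time.

```perl
use strict;
use warnings;

# Hypothetical table: literal text to find => replacement text.
my %REPLACEMENT = (
    robert => 'bob',
    bobby  => 'bob',
    One    => 'Four',
);

# Longest keys first, so a longer alternative wins over one that is its prefix.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) || $a cmp $b }
    keys %REPLACEMENT;

my $ALL_RE = qr/($alternation)/;

sub replace_all {
    my ($str) = @_;
    # $1 is the literal text that matched; look up its replacement.
    $str =~ s/$ALL_RE/$REPLACEMENT{$1}/g;
    return $str;
}

print replace_all('robert and bobby count One, Two'), "\n";
```

      The trade-off is that all patterns share one set of modifiers and must be literal strings, so this does not cover the per-rule modifiers asked about in the original question.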