
Precompiling substitution regex

by FreakyGreenLeaky (Sexton)
on Jun 17, 2011 at 11:09 UTC (#910116)
FreakyGreenLeaky has asked for the wisdom of the Perl Monks concerning the following question:


I'd like to precompile substitution regular expressions in a similar way to precompiling match regexes:
my @MATCH_RE  = ( q%(?:this|or|that)%, );
# /i goes on the qr//: a flag on the outer m// does not reach an interpolated qr object
my @MATCH_REC = map { qr/$_/i } @MATCH_RE;

sub doit {
    my ($str) = @_;
    for my $re (@MATCH_REC) {
        return 0 if $str =~ $re;
    }
    return 1;
}

The above provides the performance gains as expected and works well.
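One thing to watch when precompiling this way: a modifier written on the match operator does not reach into an interpolated qr object, because the qr// records its own flags (it stringifies as (?^:...) on Perl 5.14 and later, or (?-xism:...) before that), and that group resets the flags inside it. So /i belongs on the qr// itself. A minimal check, using only the sample pattern from the question:

```perl
use strict;
use warnings;

my $plain    = qr/(?:this|or|that)/;     # no /i recorded in the qr object
my $caseless = qr/(?:this|or|that)/i;    # /i recorded in the qr object

my $str = 'THIS';

# The outer /i does not apply inside the interpolated (?^:...) group.
my $outer = ($str =~ m%$plain%i) ? 1 : 0;
my $inner = ($str =~ $caseless)  ? 1 : 0;

print "qr// with /i on the m//: $outer\n";   # 0
print "qr//i:                   $inner\n";   # 1
```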

Is it possible to do the same thing for substitution where the replacement string and modifiers can vary, but are hard-coded? For example (incorrect code, I know):
my @SUB_RE = (
    q@%(?:this|or|that)%bob%gsi@,
    q@%(?:One|Two|Three)%Four%gs@,    # note different modifier
);
# or, with 's' embedded
my @SUB_RE2 = (
    q@s%(?:this|or|that)%bob%gsi@,
    q@s%(?:One|Two|Three)%Four%gs@,   # note different modifier
);
my @SUB_REC = map { qr/$_/ } @SUB_RE;

sub doit {
    my ($str) = @_;
    for my $re (@SUB_REC) {
        $str =~ $re;
    }
    return $str;
}
Any pointers would be appreciated.

Re: Precompiling substitution regex
by moritz (Cardinal) on Jun 17, 2011 at 11:44 UTC
    You can interpolate precompiled regexes into substitutions as well. The different modifiers can go into each regex:
    my @SUB_REC = (
        qr{(?:this|or|that)}si,
        qr{(?:One|Two|Three)}s,
    );
    my @SUBSTITUTION = qw/bob Four/;

    for my $idx (0 .. $#SUB_REC) {
        # /g can't be stored in a qr//, so it goes on the s/// itself
        $str =~ s/$SUB_REC[$idx]/$SUBSTITUTION[$idx]/g;
    }
      Thanks moritz, that will work. Is there any way of achieving that without having the substitutions in a separate array?
        Well, you could of course store an array of closures that each do a substitution and call those in turn, but that might slow things down again. Or you could store the regexes and substitutions in the same array, distinguished by index (even/odd, or first half/second half).

        But I don't think you can easily store a precompiled whole substitution in a single scalar.
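The even/odd suggestion might look like this minimal sketch: one array (the name @SUBST is hypothetical) interleaves each qr//, carrying its own pattern flags, with its replacement string, and a doit() loop in the style of the original question walks it two elements at a time. /g cannot be stored in a qr//, so it is written on the s/// for every rule.

```perl
use strict;
use warnings;

# Even indexes hold the compiled pattern, odd indexes hold its replacement.
my @SUBST = (
    qr{(?:this|or|that)}si, 'bob',
    qr{(?:One|Two|Three)}s, 'Four',
);

sub doit {
    my ($str) = @_;
    for (my $i = 0; $i < @SUBST; $i += 2) {
        # /g is applied to every rule here, since a qr// can't carry it
        $str =~ s/$SUBST[$i]/$SUBST[$i + 1]/g;
    }
    return $str;
}

print doit('This one or that One'), "\n";   # bob one bob bob Four
```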

Re: Precompiling substitution regex
by wind (Priest) on Jun 17, 2011 at 22:49 UTC

    Use eval to cache the substitution in an anonymous sub:

    use strict;
    use warnings;

    # Cached regexes: LHS, RHS, modifiers
    my @re_rules = (
        ['(?:this|or|that)',  'bob',  'gsi'],
        ['(?:One|Two|Three)', 'Four', 'gs'],
    );

    my @re_subs = map {
        my $sub = eval "sub { s/$_->[0]/$_->[1]/$_->[2] for (\@_)}";
        die $@ if $@;
        $sub;
    } @re_rules;

    my @strings = ('this one or that other', 'One two Three');
    for my $string (@strings) {
        $_->($string) for (@re_subs);
        print "$string\n";
    }
    Or just declare the anonymous subs yourself if you don't need the special markup:
    use strict;
    use warnings;

    my @re_subs = (
        sub { s/(?:this|or|that)/bob/gsi  for (@_) },
        sub { s/(?:One|Two|Three)/Four/gs for (@_) },
    );

    my @strings = ('this one or that other', 'One two Three');
    for my $string (@strings) {
        $_->($string) for (@re_subs);
        print "$string\n";
    }
      Thanks. Does the second sample pre-compile the regexes?

        I believe we're mixing terminology here, as "pre-compile" isn't exactly what you mean to ask about. But yes: the patterns in those anonymous subs are literals with no interpolated variables, so they are compiled once, along with the rest of the code, and never need to be recompiled.

        I've used this kind of mechanism to build configurable filters: a user edits a config file containing anonymous subs like these, which my larger package later eval's and uses.
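A configurable filter of that kind might look like this minimal sketch. The config "file" here is a hypothetical heredoc holding one anonymous sub per line; each line is string-eval'd once at load time and checked to be a code reference. As with the eval version above, this only makes sense when the config file is fully trusted, since it runs arbitrary Perl.

```perl
use strict;
use warnings;

# Hypothetical config contents: one anonymous sub per line.
my $config = <<'END';
sub { s/(?:this|or|that)/bob/gi for @_ }
sub { s/(?:One|Two|Three)/Four/g for @_ }
END

my @filters;
for my $line (split /\n/, $config) {
    next unless $line =~ /\S/;
    my $sub = eval $line;    # string eval: trusted input only!
    die "bad filter '$line': $@" if $@ || ref($sub) ne 'CODE';
    push @filters, $sub;
}

my $text = 'One or Two';
$_->($text) for @filters;    # each filter edits $text in place via @_
print "$text\n";             # Four bob Four
```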

      Old thread, but I wanted to ask what
      my $sub = eval "sub { s/$_->[0]/$_->[1]/$_->[2] for (\@_)}";
      does, and what would be different if the eval were removed?

        In wind's code the eval runs inside a map over @re_rules, so $_ holds one rule, e.g.,
            $_ = ['searchPattern', 'replacementString', 'regexModifiers'];
        The double-quoted string is interpolated first (the \@_ is escaped, so it stays a literal @_), and eval then compiles the result, returning a code reference equivalent to
            sub { s/searchPattern/replacementString/regexModifiers for @_ }
        which would allow one to iterate over a list of strings and do substitutions on each (non-literal!) string:

        my $x = 'some';  my $y = 'strings';  my $z = 'here';
        $sub->($x, $y, $z);

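        The only benefit I can see of using the string eval is that substitution-operator modifiers such as /g, /e and /r (and perhaps a couple of others) cannot be "passed" in any other way. If not for these modifiers, one could simply close over the search and replacement strings (again assuming this is in a function whose @_ holds them):
            my ($search, $replace) = @_;
            my $sub = sub { s/$search/$replace/xmsg for @_ };
            $sub->($x, $y, $z);
        (note the literal modifier group) and get the same result. The copies into lexicals matter: inside the anonymous sub, @_ is the sub's own argument list and $_ is aliased to each string being substituted, so neither $_[0] nor $_->[0] would refer to the search string there. In this example, the modifiers other than /g could still be passed in the search string itself, as an inline group such as (?xmsi).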

        Update: E.g.,

        c:\@Work\Perl\monks>perl -wMstrict -le
        "S('(?xmsi) foo', 'bar', 'g');
        ;;
        sub S {
          my $sub = eval qq{ sub { s/$_[0]/$_[1]/$_[2] for \@_ } };
          ;;
          my $x = 'foo';  my $y = 'Foo fOo foO';  my $z = 'FOO';
          ;;
          $sub->($x, $y, $z);
          print qq{x '$x' y '$y' z '$z'};
        }
        "
        x 'bar' y 'bar bar bar' z 'bar'
        And yes, this is a trick for a completely trusted environment. But then: "Trust, but verify!"

        Give a man a fish:  <%-{-{-{-<
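For comparison, here is a minimal no-eval sketch of the closure approach described above; make_subst is a hypothetical helper name. /g is written literally in the s///, and the remaining flags travel inside the pattern as an inline (?xmsi) group, as in the update's example. Because $replace is interpolated as an ordinary string, a $1 inside it would not be expanded; that is one of the cases where the string eval (or /e) is still needed.

```perl
use strict;
use warnings;

# Hypothetical helper: build a substitution closure without a string eval.
sub make_subst {
    my ($search, $replace) = @_;
    my $re = qr/$search/;    # compiled once, when the closure is built
    # Inside the closure, @_ holds the strings passed to it, and 'for'
    # aliases $_ to each one, so the substitution edits them in place.
    return sub { s/$re/$replace/g for @_ };
}

# Flags other than /g ride along in the pattern as an inline group.
my $sub = make_subst('(?xmsi) foo', 'bar');

my $x = 'foo';
my $y = 'Foo fOo foO';
my $z = 'FOO';
$sub->($x, $y, $z);
print "x '$x' y '$y' z '$z'\n";   # x 'bar' y 'bar bar bar' z 'bar'
```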

        What do you observe happening when you remove the eval?
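What removing the eval does can be checked directly: the double-quoted string is still interpolated, but nothing compiles it, so the variable ends up holding a plain string rather than a code reference, and calling it dies under use strict. A minimal sketch, with a hypothetical @rule array standing in for one row of wind's @re_rules:

```perl
use strict;
use warnings;

# A hypothetical rule, standing in for one row of @re_rules.
my @rule = ('(?:this|or|that)', 'bob', 'gi');

# With eval: the string is interpolated, then compiled into a code ref.
my $code = eval "sub { s/$rule[0]/$rule[1]/$rule[2] for \@_ }";
die $@ if $@;

# Without eval: the very same expression is just an interpolated string.
my $text = "sub { s/$rule[0]/$rule[1]/$rule[2] for \@_ }";

print 'with eval:    ', ref($code), "\n";                      # CODE
print 'without eval: ', (ref($text) || 'plain string'), "\n";  # plain string
print "$text\n";        # sub { s/(?:this|or|that)/bob/gi for @_ }

my $s = 'this or that';
$code->($s);            # works: $s becomes 'bob bob bob'

# Calling the string as a sub dies under "use strict" (strict refs).
my $ok = eval { $text->($s); 1 };
print $ok ? "called OK\n" : "died: $@";
```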
Re: Precompiling substitution regex
by d00mvw (Initiate) on Jul 02, 2014 at 13:17 UTC

    I'm new to Perl, but I just found myself in similar circumstances. For what it's worth, I'm using a two-dimensional array built with map, i.e.,

    my @SUB_REC = map {[qr{$_->[0]}, $_->[1]]} ( ['(robert|bobby)', 'bob'] );

    where the first element of each row is the regex and the second element is the substitution. Then simply use

    foreach (@SUB_REC) { $yourText =~ s/$_->[0]/$_->[1]/g; }

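    You can add more elements to each row for flags, although only pattern flags such as i, m, s and x can be carried by the qr//; /g has to stay on the s/// itself.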
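Adding a flags column might look like this minimal sketch, assuming a hypothetical third element per row that holds only pattern flags. Each flag string is folded into the pattern as an inline (?flags:...) group when the qr// is built, so an empty string just yields a plain (?:...) group; /g is still written on the s///.

```perl
use strict;
use warnings;

# Rows: pattern, replacement, pattern flags (hypothetical third column).
my @rules = (
    [ '(?:robert|bobby)',  'bob',  'i' ],
    [ '(?:One|Two|Three)', 'Four', ''  ],
);

# Fold the flags into an inline (?flags:...) group; '' gives a plain (?:...).
my @SUB_REC = map {
    my ($pat, $repl, $flags) = @$_;
    [ qr/(?$flags:$pat)/, $repl ];
} @rules;

my $yourText = 'Robert, bobby and One';
foreach (@SUB_REC) {
    $yourText =~ s/$_->[0]/$_->[1]/g;   # /g can't live in the qr//
}
print "$yourText\n";   # bob, bob and Four
```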

      ['(robert|bobby)', 'bob']

      Your example regex has a capture group, but the subsequent substitution doesn't seem to offer any opportunity to use what was captured. Is there any reason for capturing?

        No, that's a typo. Interesting, though: is there a way to pass captured groups into the substitution instead of plain strings?
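One common way to do this, sketched here with hypothetical rules: store a code ref as the replacement and call it from the replacement side of s///e. Capture variables such as $1 are dynamically scoped, so the code ref sees the current match when it is called. (The double-eval /ee modifier, with a replacement string such as '"bob ($1)"', is another trick, but it string-evals the replacement and carries the same trust caveats as the eval approach earlier in the thread.)

```perl
use strict;
use warnings;

# Each row: pattern, and a code ref that builds the replacement text.
# $1, $2, ... are dynamically scoped, so the code ref sees the current match.
my @SUB_REC = map {
    my ($pat, $make) = @$_;
    [ qr{$pat}, $make ];
} (
    [ '(robert|bobby)',     sub { "bob (was $1)" } ],
    [ '(\w+)@example\.com', sub { "<$1>" } ],
);

my $yourText = 'robert wrote to alice@example.com';
foreach (@SUB_REC) {
    my ($re, $make) = @$_;
    $yourText =~ s/$re/$make->()/ge;   # /e runs the replacement as code
}
print "$yourText\n";   # bob (was robert) wrote to <alice>
```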

Node Type: perlquestion [id://910116]
Approved by moritz