Precompiling substitution regex

by FreakyGreenLeaky (Sexton)
on Jun 17, 2011 at 11:09 UTC (#910116=perlquestion)
FreakyGreenLeaky has asked for the wisdom of the Perl Monks concerning the following question:


I'd like to precompile substitution regular expressions in a similar way to precompiling match regexes:
    my @MATCH_RE = (
        q%(?:this|or|that)%,
    );
    my @MATCH_REC = map { qr/$_/ } @MATCH_RE;

    sub doit {
        my ($str) = @_;
        for my $re (@MATCH_REC) {
            return 0 if $str =~ m%$re%i;
        }
        return 1;
    }

The above provides the performance gains as expected and works well.
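One caveat about the match example, offered as a hedged aside: flags on the outer `m//` are not applied to an interpolated `qr//` object, so the `i` in `m%$re%i` probably has no effect on patterns that were compiled without it. A minimal sketch (the sample string and variable names are made up for illustration) that moves the flag into `qr//` instead:

```perl
use strict;
use warnings;

my $plain  = qr/(?:this|or|that)/;     # compiled without /i
my $nocase = qr/(?:this|or|that)/i;    # /i is stored in the compiled regex

# The outer /i does not reach into the precompiled object.
my $outer_flag = ( 'THIS' =~ m%$plain%i ) ? 1 : 0;

# The flag given to qr// travels with the compiled regex.
my $inner_flag = ( 'THIS' =~ m%$nocase% ) ? 1 : 0;

print "outer /i: $outer_flag, qr//i: $inner_flag\n";
```

If case-insensitive matching is wanted, putting the flag on `qr//` keeps each pattern's modifiers together with the pattern itself.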

Is it possible to do the same thing for substitution where the replacement string and modifiers can vary, but are hard-coded? For example (incorrect code, I know):
    my @SUB_RE = (
        q@%(?:this|or|that)%bob%gsi@,
        q@%(?:One|Two|Three)%Four%gs@,     # note different modifier
    );

    # or, with 's' embedded
    my @SUB_RE2 = (
        q@s%(?:this|or|that)%bob%gsi@,
        q@s%(?:One|Two|Three)%Four%gs@,    # note different modifier
    );

    my @SUB_REC = map { qr/$_/ } @SUB_RE;

    sub doit {
        my ($str) = @_;
        for my $re (@SUB_REC) {
            $str =~ $re;
        }
        return $str;
    }
Any pointers would be appreciated.

Re: Precompiling substitution regex
by moritz (Cardinal) on Jun 17, 2011 at 11:44 UTC
    You can interpolate precompiled regexes into substitutions as well. The different modifiers can go into each regex:
        my @SUB_REC = (
            qr{(?:this|or|that)}si,
            qr{(?:One|Two|Three)}s,
        );
        my @SUBSTITUTION = qw/bob four/;

        for my $idx (0..$#SUB_REC) {
            $str =~ s/$SUB_REC[$idx]/$SUBSTITUTION[$idx]/;
        }
      Thanks moritz, that will work. Is there any way of achieving that without having the substitutions in a separate array?
        Well, you could of course store an array of closures that each does a substitution, and call those in turn - but it might slow down things again. Or you could store regexes and substitutions in the same array, but distinguished by different indexes (even/odd or first half/second half).

        But I don't think you can easily store a precompiled whole substitution in a single scalar.
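The even/odd layout mentioned above might look roughly like this. It is only a sketch: the helper name `apply_flat_rules` is hypothetical, and the `/g` on the substitution is an assumption taken from the `g` in the original question's modifiers, since `g` belongs to `s///` and cannot be stored in `qr//`.

```perl
use strict;
use warnings;

# Flat list: even indexes hold compiled regexes, odd indexes hold
# the replacement string for the regex just before them.
my @SUB_RULES = (
    qr{(?:this|or|that)}si, 'bob',
    qr{(?:One|Two|Three)}s, 'Four',
);

# Hypothetical helper: walk the list two elements at a time.
sub apply_flat_rules {
    my ($str) = @_;
    for ( my $i = 0; $i < $#SUB_RULES; $i += 2 ) {
        my ( $re, $replacement ) = @SUB_RULES[ $i, $i + 1 ];
        $str =~ s/$re/$replacement/g;
    }
    return $str;
}

print apply_flat_rules('this one or that other'), "\n";
```

Keeping each regex next to its replacement avoids the two arrays drifting out of step when a rule is added or removed.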

Re: Precompiling substitution regex
by wind (Priest) on Jun 17, 2011 at 22:49 UTC

    Use eval to cache the substitution in an anonymous sub:

        use strict;
        use warnings;

        # Cached Regex's: LHS, RHS, Modifiers
        my @re_rules = (
            ['(?:this|or|that)', 'bob', 'gsi'],
            ['(?:One|Two|Three)', 'Four', 'gs'],
        );

        my @re_subs = map {
            my $sub = eval "sub { s/$_->[0]/$_->[1]/$_->[2] for (\@_)}";
            die $@ if $@;
            $sub;
        } @re_rules;

        my @strings = ('this one or that other', 'One two Three');

        for my $string (@strings) {
            $_->($string) for (@re_subs);
            print "$string\n";
        }
    Or just declare the anonymous subs yourself if you don't need the special markup:
        use strict;
        use warnings;

        my @re_subs = (
            sub { s/(?:this|or|that)/bob/gsi for (@_) },
            sub { s/(?:One|Two|Three)/Four/gs for (@_) },
        );

        my @strings = ('this one or that other', 'One two Three');

        for my $string (@strings) {
            $_->($string) for (@re_subs);
            print "$string\n";
        }
      Thanks. Does the second sample pre-compile the regexes?

        I believe we're mixing terminology here, as "pre-compile" isn't quite the right term for what you're asking. But yes: the patterns contain no interpolated variables, so each regex is compiled once along with the surrounding code and never needs to be recompiled after that.

        I've used this type of mechanism to build configurable filters: the user edits a config file in which they specify anonymous subs like these, which are later eval'd and used in my larger package.
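A rough sketch of what such a config-driven filter could look like. The config format here is an assumption (a Perl expression yielding an array ref of anonymous subs), and a heredoc stands in for the file contents so the example is self-contained. Note that string `eval` runs whatever code the file contains, so this only suits config files you trust.

```perl
use strict;
use warnings;

# Stand-in for the contents of a user-edited config file
# (hypothetical format: an array ref of anonymous subs).
my $config_text = <<'END_CONFIG';
[
    sub { s/(?:this|or|that)/bob/gsi for (@_) },
    sub { s/(?:One|Two|Three)/Four/gs for (@_) },
]
END_CONFIG

# String eval compiles the subs, and the regexes inside them, once at load time.
my $filters = eval $config_text;
die "Config error: $@" if $@;
die "Config must yield an array ref\n" unless ref $filters eq 'ARRAY';

# Each sub modifies its argument in place through the @_ alias.
my $text = 'One or Two';
$_->($text) for @$filters;
print "$text\n";
```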

Re: Precompiling substitution regex
by d00mvw (Initiate) on Jul 02, 2014 at 13:17 UTC

    Newbie to Perl, but I just found myself in similar circumstances. For what it's worth, I'm using a two-dimensional array with map, i.e.,

    my @SUB_REC = map {[qr{$_->[0]}, $_->[1]]} ( ['(robert|bobby)', 'bob'] );

    where the first element of each row is the regex and the second element is the substitution. Then simply use

    foreach (@SUB_REC) { $yourText =~ s/$_->[0]/$_->[1]/g; }

    Obviously you can add more elements to the array for flags.
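One way the extra flag elements might be handled, as a sketch under assumptions: the row layout (pattern, replacement, pattern flags, global switch) and the name `apply_flagged_rules` are invented for illustration. Pattern flags such as `i` can be carried into `qr//` with an inline modifier like `(?i)`, whereas `g` is a property of the `s///` operator and has to be chosen at the substitution site.

```perl
use strict;
use warnings;

# Each row: pattern, replacement, pattern flags (e.g. 'i'), global?
my @RULES = (
    [ '(?:robert|bobby)', 'bob',  'i', 1 ],
    [ '(?:One|Two)',      'Four', '',  0 ],
);

my @SUB_REC = map {
    my ( $pat, $rep, $flags, $global ) = @$_;
    # Inline modifiers such as (?i) carry pattern flags into qr//.
    my $re = length($flags) ? qr/(?$flags)$pat/ : qr/$pat/;
    [ $re, $rep, $global ];
} @RULES;

sub apply_flagged_rules {
    my ($text) = @_;
    for my $rule (@SUB_REC) {
        my ( $re, $rep, $global ) = @$rule;
        # /g belongs to s///, not to the regex, so branch on it here.
        if ($global) { $text =~ s/$re/$rep/g }
        else         { $text =~ s/$re/$rep/ }
    }
    return $text;
}

print apply_flagged_rules('Robert and BOBBY saw One and Two'), "\n";
```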

      ['(robert|bobby)', 'bob']

      Your example regex has a capture group, but the subsequent substitution doesn't seem to offer any opportunity to use what was captured. Is there any reason for capturing?

        No, that's a typo. Interesting - is there a way to pass captured groups to be substituted instead of straight strings?
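On the captured-groups question, one possible approach (a sketch, not something proposed in the thread itself): store a code ref instead of a plain string as the replacement, and use the `/e` modifier so the replacement is evaluated as code at substitution time. A plain string containing `$1` would be inserted literally, because the replacement is only interpolated once. The rules and the name `apply_capture_rules` below are illustrative.

```perl
use strict;
use warnings;

# Each row: compiled regex, then a code ref that builds the
# replacement from the captured groups it is handed.
my @SUB_REC = (
    [ qr{(robert|bobby) (\w+)}i, sub { my ( $first, $last ) = @_; "bob $last" } ],
    [ qr{(\d+)-(\d+)},           sub { my ( $lo, $hi )      = @_; "$hi-$lo" } ],
);

sub apply_capture_rules {
    my ($text) = @_;
    for my $rule (@SUB_REC) {
        my ( $re, $make ) = @$rule;
        # /e evaluates the replacement as Perl code, so $1 and $2 are
        # read for each match and passed to the code ref.
        $text =~ s/$re/$make->( $1, $2 )/ge;
    }
    return $text;
}

print apply_capture_rules('Robert Smith owes 10-20'), "\n";
```

Passing the captures as arguments, rather than reading `$1` inside the code ref, keeps each replacement sub independent of any other matching it might do.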

Node Type: perlquestion [id://910116]
Approved by moritz