
capture global substitution in an array

by wrinkles (Pilgrim)
on Apr 29, 2012 at 16:32 UTC ( #967949=perlquestion )
wrinkles has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

Movable Type has a "regex_replace" tag modifier that takes two arguments, a regular expression and a replacement expression, and returns the modified string.

My problem is that I'd like to write a plugin that returns an array of the individual modified strings instead of the whole string. I thought that would be easy. And it may be, but not for me. The challenge is that while the match regex operator can return an array, the same is not true for substitutions. So while the following returns matches, the behaviour is not the same for substitutions.

@matches = ( $text =~ /$regex/smg );
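To make the difference concrete, here is a small sketch. The sample text and pattern are invented for illustration and have nothing to do with Movable Type. A global match in list context hands back the matched substrings, while a global substitution hands back only a count and leaves the modified text in the variable it was bound to.

```perl
use strict;
use warnings;

my $text  = "cat bat rat";    # made-up sample input
my $regex = qr/[cb]at/;       # made-up sample pattern

# A global match in list context returns every matched substring.
my @matches = ( $text =~ /$regex/g );

# A global substitution returns the number of replacements it made;
# the changed text ends up in $copy, not in the return value.
my $copy  = $text;
my $count = ( $copy =~ s/$regex/dog/g );

print "matches: @matches\n";
print "count:   $count\n";
print "copy:    $copy\n";
```

So the count I keep ending up with is simply the return value of `s///g`.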

Here's the original code for MT's "regex_replace" modifier. I'd offer my attempts, but really I tried a lot of things and just got nowhere. I end up just getting the number of substitutions whereas I'd like to capture the substitutions themselves.


=head2 regex_replace

Applies a regular expression operation on the input. This filter accepts
B<two> input values: one is the pattern, the second is the replacement.

B<Example:>

    <$mt:EntryTitle regex_replace="/\s*\[.+?\]\s*$/",""$>

This would strip any bracketed phrase from the end of the entry title
field.

=cut

sub _fltr_regex_replace {
    my ( $str, $val, $ctx ) = @_;

    # This one requires an array
    return $str unless ref($val) eq 'ARRAY';
    my $patt    = $val->[0];
    my $replace = $val->[1];
    if ( $patt =~ m!^(/)(.+)\1([A-Za-z]+)?$! ) {
        $patt = $2;
        my $global;
        if ( my $opt = $3 ) {
            $global = 1 if $opt =~ m/g/;
            $opt =~ s/[ge]+//g;
            $patt = "(?$opt)" . $patt;
        }
        my $re = eval {qr/$patt/};
        if ( defined $re ) {
            $replace =~ s!\\\\(\d+)!\$1!g;    # for php, \\1 is how you write $1
            $replace =~ s!/!\\/!g;
            eval '$str =~ s/$re/' . $replace . '/' . ( $global ? 'g' : '' );
            if ($@) {
                return $ctx->error("Invalid regular expression: $@");
            }
        }
    }
    return $str;
}
(update: corrected title spelling)

Replies are listed 'Best First'.
Re: capture global substitution in an array
by moritz (Cardinal) on Apr 29, 2012 at 16:37 UTC

    What exactly do you mean by "capture the substitutions"?

    A substitution is an action, not something you can express in a perl scalar. Only its return value and the result of the string it acted upon are easily captured.
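If the goal is to record each replacement string as it is produced, one possible approach (a sketch only, not something taken from the Movable Type code) is the `/e` modifier: the replacement side becomes Perl code, so it can push its value onto an array before returning it. The input, the pattern and the use of `uc` as the "replacement" are all made up for the example.

```perl
use strict;
use warnings;

my $text = "alpha [one] beta [two]";    # made-up sample input
my $re   = qr/\[(\w+)\]/;               # made-up sample pattern

my @replacements;

# With /e the right-hand side runs as code for every match. Here it
# computes the replacement, remembers it in @replacements, and then
# yields it as the text to substitute.
( my $result = $text ) =~ s{$re}{
    my $new = uc $1;
    push @replacements, $new;
    $new;
}ge;

print "result:       $result\n";
print "replacements: @replacements\n";
```

The return value of the substitution is still just a count; the list comes from the side effect inside the replacement code.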

      Yes, I suspected that the real problem lies in my misunderstanding. I imagined that there is a mythical "replace" string that substitutes for the matched string. The "replace" of a regex replace is the entire resulting string, and not some discrete "diff" element. Thanks for helping me understand that better.

      Perhaps what I am really looking to do (although it may be ill-conceived) is:

      • Capture the array of matched substrings based on the first modifier argument
      • loop through the array, performing a regex substitution on each substring based on the second argument.
      • return the list of modified substrings

      This algorithm is certainly not a standard "regex substitution". I'll have to think more about this to determine if it even makes sense to do. But at least you have helped me define the problem. Thanks!

        So now I'm thinking that the tag modifier needs three arguments:
        • The match regex expression to extract the substrings.
        • The substitution regex to perform on these captured substrings.
        • The replace expression.
        Is that as crazy as it sounds to me? :)
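A rough sketch of that three-argument idea, under some loud assumptions: `regex_list` here is a hypothetical stand-alone function rather than a real Movable Type modifier, the patterns are passed in as compiled `qr//` objects, and the replacement is treated as a literal string (no `$1` interpolation), which avoids the string `eval` used by regex_replace.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Movable Type.
#   $extract : pattern selecting the substrings to process
#   $search  : pattern to replace inside each selected substring
#   $replace : literal replacement text (assumption: no $1, $2, ...)
sub regex_list {
    my ( $text, $extract, $search, $replace ) = @_;
    my @out;
    while ( $text =~ /$extract/g ) {
        # Take the whole match via its offsets, so capture groups
        # inside $extract do not affect what gets collected.
        my $piece = substr( $text, $-[0], $+[0] - $-[0] );
        ( my $new = $piece ) =~ s/$search/$replace/g;
        push @out, $new;
    }
    return @out;
}

# Made-up usage: pull out each <b>...</b> run and strip the tags.
my @list = regex_list(
    "<b>one</b> and <b>two</b>",
    qr{<b>.*?</b>},
    qr{</?b>},
    "",
);
print join( "|", @list ), "\n";
```

Written this way, the first pattern only decides which pieces end up in the list, and the other two arguments behave like an ordinary search and replace applied to each piece.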

      Moritz thanks again for your insight, it helped me greatly. I pushed the RegexList Movable Type plugin to github.

      The plugin is a tag modifier, meaning you place it into any Movable Type tag that outputs text. The modifier has three arguments. One specifies the substrings to process, and the other two are the "search" regex and the replace expression. The result is an array of processed substrings, which is passed directly within the parent tag to a Movable Type array variable (via the built-in setvar modifier). The optional fourth argument to the regex_list modifier selects a capture variable (1-9) from the first match expression to use instead; the default is the whole match ($&). I've pasted the main module code below.
