http://www.perlmonks.org?node_id=1078356

JDoolin has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to write a subroutine which will do a global substitution on the global variable $_, then report some debugging information.
    my ($nobrackets) = (qr/[^\{^\}]+/);
    &replace(qr/\{$nobrackets\}/,'$1') #This line is changed below.
    sub replace{ s/$_[1]/$_[2]/g }
The code does a nice job of finding the matches, but instead of removing the brackets and keeping the text, it replaces each match with the literal text '$1'.
    my ($nobrackets) = (qr/[^\{^\}]+/);
    &replace(qr/\{$nobrackets\}/,"$1") #Using qq// instead of q//.
    sub replace{ s/$_[1]/$_[2]/g }
The first snippet of code will never replace the $1 with the captured text. The second snippet replaces the $1 before it captures the text. Is there a way to make perl interpolate the $1 after the call, and during the substitution?

===========================================================================
Update: 1:49 PM, March 15 (I have posted this clarification below, but I accidentally buried it in a subthread.)

Here is the code that I have working.

    #!/usr/bin/perl
    $_='{\selectlanguage{english} \textcolor{black}{\ \ 10.\ \ Three resistors connected in series each carry currents labeled }\textit{\textcolor{black}{I}}\textcolor{black}{\textsubscript{1}}\textcolor{black}{, }\textit{\textcolor{black}{I}}\textcolor{black}{\textsubscript{2}}\textcolor{black}{and}\textit{\textcolor{black}{I}}\textcolor{black}{\textsubscript{3}}\textcolor{black}{. Which of the following expresses the value of the total current }\textit{\textcolor{black}{I}}\textit{\textcolor{black}{\textsubscript{T}}}\textcolor{black}{in the system made up of the three resistors in series?}}.';;
    $nobrackets = qr/[^\{}]+/;
    my $pass = 0;
    while(++$pass <=2){
        s/\\textsuperscript\{($nobrackets)\}/ startsuperscript $1 endsuperscript /g;
        s/\\textsubscript\{($nobrackets)\}/ startsubscript $1 endsubscript/g;
        s/\\textit\{($nobrackets)\}/ startitalic $1 enditalic/g;
        s/\\textcolor\{$nobrackets\}//g;
        s/\{($nobrackets)\}/($1)/g;
        print "Pass $pass:\n\n". qq{$_}."\n\n\n";
    }
This produces output as follows:
    Pass 1:

    {\selectlanguage(english) (\ \ 10.\ \ Three resistors connected in series each carry currents labeled )\textit{(I)}( startsubscript 1 endsubscript)(, )\textit{(I)}( startsubscript 2 endsubscript)(and)\textit{(I)}( startsubscript 3 endsubscript)(. Which of the following expresses the value of the total current )\textit{(I)}\textit{( startsubscript T endsubscript)}(in the system made up of the three resistors in series?)}.


    Pass 2:

    (\selectlanguage(english) (\ \ 10.\ \ Three resistors connected in series each carry currents labeled ) startitalic (I) enditalic( startsubscript 1 endsubscript)(, ) startitalic (I) enditalic( startsubscript 2 endsubscript)(and) startitalic (I) enditalic( startsubscript 3 endsubscript)(. Which of the following expresses the value of the total current ) startitalic (I) enditalic startitalic ( startsubscript T endsubscript) enditalic(in the system made up of the three resistors in series?)).
Notice that pass 1 removes the inner curly brackets and pass 2 removes the outer ones; additional passes could remove more if necessary. What I want(ed) to change was to turn these s///g or s///eeg statements into subroutines, keeping the capture and replacement variables separate. The code works fine as is, but I'm still curious as to whether the variables could be passed to a subroutine.

Replies are listed 'Best First'.
Re: Regex - Is there any way to control when the contents of a variable are interpolated? (Using "$1" and '$1' in regex replacements)
by choroba (Cardinal) on Mar 14, 2014 at 16:16 UTC
    Several problems here:
    1. Array indexing starts at 0, not 1.
    2. In character classes (inside square brackets in a regex) you do not have to backslash brackets. ^ on the non-first position in a character class stands for itself.
    3. There are no capturing parentheses in the code, so $1 would be empty even if it worked.
    4. You have to evaluate the right hand side to turn the string $1 into the variable.

        #!/usr/bin/perl
        use warnings;
        use strict;

        sub replace {
            s/$_[0]/$_[1]/eeg;
        }

        $_ = 'a { b } c ( d ) e';
        my $nobrackets = qr/[^{}]+/;
        replace(qr/\{($nobrackets)\}/, '$1');
        print "$_\n";
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      Hello, thank you for your prompt response. It appears that s///eeg gives me part of what I want. It seems to interpolate the $1 when it is alone. But it is an incomplete solution for me. If instead of just getting rid of the brackets I want to replace them with words:
          #!/usr/bin/perl
          use warnings;
          use strict;

          sub replace {
              s/$_[0]/$_[1]/eeg;
          }

          $_ = 'a { b } c ( d ) e';
          my $nobrackets = qr/[^\{^\}]+/;
          replace(qr/\{($nobrackets)\}/, ' leftbracket $1 rightbracket '); # I want to replace the brackets with the words leftbracket and rightbracket.
          print "$_\n";
      This appears to make the { b } vanish entirely.
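
A likely explanation for the vanishing text: with /ee the replacement string is evaluated as Perl code, and ' leftbracket $1 rightbracket ' is not a valid Perl expression, so the inner evaluation fails and an empty value is substituted. The sketch below shows two possible workarounds, assuming the goal is a subroutine that edits $_ and reports how many replacements it made. The helper names replace_with and replace_ee are hypothetical and not from this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: applies $pattern to $_ and calls $build once per match,
# passing the first capture group. Only a single /e is needed.
sub replace_with {
    my ($pattern, $build) = @_;
    my $count = s/$pattern/$build->($1)/ge;
    return $count || 0;    # s/// returns an empty string when nothing matched
}

# Hypothetical variant that keeps /ee: $code must itself be valid Perl source,
# so the replacement text is wrapped in its own double quotes.
sub replace_ee {
    my ($pattern, $code) = @_;
    my $count = s/$pattern/$code/eeg;
    return $count || 0;
}

my $nobrackets = qr/[^{}]+/;

$_ = 'a {b} c ( d ) e';
my $n1 = replace_with(qr/\{($nobrackets)\}/, sub { "leftbracket $_[0] rightbracket" });
print "$_ ($n1 replaced)\n";

$_ = 'a {b} c ( d ) e';
my $n2 = replace_ee(qr/\{($nobrackets)\}/, '"leftbracket $1 rightbracket"');
print "$_ ($n2 replaced)\n";
```

The code-reference form avoids evaluating strings as code, which is usually the safer choice when the replacement text might come from outside the program.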
Re: Regex - Is there any way to control when the contents of a variable are interpolated? (Using "$1" and '$1' in regex replacements)
by AnomalousMonk (Archbishop) on Mar 14, 2014 at 23:17 UTC

    I share some confusion about what you want, but perhaps this is a closer approximation to it:

        c:\@Work\Perl>perl -wMstrict -le
        "my %xlate_map = qw/ ( l_paren  ) r_paren  { l_curly  } r_curly /;
         my $xlate_targets = join '', map quotemeta, keys %xlate_map;
         ;;
         sub xlate {
           my ($s, $targets, $hr_map) = @_;
         ;;
           (my $t = $s) =~ s{ ([$targets]) }{$hr_map->{$1}}xmsg;
           return $t;
           }
         ;;
         my $str = 'a { b } c ( d ) e';
         my $xlt = xlate($str, $xlate_targets, \%xlate_map);
         print qq{'$xlt'};
        "
        'a l_curly b r_curly c l_paren d r_paren e'

    Update 1: This can be generalized (and slowed down) a bit, and also, if you have Perl version 5.14+, simplified (and perhaps un-slowed) a bit (note use of  /r regex modifier in  s///r substitution):

        c:\@Work\Perl>perl -wMstrict -le
        "use 5.014;
         ;;
         my %xlate_map = qw/ ( l_paren  ) r_paren  { l_curly  } r_curly /;
         ;;
         sub xlate {
           my ($s, $hr_map) = @_;
         ;;
           my $targets = join '', map quotemeta, keys %$hr_map;
           return $s =~ s{ ([$targets]) }{$hr_map->{$1}}xmsgr;
           }
         ;;
         my $str = 'a { b } c ( d ) e {{F}} ((G))';
         my $xlt = xlate($str, \%xlate_map);
         print qq{'$xlt'};
        "
        'a l_curly b r_curly c l_paren d r_paren e l_curlyl_curlyFr_curlyr_curly l_parenl_parenGr_parenr_paren'

    Update 2: Another approach: complete encapsulation; works with any version; perhaps a bit faster (note  /o regex modifier in  s///o substitution):

        c:\@Work\Perl>perl -wMstrict -le
        "print qq{Perl version $] };
         ;;
         BEGIN {
           my %xlate_map = qw/ ( l_paren  ) r_paren  { l_curly  } r_curly /;
           my $xlate_targets = join '', map quotemeta, keys %xlate_map;
         ;;
           sub xlate {
             my ($s) = @_;
         ;;
             (my $t = $s) =~ s{ ([$xlate_targets]) }{$xlate_map{$1}}xmsog;
             return $t;
             }
           }
         ;;
         my $str = 'a { b } c ( d ) e {{F}} ((G))';
         my $xlt = xlate($str);
         print qq{'$xlt'};
        "
        Perl version 5.008009
        'a l_curly b r_curly c l_paren d r_paren e l_curlyl_curlyFr_curlyr_curly l_parenl_parenGr_parenr_paren'

    Any of these approaches can be further simplified to operate directly upon  $_ if that's what you really need.
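
As a rough illustration of operating directly on $_, here is a minimal sketch that also returns the number of replacements for debugging. The function name xlate_in_place is an assumption made for this example; the map and the character-class construction follow the snippets above.

```perl
use strict;
use warnings;

my %xlate_map = (
    '(' => 'l_paren', ')' => 'r_paren',
    '{' => 'l_curly', '}' => 'r_curly',
);

# Hypothetical helper: translates every key of the map found in $_, in place,
# and returns the number of substitutions made.
sub xlate_in_place {
    my ($hr_map) = @_;
    my $targets = join '', map quotemeta, keys %$hr_map;
    my $count = s{([$targets])}{$hr_map->{$1}}g;
    return $count || 0;
}

$_ = 'a { b } c ( d ) e';
my $count = xlate_in_place(\%xlate_map);
print "'$_' ($count replacements)\n";
```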

      Desired behavior:

      'a { b } c ( d ) e {{F}} ((G))'; should change to

      'a lbracket b rbracket c ( d ) e { lbracket F rbracket } ((G))';

      Using something like print s/\{([^\{^\}]+)\}/ lbracket $1 rbracket /g . " pairs of brackets replaced with lbracket rbracket.\n"

      works, but when I got the idea in my head that I wanted a subroutine to handle this for me (for legibility and debugging), I couldn't let it go.

        It seems you have your solution. However, I see you're still including the  '^' character in your inverted character class; is this what you want? See example below.

            c:\@Work\Perl>perl -wMstrict -le
            "$_ = 'a { b } c { d^ } e {{F}} {{^G}}';
             ;;
             s/\{([^{^}]+)\}/ lbracket $1 rbracket /g;
             print qq{'$_'};
            "
            'a lbracket  b  rbracket c { d^ } e { lbracket F rbracket } {{^G}}'
Re: Regex - Is there any way to control when the contents of a variable are interpolated? (Using "$1" and '$1' in regex replacements)
by Laurent_R (Canon) on Mar 14, 2014 at 22:10 UTC

    Hi, I am not sure that I understand what you want to do, but I am sure that I really don't understand why you are making things so complicated. Please explain what you want to do, or what result you want to obtain.

    Maybe you want to try simply this:

        #!/usr/bin/perl
        use warnings;
        use strict;

        $_ = "a { b } c ( d ) e";
        my $nobrackets = qr/([^{}]+)/;
        s/\{$nobrackets\}/leftbracket $1 rightbracket/g;
        print "$_\n";
    which prints:
    a leftbracket b rightbracket c ( d ) e
    which is presumably what you want. Is this right? Or do you have something else in mind?

    There would be easier ways to obtain the same result (such as changing the curlies only, rather than a combination of curlies and text), but I tried to stay close to what you have.

    Update : Based on your update on Mar 15, 2014 at 05:33 UTC, I now understand that you had this complicated way of doing things simply because you wanted to do the replacement in a subroutine. You might modify the above in the following way:

        use warnings;
        use strict;

        my @c = ("a { b } c ( d ) e", "a { b } c ( d ) e {{F}} ((G))");
        print replace($_), "\n" for @c;

        sub replace {
            my $d = shift;
            my $nobrackets = qr/([^{}]+)/;
            $d =~ s/\{$nobrackets\}/leftbracket $1 rightbracket/g;
            return $d;
        }
    which gives the following output:
        a leftbracket b rightbracket c ( d ) e
        a leftbracket b rightbracket c ( d ) e {leftbracket F rightbracket} ((G))

Re: Regex - Is there any way to control when the contents of a variable are interpolated? (Using "$1" and '$1' in regex replacements)
by kcott (Archbishop) on Mar 15, 2014 at 08:22 UTC

    G'day JDoolin,

    Welcome to the monastery.

    Given your two examples in subsequent posts in this thread, this does what you want:

        #!/usr/bin/env perl -l

        use strict;
        use warnings;

        my @test_strings = (
            'a { b } c ( d ) e',
            'a { b } c ( d ) e {{F}} ((G))'
        );

        my %replacement_for = (
            '{' => ' lbracket ', ' { ' => ' lbracket ',
            '}' => ' rbracket ', ' } ' => ' rbracket ',
        );

        for (@test_strings) {
            print "Initial string: '$_'";
            replace();
            print "Replaced string: '$_'";
        }

        sub replace {
            s/( { | } |{(?!{)|(?<!})})/$replacement_for{$1}/g;
        }

    Output:

        Initial string: 'a { b } c ( d ) e'
        Replaced string: 'a lbracket b rbracket c ( d ) e'
        Initial string: 'a { b } c ( d ) e {{F}} ((G))'
        Replaced string: 'a lbracket b rbracket c ( d ) e { lbracket F rbracket } ((G))'

    Two things to note:

    • You wrote "leftbracket" and "rightbracket" in one place, then "lbracket" and "rbracket" in another. Adjust the text in the code to suit.
    • Multiple spaces in HTML paragraph text (<p>...</p>) are squeezed into a single space. If you show your data within <code>...</code> tags, we'll be able to see exactly what you want. Again, adjust the text in the code to suit.

    -- Ken

Re: Regex - Is there any way to control when the contents of a variable are interpolated? (Using "$1" and '$1' in regex replacements)
by AnomalousMonk (Archbishop) on Mar 16, 2014 at 18:12 UTC

    To answer your original question, here's a couple of approaches that encapsulate markup processing in a function and fully parameterize it: number of passes, the string to be processed, and the tags to be processed in the order of processing are all passed. (BTW: I've never seen this type of markup before; can you provide any info on it?) If the strings to be processed are lengthy, passing them by reference rather than by value might speed things up, and these functions could easily be modified to pass the strings by reference. The two different methods of macro expansion and replacement probably run at different speeds, but I've done no Benchmark-ing and I doubt the difference is great. My guess is the method that uses no  /e evaluation in the  s/// is faster. There are no new features in the code; it can run under Perl 5.8.9.

    Note: The  SML.pm file is not a 'true' module to my way of thinking: it exports nothing, has no OO stuff. It's just a convenient wrapper for a couple of functions.
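
The SML.pm code itself is not reproduced in this copy of the thread. As an independent, minimal sketch of the idea described above (a function that receives the number of passes, the string, and an ordered list of tag rules), something like the following could work. The name process_markup and the rule format (a pattern paired with a code reference) are assumptions made for this illustration, not the contents of the original file.

```perl
use strict;
use warnings;

my $nobrackets = qr/[^{}]+/;

# Hypothetical rule format: [ pattern, code ref that builds the replacement ].
# Rules are applied in the order given, once per pass.
my @rules = (
    [ qr/\\textsubscript\{($nobrackets)\}/, sub { " startsubscript $_[0] endsubscript" } ],
    [ qr/\\textit\{($nobrackets)\}/,        sub { " startitalic $_[0] enditalic" } ],
    [ qr/\\textcolor\{$nobrackets\}/,       sub { '' } ],
    [ qr/\{($nobrackets)\}/,                sub { "($_[0])" } ],
);

# Hypothetical function: works on a copy of the string and returns the result.
sub process_markup {
    my ($passes, $string, @rules) = @_;
    for my $pass (1 .. $passes) {
        for my $rule (@rules) {
            my ($pattern, $build) = @$rule;
            $string =~ s/$pattern/$build->($1)/ge;
        }
    }
    return $string;
}

my $input  = '\textit{\textcolor{black}{I}}\textcolor{black}{\textsubscript{1}}';
my $result = process_markup(2, $input, @rules);
print "$result\n";
```

Because each replacement is built by a code reference, only a single /e is needed and no string is evaluated as Perl source.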

    File SML.pm:

    File SML.t:

Re: Regex - Is there any way to control when the contents of a variable are interpolated? (Using "$1" and '$1' in regex replacements)
by AnomalousMonk (Archbishop) on Mar 17, 2014 at 20:45 UTC

    Here's another approach I thought of: an iterator generator. It's a bit simpler, maybe a bit faster because it does not pass/return strings by value, but by reference. Also runs under 5.8.9. Again, this is not a 'true' module as it exports nothing. You could easily fix this if you want, but the module is simple enough to operate as-is IMHO.
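
The SML2.pm code is likewise not reproduced here. The following is only a sketch of what an iterator generator over a string reference might look like, under the assumption that each call performs one pass in place and returns the number of substitutions. The name make_pass_iterator and the rule format are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical generator: returns a closure that runs one pass of all rules
# against the referenced string and reports how many substitutions it made.
sub make_pass_iterator {
    my ($string_ref, @rules) = @_;
    return sub {
        my $changes = 0;
        for my $rule (@rules) {
            my ($pattern, $build) = @$rule;
            $changes += ($$string_ref =~ s/$pattern/$build->($1)/ge) || 0;
        }
        return $changes;
    };
}

my $nobrackets = qr/[^{}]+/;
my $text       = 'a {b {c}} d';
my $next_pass  = make_pass_iterator(\$text, [ qr/\{($nobrackets)\}/, sub { "($_[0])" } ]);

# Keep calling the iterator until a pass changes nothing.
my $passes = 0;
$passes++ while $next_pass->();
print "$text after $passes passes\n";
```

Looping until a pass makes no changes removes the need to guess the nesting depth in advance.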

    File SML2.pm:

    File SML2.t: