Parens mess up regex substitute

by BenHopkins (Sexton)
on Jan 26, 2012 at 04:04 UTC (#950019)
BenHopkins has asked for the wisdom of the Perl Monks concerning the following question:

Hey Monks. I found an anomaly, at least I think it is. Consider the following code:
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $str = '/path/to/f(il)e';
    (my $name = $str) =~ s{.*/}{}; # basename
    (my $path = $str) =~ s{$name}{};
    print " str = $str\n";
    print "name = $name\n";
    print "(unescaped) path = $path\n";
    ($path = $str) =~ s{\Q$name\E}{};
    print "(escaped) path = $path\n";

That produces the following output:

     str = /path/to/f(il)e
    name = f(il)e
    (unescaped) path = /path/to/f(il)e
    (escaped) path = /path/to/

The first substitute (basename) worked. The second, which was supposed to isolate the path part of the string, does NOT work because there are parentheses in $name.

I worked around it with the \Q...\E trick, but I want to know why parens in the variable screwed up one substitute but not the other one.

Re: Parens mess up regex substitute
by Anonymous Monk on Jan 26, 2012 at 04:12 UTC

    Because parens have meaning in Perl's regular expression language. They're metacharacters; specifically, they're called capturing parens. Anything matched inside capturing parens is stored in a numbered variable ($1, $2, ...).

    see perlre, perlretut

    To see how the regex engine does matching,  use re 'debug';

Re: Parens mess up regex substitute
by ikegami (Pope) on Jan 26, 2012 at 04:19 UTC

    but I want to know why parens in the variable screwed up one substitute but not the other one

    The first argument of the substitution operator is {expected to be|treated as} a regex pattern.

    The regex pattern «f(il)e» (line 7) matches the string «file» and captures «il» in $1 or some such variable.

    The regex pattern «f\(il\)e» (line 11) matches the string «f(il)e».
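
    A minimal sketch of the two cases above, anchored with \A and \z so each pattern has to match the whole string. The variable names are only for illustration and are not from the original post.

```perl
use strict;
use warnings;

my $name = 'f(il)e';

# Unescaped: the parens group and capture, so the pattern means "file".
my $matches_file    = 'file'   =~ /\A$name\z/ ? 1 : 0;
my $matches_literal = 'f(il)e' =~ /\A$name\z/ ? 1 : 0;
my ($captured)      = 'file'   =~ /\A$name\z/;   # list context returns $1

# Escaped with \Q...\E: the parens are literal characters.
my $escaped_literal = 'f(il)e' =~ /\A\Q$name\E\z/ ? 1 : 0;
my $escaped_file    = 'file'   =~ /\A\Q$name\E\z/ ? 1 : 0;

print "unescaped vs 'file':   $matches_file (captured '$captured')\n";
print "unescaped vs 'f(il)e': $matches_literal\n";
print "escaped   vs 'f(il)e': $escaped_literal\n";
print "escaped   vs 'file':   $escaped_file\n";
```

    The unescaped pattern matches 'file' but not its own source text, which is why the substitution in the question leaves the path untouched.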

Re: Parens mess up regex substitute
by davido (Archbishop) on Jan 26, 2012 at 06:43 UTC


    PATTERN may contain variables, which will be interpolated every time the pattern search is evaluated, except for when the delimiter is a single quote.

    Parens aren't the only thing you have to look out for. '.' (dot) will be treated as a meta-character that matches anything except for newline. [] square brackets will be treated like character classes, and so on through the entire list of regexp pattern tokens. quotemeta or \Q and \E are not just quirks providing a lucky solution, this sort of situation is on the short list of reasons for quotemeta and \Q\E to exist.

    In perldoc -f quotemeta you'll read, "quotemeta (and \Q ... \E ) are useful when interpolating strings into regular expressions, because by default an interpolated variable will be considered a mini-regular expression." I wouldn't call it an anomaly; it's just how Perl is designed.
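
    To illustrate the point about dots and square brackets, here is a small sketch using quotemeta directly. The file name is hypothetical, chosen only because it contains several metacharacters.

```perl
use strict;
use warnings;

my $name   = 'report.v[2].txt';   # hypothetical name with metacharacters
my $quoted = quotemeta $name;     # backslashes every non-word character

# Unquoted: "." matches any character and "[2]" is a character class,
# so the pattern matches a look-alike but not the name itself.
my $loose_lookalike = 'reportXv2Ytxt' =~ /\A$name\z/ ? 1 : 0;
my $loose_self      = $name           =~ /\A$name\z/ ? 1 : 0;

# Quoted: only the exact text matches.
my $strict_self      = $name           =~ /\A$quoted\z/ ? 1 : 0;
my $strict_lookalike = 'reportXv2Ytxt' =~ /\A$quoted\z/ ? 1 : 0;

print "quoted pattern: $quoted\n";
print "unquoted: look-alike=$loose_lookalike self=$loose_self\n";
print "quoted:   look-alike=$strict_lookalike self=$strict_self\n";
```

    Writing /\Q$name\E/ inside the pattern has the same effect as interpolating the result of quotemeta.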

    perlop contains a good crash course on how interpolation works. The sections to look for start with Quote and Quote-like Operators, and continue through the end of "Gory details of parsing quoted constructs".


Re: Parens mess up regex substitute
by Crackers2 (Vicar) on Jan 26, 2012 at 16:14 UTC

    While all of the above answers are correct, I think they may be slightly missing the point of your question. What I understand you're asking is: "Why do the parens in $str not screw up

    (my $name = $str) =~ s{.*/}{}; # basename

    but do screw up

    (my $path = $str) =~ s{$name}{};


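    And the answer to that is: they don't. It's not the parens in $str that are the problem in the second substitution, it's the parens in $name. In the first substitution $str is only the string being searched, and the pattern .*/ contains no parens; in the second, $name is interpolated into the pattern itself.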
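
    A short sketch of that distinction: the same unescaped pattern is run against two targets, and the return value of s{}{} (the number of substitutions made, or false if none) shows which one it touched. The variable names and the second target string are assumptions made for the example.

```perl
use strict;
use warnings;

my $name = 'f(il)e';   # pattern source, parens included

# Parens in the target only: the pattern .*/ has none, so it behaves.
(my $base = '/path/to/f(il)e') =~ s{.*/}{};

# Parens in the pattern: it means "file", whatever the target looks like.
my $with_parens = '/path/to/f(il)e';
my $plain       = '/path/to/file';
my $hits_with_parens = ($with_parens =~ s{$name}{}) || 0;
my $hits_plain       = ($plain       =~ s{$name}{}) || 0;

print "base = $base\n";
print "target with parens: $hits_with_parens substitution(s), now $with_parens\n";
print "target without:     $hits_plain substitution(s), now $plain\n";
```

    The target with parens is left unchanged, while the plain one loses its "file" component.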
