http://www.perlmonks.org?node_id=950019

BenHopkins has asked for the wisdom of the Perl Monks concerning the following question:

Hey Monks. I found an anomaly, at least I think it is. Consider the following code:
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $str = '/path/to/f(il)e';
    (my $name = $str) =~ s{.*/}{}; # basename
    (my $path = $str) =~ s{$name}{};
    print " str = $str\n";
    print "name = $name\n";
    print "(unescaped) path = $path\n";
    ($path = $str) =~ s{\Q$name\E}{};
    print "(escaped) path = $path\n";

That produces the following output:

     str = /path/to/f(il)e
    name = f(il)e
    (unescaped) path = /path/to/f(il)e
    (escaped) path = /path/to/

The first substitute (basename) worked. The 2nd, the one that was supposed to isolate the path part of the string, does NOT work because there are parentheses in $name.

I worked around it with the \Q...\E trick, but I want to know why parens in the variable screwed up one substitute but not the other one.

Re: Parens mess up regex substitute
by davido (Cardinal) on Jan 26, 2012 at 06:43 UTC

    perlop:

    PATTERN may contain variables, which will be interpolated every time the pattern search is evaluated, except for when the delimiter is a single quote.

    Parens aren't the only thing you have to look out for. '.' (dot) will be treated as a meta-character that matches anything except newline. Square brackets [] will be treated as character classes, and so on through the entire list of regexp pattern tokens. quotemeta and \Q ... \E are not just quirks providing a lucky solution; this sort of situation is on the short list of reasons for quotemeta and \Q\E to exist.

    In perldoc -f quotemeta you'll read, "quotemeta (and \Q ... \E ) are useful when interpolating strings into regular expressions, because by default an interpolated variable will be considered a mini-regular expression." I wouldn't call it an anomaly; it's just how Perl is designed.

    perlop contains a good crash course on how interpolation works. The sections to look for start with Quote and Quote-like Operators, and continue through the end of "Gory details of parsing quoted constructs".


    Dave
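
A minimal sketch of the quotemeta point above. The file name containing a dot and square brackets is made up for illustration; it is not from the original question. It shows quotemeta and \Q...\E producing the same result when the escaped string is used as a pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical name with several metacharacters: ( ) . [ ]
my $name = 'f(il)e.[1]';
my $str  = "/path/to/$name";

# quotemeta() backslash-escapes every non-word ASCII character.
my $quoted = quotemeta $name;
print "quoted = $quoted\n";    # quoted = f\(il\)e\.\[1\]

# \Q...\E applies the same escaping while the pattern is interpolated.
(my $path_a = $str) =~ s{\Q$name\E}{};

# A pre-escaped string can be interpolated without \Q...\E.
(my $path_b = $str) =~ s{$quoted}{};

print "path_a = $path_a\n";    # path_a = /path/to/
print "path_b = $path_b\n";    # path_b = /path/to/
```

Either form works; \Q...\E is usually the shorter choice when the variable is only needed in one pattern.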

Re: Parens mess up regex substitute
by ikegami (Patriarch) on Jan 26, 2012 at 04:19 UTC

    but I want to know why parens in the variable screwed up one substitute but not the other one

    The first argument of the substitution operator is {expected to be|treated as} a regex pattern.

    The regex pattern «f(il)e» (line 7) matches the string «file» and captures «il» in $1 or some such variable.

    The regex pattern «f\(il\)e» (line 11) matches the string «f(il)e».
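
The two patterns above can be checked directly. This is a small sketch, not part of the original script: it anchors the pattern with \A and \z so that only whole-string matches count, and the variable names are chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $name = 'f(il)e';

# Unescaped: ( ) form a capture group, so the pattern matches "file".
my ($captured) = 'file' =~ m{\A$name\z};
print "unescaped vs 'file':   captured '$captured'\n";

# Unescaped against the literal text: the pattern contains no literal "(".
my $plain_hit = 'f(il)e' =~ m{\A$name\z} ? 'match' : 'no match';
print "unescaped vs 'f(il)e': $plain_hit\n";

# Escaped: \Q...\E turns ( ) into literal characters.
my $quoted_hit = 'f(il)e' =~ m{\A\Q$name\E\z} ? 'match' : 'no match';
print "escaped vs 'f(il)e':   $quoted_hit\n";
```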

Re: Parens mess up regex substitute
by Anonymous Monk on Jan 26, 2012 at 04:12 UTC

    Because in Perl's regular expression language parens have meaning: they're metacharacters, specifically capturing parens. Anything matched inside capturing parens is stored in a numbered variable ($1, $2, ...).

    see perlre, perlretut

    To see how the regex engine does matching, add use re 'debug'; to your script.
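
A short sketch of that suggestion, using the strings from the question. The trace is written to STDERR and its exact layout varies between Perl versions, so no trace output is reproduced here; the compiled program should include open and close nodes for the capture group, which is the sign that the parens were read as regex syntax.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use re 'debug';    # compilation and matching traces go to STDERR

my $name = 'f(il)e';

# The interpolated pattern looks for "file", which does not occur here.
my $hit = '/path/to/f(il)e' =~ m{$name} ? 'match' : 'no match';
print "$hit\n";
```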

Re: Parens mess up regex substitute
by Crackers2 (Parson) on Jan 26, 2012 at 16:14 UTC

    While all of the above answers are correct, I think they may be slightly missing the point of your question. What I understand you're asking is: "Why do the parens in $str not screw up

    (my $name = $str) =~ s{.*/}{}; # basename

    but do screw up

    (my $path = $str) =~ s{$name}{};

    ?"

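    And the answer to that is: they don't. It's not the parens in $str that are the problem in the second substitution, it's the parens in $name. $str is only the string being searched, so its contents are treated as plain text; $name is interpolated into the pattern, so its parens become regex syntax.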
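
Since only interpolated variables are read as patterns, one way to sidestep the issue is to avoid interpolation altogether. The following is an alternative sketch, not the approach used in the question: a single match with two capture groups splits $str into path and name, so nothing from $str is ever used as a pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = '/path/to/f(il)e';

# Greedy (.*/) takes everything up to the last slash; ([^/]*) takes the rest.
# The parens in $str are just data here because $str is the target string.
my ($path, $name) = $str =~ m{\A(.*/)([^/]*)\z};

print "path = $path\n";    # path = /path/to/
print "name = $name\n";    # name = f(il)e
```

This assumes the string contains at least one slash; without one the match fails and both variables stay undefined.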