http://www.perlmonks.org?node_id=783236


in reply to Regex help

E:\Temp>perl -Mstrict -we "$_=7;die qq(matched '$&'\n) if '1234567_.'=~/[$_]/"
matched '7'

It seems that punctuation variables are interpolated inside regex character classes.

This is a surprise (for me at least, and apparently for ww above too).

Is that documented behaviour? Where?


Update: Fixed the attribution of the surprise; I had named the wrong person (graff) when citing a post by ww.

Re^2: Regex help
by graff (Chancellor) on Jul 26, 2009 at 02:50 UTC
    Is that documented behaviour? Where?

    Yes, in perlre, as follows:

    An unescaped "$" or "@" interpolates the corresponding variable, while escaping will cause the literal string "\$" to be matched.

    (Though in the version of the perlre man page I have installed, for perl 5.8.8, this sentence comes second in a paragraph that begins with:

    You cannot include a literal "$" or "@" within a "\Q" sequence.

    I can understand that some might consider this obscure.)

      Those remarks in perlre are not specific to character classes, and one commonly assumes that character classes follow rules of their own.

      An explicit mention of $ being special in character classes is found in perlretut#Using-character-classes:

      …The special characters for a character class are -]\^$ (and the pattern delimiter, whatever it is). ] is special because it denotes the end of a character class. $ is special because it denotes a scalar variable.…

      So indeed, it is not only punctuation variables that are expanded:

      E:\Temp>perl -Mstrict -we "my $foo=7;die qq(matched '$&'\n) if '1234567rab_.'=~/[${foo}bar]+/"
      matched '7rab'
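
A minimal sketch of the same point as a standalone script, in case the shell quoting in the one-liners gets in the way. The variable names $interp and $literal and the 'price $foo' test string are made up for illustration; the second match assumes that a backslash before the sigil is enough to keep "$" literal inside the class, as the perlre sentence quoted above suggests.

```perl
use strict;
use warnings;

my $foo = 7;

# ${foo} is interpolated before the pattern is compiled,
# so the character class is effectively [7bar].
my ($interp) = '1234567rab_.' =~ /([${foo}bar]+)/;

# Escaping the sigil stops interpolation: the class now holds
# the three literal characters "$", "f" and "o".
my ($literal) = 'price $foo' =~ /([\$fo]+)/;

print "interpolated class matched: $interp\n";
print "escaped class matched:      $literal\n";
```

Run as a file, this should report '7rab' for the interpolated class, matching the one-liner, and the literal text '$foo' for the escaped one.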
Re^2: Regex help
by Anonymous Monk on Jul 26, 2009 at 03:05 UTC
    perlre also says: Because patterns are processed as double quoted strings, the following also work:
    \t          tab                   (HT, TAB)
    \n          newline               (LF, NL)
    \r          return                (CR)
    \f          form feed             (FF)
    \a          alarm (bell)          (BEL)
    \e          escape (think troff)  (ESC)
    \033        octal char (example: ESC)
    \x1B        hex char (example: ESC)
    \x{263a}    long hex char (example: Unicode SMILEY)
    \cK         control char (example: VT)
    \N{name}    named Unicode character
    \l          lowercase next char (think vi)
    \u          uppercase next char (think vi)
    \L          lowercase till \E (think vi)
    \U          uppercase till \E (think vi)
    \E          end case modification (think vi)
    \Q          quote (disable) pattern metacharacters till \E
    So to suppress interpolation, you can use single quotes as the delimiter: qr'', m'', s'''
    my $f = 2;
    print qr/$f/,"\n";   # (?-xism:2)
    print qr'$f',"\n";   # (?-xism:$f)
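
To tie this back to character classes, here is a hedged sketch that compares the two delimiters by what they match rather than by how the compiled pattern stringifies (the stringified form differs between Perl versions). The names $interpolated and $verbatim are illustrative only.

```perl
use strict;
use warnings;

my $f = 2;

my $interpolated = qr/[$f]/;   # $f is expanded, so the class is [2]
my $verbatim     = qr'[$f]';   # no expansion: the class holds "$" and "f"

# Try each candidate string against both patterns.
for my $s ('2', 'f', '$') {
    printf "%-4s interpolated=%d verbatim=%d\n",
        "'$s'",
        ($s =~ $interpolated ? 1 : 0),
        ($s =~ $verbatim     ? 1 : 0);
}
```

With single-quote delimiters the class no longer contains the value of $f at all, so only 'f' and '$' match the second pattern.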