perl regex referencing

by ocs (Monk)
on Sep 18, 2007 at 08:10 UTC

ocs has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

I have a small problem concerning backreferences in Perl's regex flavor.
For example, I have a string like this:

$lala = "Hello you therre.";

And I want to match all doubled characters (a-z), i.e. ll and rr in the given string, and contract them to one character. (Please do not take this seriously, it's just an example.)

So I went like this:

$lala =~ s/([a-z])$1/$1/g;

It does not work. So I played around:

$lala =~ s/([a-z])\1/$1/g;

This works. But what I had in mind was this: Warning on \1 vs $1

So I don't understand why $1 does not work in the matching part (not the substitution part) of the first version, while \1 does work in the second. I thought \1 was obsolete and just a relic of sed-style referencing.

This is nothing big, but ... did I miss something?

Thanks in advance,


tennis players have fuzzy balls.

Re: perl regex referencing
by zshzn (Hermit) on Sep 18, 2007 at 08:26 UTC
    perlre explains this as
    The bracketing construct ( ... ) creates capture buffers. To refer to the digit'th buffer use \<digit> within the match. Outside the match use "$" instead of "\".
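
A minimal sketch of the rule quoted from perlre, using the string from the question. The sub name squeeze_doubles is made up for illustration; the substitution itself is the second version from the original post, with \1 inside the pattern and $1 in the replacement.

```perl
use strict;
use warnings;

# Collapse doubled lowercase letters.
# \1 is the backreference inside the pattern (it must match the same
# text the first group just captured); $1 is the capture variable used
# outside the pattern, here in the replacement.
sub squeeze_doubles {
    my ($text) = @_;
    $text =~ s/([a-z])\1/$1/g;
    return $text;
}

print squeeze_doubles("Hello you therre."), "\n";   # prints "Helo you there."
```
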
Re: perl regex referencing
by Prof Vince (Friar) on Sep 18, 2007 at 08:22 UTC
    It's not a problem to use \1 and friends in the LHS of the substitution because it doesn't behave like a quoted string: the <backslash-digit> token has a well-defined meaning there. Moreover, $1 may already hold a capture from a previous successful regexp, so it shouldn't be reset before the end of the matching part of the substitution.
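
To illustrate the point about $1 holding a capture from a previous match, here is a small sketch. The sub name demo_interpolated_capture and the setup match against "r" are assumptions added for the demonstration. Inside a pattern, $1 is interpolated like any other variable before matching starts, so it carries whatever the last successful match left in it rather than the group currently being matched.

```perl
use strict;
use warnings;

sub demo_interpolated_capture {
    my ($text) = @_;

    # An earlier successful match leaves "r" in $1.
    "r" =~ /(r)/ or die "setup match failed";

    # $1 is interpolated into the pattern before matching begins, so
    # this behaves like s/([a-z])r/$1/g. In the replacement, $1 refers
    # to the group captured by the current match.
    $text =~ s/([a-z])$1/$1/g;
    return $text;
}

# "ll" is left alone; only the "er" in "therre" is rewritten.
print demo_interpolated_capture("Hello you therre."), "\n";   # prints "Hello you there."
```

If no earlier match had succeeded, $1 would be undefined and would interpolate as an empty string (with an "uninitialized value" warning under use warnings), leaving the pattern as /([a-z])/. Each letter would then be replaced by itself, which would explain why the first version in the question appears to do nothing.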
Re: perl regex referencing
by bruceb3 (Pilgrim) on Sep 18, 2007 at 08:18 UTC
    The backslash is used inside the match and the dollar sign is used outside of the match. In this case "inside the match" refers to ([a-z])\1 because the text is being matched against this regex. The $1 is not being matched against. It's part of the substitution.
Re: perl regex referencing
by ikegami (Patriarch) on Sep 18, 2007 at 14:38 UTC

    Some alternatives:

    ≥ 5.10

    $lala =~ s/([a-z])\K\1//g;

    See the perlre for 5.10 (or the one for 5.9.5).

    < 5.10

    use Regexp::Keep; $lala =~ s/([a-z])\K\1//g;

    See Regexp::Keep.

    Update: Added links to documentation as privately requested.
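
A short sketch of the \K variant for Perl 5.10 and later, wrapped in a hypothetical sub (squeeze_with_keep) so it can be compared with the s/([a-z])\1/$1/g form from the question. \K keeps everything matched before it out of the replaced span, so only the repeated character is deleted and no $1 is needed in the replacement.

```perl
use strict;
use warnings;
use 5.010;   # \K is available from Perl 5.10 onwards

sub squeeze_with_keep {
    my ($text) = @_;
    # ([a-z]) matches the first letter, \K keeps it out of the replaced
    # span, and \1 matches the repeated letter, which is replaced by
    # the empty string.
    $text =~ s/([a-z])\K\1//g;
    return $text;
}

print squeeze_with_keep("Hello you therre."), "\n";   # prints "Helo you there."
```
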

Re: perl regex referencing
by eff_i_g (Curate) on Sep 18, 2007 at 14:16 UTC

