Re: Re: Re: quick question about parenthesis and regular expressions

by jreades (Friar)
on Nov 04, 2003 at 13:32 UTC

in reply to Re: Re: quick question about parenthesis and regular expressions
in thread quick question about parenthesis and regular expressions

Ummm, I have no problem doing the substitution using the code supplied:

$line = "She was very absorbed in her homework.";
$word = "absorb";
print STDOUT "Line: $line\n";
$line =~ s/($word(?:s|ed))/<b>$1<\/b>/igm;
print STDOUT "New Line: $line\n";
print STDOUT "\$1 contains: $1\n";

Of course, I might consider changing it slightly to read:

$line =~ s/\b(${word}(?:s|ed)?)\b/<b>$1<\/b>/igm;

However, I did just discover that something more is going on here: the original (without the \b) does set $1, but my change doesn't, even though it performs the substitution properly.

And finally, if you're just doing a match, why are you using s///?
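
For a plain match, a minimal sketch could take the capture from the match's return value in list context rather than reading $1 afterwards. It assumes the same $line and $word as above and reuses the pattern with \b and the optional suffix; the variable name $found is made up for illustration:

```perl
use strict;
use warnings;

my $line = 'She was very absorbed in her homework.';
my $word = 'absorb';

# In list context a match returns its captures, so the result does not
# depend on what $1 holds after the statement has finished.
my ($found) = $line =~ /\b($word(?:s|ed)?)\b/i;

print defined $found ? "Matched: $found\n" : "No match\n";
```

If the line is not being modified, this avoids s/// and its modifiers altogether.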

Re... quick question about parentheses and regular expressions
on Nov 04, 2003 at 18:08 UTC
    You're right: the \bs in the pattern do seem to sabotage $1. In particular, any character in front of the () group kills $1 if the g and i switches are both set.

    my $line = 'She was very absorbed in her homework.';
    my $word = 'absorb';
    $_ = $line; s/ ($word)//g;  print "\$1 is $1\n";
    $_ = $line; s/ ($word)//i;  print "\$1 is $1\n";
    $_ = $line; s/ ($word)//ig; print "\$1 is $1\n";

    One other note:

    # This also fails
    s/(\b$word)//ig;
    # Although this is ok
    s/( $word)//ig;
    # And this is fine, too!
    my $pat = qr/\b($word)/;
    s/$pat//gi;
    # or even
    my $pat = qr/($word)/;
    s/\b$pat//gi;
    # or EVEN THIS!
    my $pat = qr/$word/;
    s/\b($word)//gi;

    I think we have a perlbug. Frenzy of updates completed. Really.
Re: Re: Re: Re: quick question about parenthesis and regular expressions
on Nov 04, 2003 at 18:49 UTC
    It looks like $1 is being lost because of the g modifier. That is, it matches, it replaces, and it tries again. It starts the matching process again, finds the start of a possible match at the word "in", wipes out $1 and friends, finds that the possible match isn't really one, and by the time the substitution loop finishes you have lost $1.

    This appears to be a bug. (Report with perlbug if you like.) However I would also point out that any code which relies on the correct behaviour is likely to be buggy anyways - if the word appears multiple times then you won't catch all of the substitutions. If you really want to have fine access to all of the substitution information after the fact then you either need to write your own substitution loop (using matching with /g, pos and substr) or you need to embed code in the substitution. Like this:

    my @matches;
    $line =~ s/\b($word(?:s|ed))/
        push @matches, $1;
        "<b>$1<\/b>"
    /iegm;
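
The other option mentioned above, a hand-written loop built from a /g match, pos and substr, might look roughly like the sketch below. The sample sentence with two occurrences, the names $out and $last, and the \Q...\E quoting of $word are assumptions added for illustration, not part of the original code.

```perl
use strict;
use warnings;

# Sample input with two occurrences (an assumption for this sketch).
my $line = 'She was very absorbed in her homework, which absorbs her.';
my $word = 'absorb';

my @matches;      # every captured word, in order of appearance
my $out  = '';    # the rewritten line, built piece by piece
my $last = 0;     # offset just past the previous match

# \Q...\E quotes any regex metacharacters in $word (a precaution added here).
while ($line =~ /\b(\Q$word\E(?:s|ed))/ig) {
    # \b is zero-width, so the whole match is exactly $1 and its start
    # is the end position minus the length of the capture.
    my $start = pos($line) - length($1);
    push @matches, $1;
    # Copy the untouched text before the match, then the wrapped match.
    $out .= substr($line, $last, $start - $last) . "<b>$1</b>";
    $last = pos($line);
}
$out .= substr($line, $last);    # the tail after the final match

print "New line: $out\n";
print "Matches: @matches\n";
```

Each capture is saved inside the loop body while $1 is still valid, so nothing depends on the state of $1 once the loop has ended.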

