PerlMonks  

Re: Re: Re: quick question about parenthesis and regular expressions

by jreades (Friar)
on Nov 04, 2003 at 13:32 UTC (#304407)


in reply to Re: Re: quick question about parenthesis and regular expressions
in thread quick question about parenthesis and regular expressions

Ummm, I have no problem doing the substitution using the code supplied:

$line = "She was very absorbed in her homework.";
$word = "absorb";
print STDOUT "Line: $line\n";
$line =~ s/($word(?:s|ed))/<b>$1<\/b>/igm;
print STDOUT "New Line: $line\n";
print STDOUT "\$1 contains: $1\n";

Of course, I might consider changing it slightly to read:

$line =~ s/\b(${word}(?:s|ed)?)\b/<b>$1<\/b>/igm;

However, I have just discovered that something more is going on here: while the original (without the \b) does leave something in $1, my version doesn't (even though it does the substitution properly).

And finally, if you're just doing a match, why are you using s///?
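A minimal sketch of the match-only approach, assuming the goal is simply to find (not rewrite) the forms of $word. It uses m// with the \b pattern from the revised substitution, reads $1 immediately after a successful scalar-context match, and uses list-context /g to collect every form at once. The \Q...\E quoting, the $re/$first/@found names, and the sample $text string are additions for illustration, not part of the original code.

```perl
use strict;
use warnings;

my $word = 'absorb';
# Same \b pattern as the revised substitution; \Q...\E (an addition here)
# quotes any regex metacharacters that might appear in $word.
my $re   = qr/\b(\Q$word\E(?:s|ed)?)\b/i;

my $line = 'She was very absorbed in her homework.';

# Scalar context: one match; read $1 right away, and only if it succeeded.
my $first;
if ($line =~ $re) {
    $first = $1;
    print "First match: $first\n";
}

# List context with /g returns every captured $1, so nothing relies on
# whatever $1 happens to hold afterwards.
my $text  = 'Absorb it. It absorbs. It absorbed. Absorbing? Reabsorbed.';
my @found = $text =~ /$re/g;
print "All matches: @found\n";
```

Here "Absorbing" and "Reabsorbed" are skipped because of the \b anchors, while "Absorb", "absorbs" and "absorbed" are collected.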


Re... quick question about parentheses and regular expressions
by Roy Johnson (Monsignor) on Nov 04, 2003 at 18:08 UTC
    You're right: the \bs in the pattern do seem to sabotage $1. In particular, any character in front of the () group kills $1 if the g and i switches are both set.
    my $line = 'She was very absorbed in her homework.';
    my $word = 'absorb';
    $_=$line; s/ ($word)//g;  print "\$1 is $1\n";
    $_=$line; s/ ($word)//i;  print "\$1 is $1\n";
    $_=$line; s/ ($word)//ig; print "\$1 is $1\n";
    One other note:
    #This also fails
    s/(\b$word)//ig;
    #Although this is ok
    s/( $word)//ig;
    #And this is fine, too!
    my $pat = qr/\b($word)/;
    s/$pat//gi;
    #or even
    my $pat = qr/($word)/;
    s/\b$pat//gi;
    #or EVEN THIS!
    my $pat = qr/$word/;
    s/\b($word)//gi;
    I think we have a perlbug. Frenzy of updates completed. Really.
Re: Re: Re: Re: quick question about parenthesis and regular expressions
by tilly (Archbishop) on Nov 04, 2003 at 18:49 UTC
    It looks like $1 is being lost because of the g modifier. That is, it matches, it replaces, and it tries again. On the next pass it finds the start of a possible match at the word "in", wipes out $1 etc., discovers that the possible match isn't really one, and so by the time the substitution loop finishes, $1 has been lost.

    This appears to be a bug. (Report it with perlbug if you like.) However, I would also point out that any code which relies on the correct behaviour is likely to be buggy anyway: if the word appears multiple times, $1 can hold only one of the matches, so you won't catch all of the substitutions. If you really want fine-grained access to all of the substitution information after the fact, then you either need to write your own substitution loop (using matching with /g, pos and substr) or you need to embed code in the substitution, like this:

    my @matches;
    $line =~ s/\b($word(?:s|ed))/ push @matches, $1; "<b>$1<\/b>" /iegm;
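For the other option mentioned above, a hand-written substitution loop, here is a minimal sketch built on m//g, pos and substr. The names $out, $last and @matches and the sample line are hypothetical; the pattern is the \b($word(?:s|ed)) used above with /i, plus \Q...\E quoting as an assumption. Rather than editing $line in place, it builds a new string, so the recorded offsets refer to the original line.

```perl
use strict;
use warnings;

my $word = 'absorb';
my $line = 'Absorbs? She was very absorbed in her homework.';

my @matches;    # [ captured text, start offset in $line ] pairs
my $out  = '';  # rewritten line, built piece by piece
my $last = 0;   # end offset of the previous match in $line

while ($line =~ /\b(\Q$word\E(?:s|ed))/gi) {
    my $end   = pos($line);          # pos() points just past the match
    my $start = $end - length($1);   # \b is zero-width, so the match is $1
    push @matches, [ $1, $start ];   # $1 is read inside the loop, where it is valid

    # Copy the unmatched text since the last match, then the replacement.
    $out .= substr($line, $last, $start - $last) . "<b>$1</b>";
    $last = $end;
}
$out .= substr($line, $last);        # trailing text after the final match

print "New line: $out\n";
printf "Matched '%s' at offset %d\n", @$_ for @matches;
```

Every match is recorded as it happens, so the result does not depend on what $1 contains once the loop has finished.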
