PerlMonks  

Regex word boundary and escaped characters

by ryantate (Friar)
on Jan 26, 2005 at 23:04 UTC [id://425413]

ryantate has asked for the wisdom of the Perl Monks concerning the following question:

The regex word boundary metacharacter, \b, does not seem to match at the start of the string if the first character of the string is escaped. Is this expected behavior?

Here is an example of the behavior in question:

apocalypse.OCF [178] perl5.8.5 -e 'my $s = "\@testing"; $s =~ s/\b\@testing\b//i; print "$s\n";'
@testing
apocalypse.OCF [179] perl5.8.5 -e 'my $s = "+testing"; $s =~ s/\b\+testing\b//i; print "$s\n";'
+testing
apocalypse.OCF [180] perl5.8.5 -e 'my $s = "testing"; $s =~ s/\btesting\b//i; print "$s\n";'

apocalypse.OCF [181]

For now I'm using (\b|^)\@testing(\b|$) as a workaround, but this behavior seems at odds with perlre, which states:

A word boundary (\b) is defined as a spot between two characters that has a \w on one side of it and a \W on the other side of it (in either order), counting the imaginary characters off the beginning and end of the string as matching a \W.

Is this a bug?

Update: No, of course, it's not a bug. @ and + are non-word characters, so at the start of the string there is no \w character for \b to match against. Duh. ;--> Thanks for the many quick responses.
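The perlre definition quoted above can be checked directly. Below is a minimal sketch, assuming Perl 5.8 or later; $emulated_b and match_offsets are hypothetical names made up for this illustration, not part of any module. It spells the definition out with lookbehind and lookahead and prints the offsets where the real \b and the hand-rolled version match in the three test strings. A lookbehind at offset 0 has nothing to look at, so (?<!\w) succeeds there, which is the "imaginary \W" rule at work.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical hand-rolled \b following the perlre wording: a \w behind and
# no \w ahead, or no \w behind and a \w ahead. At either end of the string
# the missing neighbour counts as "not \w", like perlre's imaginary \W.
my $emulated_b = qr/(?:(?<=\w)(?!\w)|(?<!\w)(?=\w))/;

# Return the offsets (joined as "a,b,...") at which a zero-width pattern matches.
sub match_offsets {
    my ($str, $re) = @_;
    my @pos;
    while ($str =~ /$re/g) {
        push @pos, pos($str);
    }
    return join ',', @pos;
}

my (%real, %mine);
for my $s ("\@testing", "+testing", "testing") {
    $real{$s} = match_offsets($s, qr/\b/);
    $mine{$s} = match_offsets($s, $emulated_b);
    printf "%-9s \\b at [%s], emulated at [%s]\n", $s, $real{$s}, $mine{$s};
}
```

Neither version reports offset 0 for "@testing" or "+testing"; the first boundary in both strings sits between the punctuation and the "t".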

Re: Regex word boundary and escaped characters (as designed)
by tye (Sage) on Jan 26, 2005 at 23:12 UTC

    \b only matches the start of the string if the first character is a \w character because it "[counts] the imaginary characters off the beginning and end of the string as matching a \W". That is, "+testing" is treated as if it were the middle of "*+testing*" (using "*" as the "imaginary characters off the beginning and end of the string") and there is no \b between the "*" and the "+".

    - tye        
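tye's "*+testing*" picture can be tried out literally. A minimal sketch, assuming Perl 5.8 or later (b_positions is a hypothetical helper name): it records where \b matches in each string, then wraps the string in real "*" characters and shifts those offsets back by one. If the imaginary-\W reading is right, the two lists should agree.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Offsets at which \b matches in $str.
sub b_positions {
    my ($str) = @_;
    my @pos;
    push @pos, pos($str) while $str =~ /\b/g;
    return @pos;
}

my %same;
for my $s ("\@testing", "+testing", "testing") {
    my @plain = b_positions($s);
    # Wrap in explicit non-word "*" characters, then shift the offsets back
    # by 1 so they line up with the unwrapped string.
    my @wrapped = map { $_ - 1 } b_positions("*$s*");
    $same{$s} = "@plain" eq "@wrapped";
    printf "%-9s plain [%s]  wrapped [%s]\n", $s, "@plain", "@wrapped";
}
```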

Re: Regex word boundary and escaped characters
by Paladin (Vicar) on Jan 26, 2005 at 23:13 UTC
    No, it's doing exactly as it states. @ matches \W and the start of the string also "matches" \W, so there is no \b between them. The same goes for the string starting with +. In the third example, t matches \w, so the \b matches at the position between the start of the string (\W) and the t (\w).
Re: Regex word boundary and escaped characters
by dave_the_m (Monsignor) on Jan 26, 2005 at 23:14 UTC
    counting the imaginary characters off the beginning and end of the string as matching a \W
    You've got a \W (start of string) directly followed by another \W (@ or +), so of course \b isn't going to match there.

    Dave.

Re: Regex word boundary and escaped characters
by Roy Johnson (Monsignor) on Jan 27, 2005 at 03:18 UTC
    This is a case where you'd roll your own lookahead and lookbehind rather than using the ready-made \b. Perhaps you want @testing that is neither preceded nor followed by a non-space character:
    s/(?<!\S)\@testing(?!\S)//i;

    Caution: Contents may have been coded under pressure.
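A minimal sketch applying that substitution to a few sample strings, assuming Perl 5.8 or later (the samples are made up for illustration). The lookarounds only require whitespace or a string edge on each side, so foo@testing and @testings are left alone, while a standalone @testing is removed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample inputs, chosen for illustration only.
my @samples = ("\@testing", "foo \@testing bar", "foo\@testing", "\@testings");

my @after;
for my $s (@samples) {
    # Copy, then delete @testing only when bounded by whitespace or string edges.
    (my $t = $s) =~ s/(?<!\S)\@testing(?!\S)//i;
    push @after, $t;
    printf "%-18s => '%s'\n", "'$s'", $t;
}
```

Note that the surrounding whitespace is kept, so "foo @testing bar" becomes "foo  bar" with two spaces.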
Re: Regex word boundary and escaped characters
by Eimi Metamorphoumai (Deacon) on Jan 27, 2005 at 14:19 UTC
    Others have answered your question, but I'd just like to point out that if you put a \b next to a non-word character, you're likely to get the exact opposite of what you want. That is, it will only match if the character on the other side of the \b is a word character. So another way to fix it would be to use \B, which asserts that what comes before the @ is not a word character (either another non-word character or the start of the string).
    #!/usr/bin/perl -l
    use strict;
    use warnings;
    while(<DATA>){
        chomp;
        print;
        print /\b\@testing\b/ ? "Matched \\b" : "Did not match \\b";
        print /\B\@testing\b/ ? "Matched \\B" : "Did not match \\B";
        print "";
    }
    __DATA__
    @testing
    foo@testing
    foo @testing
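To compare the approaches from this thread side by side, here is a minimal sketch, assuming Perl 5.8 or later; the sample inputs and column labels are made up for illustration. It runs the question's (\b|^)\@testing(\b|$) workaround, \B\@testing\b from the reply above, and Roy Johnson's lookaround pattern over the same strings and prints one 0/1 flag per pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The three patterns discussed in this thread (labels are informal).
my @patterns = (
    [ 'workaround' => qr/(\b|^)\@testing(\b|$)/ ],
    [ '\B'         => qr/\B\@testing\b/ ],
    [ 'lookaround' => qr/(?<!\S)\@testing(?!\S)/ ],
);

# Sample inputs, chosen for illustration only.
my @inputs = ("\@testing", "foo\@testing", "foo \@testing", "x+\@testing", "\@testing.");

print "columns: ", join(' | ', map { $_->[0] } @patterns), "\n";

my %hit;    # $hit{$input} = string of 0/1 flags, one per pattern
for my $in (@inputs) {
    my $flags = join '', map { $in =~ $_->[1] ? 1 : 0 } @patterns;
    $hit{$in} = $flags;
    printf "%-14s %s\n", "'$in'", $flags;
}
```

On these samples the workaround also accepts foo@testing (there is a boundary between "o" and "@"), \B accepts x+@testing because + is a non-word character, and only the lookaround version rejects "@testing." with a trailing period. Which behaviour is wanted depends on what counts as a separator in your data.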
