PerlMonks  

Re: regex not matching special char

by kennethk (Monsignor)
on Dec 13, 2012 at 15:56 UTC (#1008679)


in reply to regex not matching special char

I think part of your issue is your understanding of what \b means. From perlre:

A word boundary (\b ) is a spot between two characters that has a \w on one side of it and a \W on the other side of it (in either order), counting the imaginary characters off the beginning and end of the string as matching a \W .

This means \$AVG will never match /\b\$AVG/ because there is no word boundary between a backslash (\W) and a dollar sign (\W).
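The point above can be checked with a minimal sketch. The sample strings and the helper name `matches_b_avg` are made up for illustration; only the pattern /\b\$AVG/ comes from the thread.

```perl
use strict;
use warnings;

# Hypothetical sample strings; single quotes keep the backslash and '$' literal.
my @samples = ('I:\$AVG\hello.log', 'A$AVG', '$AVG');

# Returns 1 when /\b\$AVG/ matches the given string, 0 otherwise.
sub matches_b_avg {
    my ($s) = @_;
    return $s =~ /\b\$AVG/ ? 1 : 0;
}

# \b needs a \w on exactly one side. '$' is \W, so the character
# before it must be a word character for the pattern to match.
printf "%-20s %s\n", $_, matches_b_avg($_) ? 'match' : 'no match' for @samples;
```

Only 'A$AVG' should match here: in the other two strings the '$' is preceded by a backslash or by the start of the string, both of which count as \W.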


#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.


Re^2: regex not matching special char
by mnooning (Sexton) on Dec 14, 2012 at 00:48 UTC

    Interesting. A backslash is a word boundary, but a backslashed dollar sign is itself a non-word char, and hence is also a word boundary to the string "AVG" which follows it. Rats!

    I need to distinguish between strings such as "$AVG", "A$AVG", "A$AVGA". Hence my attempt to do it using \b$AVG\b.

    Am I asking too much of Perl regex?

      What if you change \b to \w?

      In addition to looking at the documentation linked by kennethk (and also at perlretut; see in particular the section titled 'Looking ahead and looking behind'), perhaps some insight into the effect of the \b (and \B) zero-width word (and non-word) boundary assertions can be gained by splitting one of the OPed example strings on each assertion:

      >perl -wMstrict -le
      "my $line2 = 'I:\$AVG\hello.log';
       printf qq{'$_' } for split /\b/, $line2;
       print qq{\n};
       printf qq{'$_' } for split /\B/, $line2;
      "
      'I' ':\$' 'AVG' '\' 'hello' '.' 'log'

      'I:' '\' '$A' 'V' 'G\h' 'e' 'l' 'l' 'o.l' 'o' 'g'
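As a sketch of the lookaround approach from the perlretut section mentioned above: assuming the goal is to accept "$AVG" only when no word character touches it on either side, a negative lookbehind and lookahead can stand in for \b. The helper name `is_standalone_avg` and the test strings are hypothetical, and this is one reading of the requirement, not the poster's code.

```perl
use strict;
use warnings;

# Returns 1 when '$AVG' appears with no word character directly
# before or after it, 0 otherwise.
sub is_standalone_avg {
    my ($s) = @_;
    # (?<!\w) : not preceded by a word char (start of string is fine)
    # (?!\w)  : not followed by a word char (end of string is fine)
    return $s =~ /(?<!\w)\$AVG(?!\w)/ ? 1 : 0;
}

for my $s ('$AVG', 'A$AVG', 'A$AVGA', 'I:\$AVG\hello.log') {
    printf "%-20s %s\n", $s, is_standalone_avg($s) ? 'standalone' : 'embedded';
}
```

Unlike \b, these assertions do not care whether the neighbouring character of the pattern itself is \w or \W, so they behave the same for a pattern that starts with '$' as for one that starts with a letter.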

      Am I asking too much of Perl regex?

      ;) No, is Perl regex asking too much by asking you to know what you're asking for ? *zing*

      perlrequick

      Luckily YAPE::Regex::Explain handles these

      use YAPE::Regex::Explain;
      print YAPE::Regex::Explain->new( '\b\$AVG' )->explain;
      __END__
      The regular expression:

      (?-imsx:\b\$AVG)

      matches as follows:

      NODE                     EXPLANATION
      ----------------------------------------------------------------------
      (?-imsx:                 group, but do not capture (case-sensitive)
                               (with ^ and $ matching normally) (with . not
                               matching \n) (matching whitespace and #
                               normally):
      ----------------------------------------------------------------------
        \b                       the boundary between a word char (\w) and
                                 something that is not a word char
      ----------------------------------------------------------------------
        \$                       '$'
      ----------------------------------------------------------------------
        AVG                      'AVG'
      ----------------------------------------------------------------------
      )                        end of grouping
      ----------------------------------------------------------------------

      use YAPE::Regex::Explain;
      print YAPE::Regex::Explain->new( '\w?\$AVG\b' )->explain;
      __END__
      The regular expression:

      (?-imsx:\w?\$AVG\b)

      matches as follows:

      NODE                     EXPLANATION
      ----------------------------------------------------------------------
      (?-imsx:                 group, but do not capture (case-sensitive)
                               (with ^ and $ matching normally) (with . not
                               matching \n) (matching whitespace and #
                               normally):
      ----------------------------------------------------------------------
        \w?                      word characters (a-z, A-Z, 0-9, _)
                                 (optional (matching the most amount
                                 possible))
      ----------------------------------------------------------------------
        \$                       '$'
      ----------------------------------------------------------------------
        AVG                      'AVG'
      ----------------------------------------------------------------------
        \b                       the boundary between a word char (\w) and
                                 something that is not a word char
      ----------------------------------------------------------------------
      )                        end of grouping
      ----------------------------------------------------------------------

        Thanks for the tip on YAPE::Regex::Explain.

        As kennethk pointed out (reading between the lines), the key is to know that a backslashed special character will not be taken as a word character, and hence will be seen as a word boundary itself. That is why the simple \b does not work.

        Armed with a definitive answer as to what is happening, a simple workaround can be constructed. In my case the strings I am looking for are actual directory names. I can split on the directory separator.

        Thanks again.
Re^2: regex not matching special char
by mnooning (Sexton) on Dec 14, 2012 at 13:36 UTC

    For completeness, the code below was the final solution. This solution circumvents the problematic nuance that kennethk explained.

    if ($line5 =~ m!(\\|\b)\Q$your_item\E(\\|\b)!i) { ... }
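A small sketch of how that pattern behaves, using made-up paths and a hypothetical wrapper `line_has_item` (only the regex itself is taken from the post):

```perl
use strict;
use warnings;

# Hypothetical wrapper around the pattern from the post.
# \Q...\E quotes metacharacters in $item, so '$' is matched literally.
sub line_has_item {
    my ($line, $item) = @_;
    return $line =~ m!(\\|\b)\Q$item\E(\\|\b)!i ? 1 : 0;
}

my $item = '$AVG';
for my $line ('I:\$AVG\hello.log', 'I:\$AVGA\hello.log', 'i:\$avg\hello.log') {
    printf "%-22s %s\n", $line, line_has_item($line, $item) ? 'match' : 'no match';
}
```

The item is accepted when it is delimited on each side by either a literal backslash or a word boundary. Note that the \b alternative still fires between a word character and '$', so a path such as 'I:\A$AVG\hello.log' would also match; if that matters, the lookaround form or splitting on the directory separator may be a better fit.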

    Thanks again
