http://www.perlmonks.org?node_id=1008750


in reply to Re: regex not matching special char
in thread regex not matching special char

Interesting. A backslash marks a word boundary, but a backslashed dollar sign is itself a non-word char, and hence also forms a word boundary with the string "AVG" which follows it. Rats!

I need to distinguish between strings such as "$AVG", "A$AVG", "A$AVGA". Hence my attempt to do it using \b$AVG\b.

Am I asking too much of Perl regex?

Re^3: regex not matching special char
by AnomalousMonk (Archbishop) on Dec 14, 2012 at 05:43 UTC

    In addition to looking at the documentation linked by kennethk (and also at perlretut; see in particular the section titled 'Looking ahead and looking behind'), perhaps some insight into the effect of the \b (and \B) zero-width word (and non-word) boundary assertions can be gained by splitting one of the OPed example strings on each assertion:

    >perl -wMstrict -le "my $line2 = 'I:\$AVG\hello.log'; printf qq{'$_' } for split /\b/, $line2; print qq{\n}; printf qq{'$_' } for split /\B/, $line2; "
    'I' ':\$' 'AVG' '\' 'hello' '.' 'log'

    'I:' '\' '$A' 'V' 'G\h' 'e' 'l' 'l' 'o.l' 'o' 'g'
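    The 'Looking ahead and looking behind' section mentioned above suggests one possible way to tell the three strings apart. What follows is only a sketch, not something posted in this thread: the name is_standalone and the list of test strings are made up for illustration. It swaps each \b for a negative look-around that asks "no word character on this side", which does not care whether the neighbouring character is a '$' or a backslash.

```perl
use strict;
use warnings;

# (?<!\w) : no word character immediately before the '$'
# (?!\w)  : no word character immediately after 'AVG'
my $standalone = qr/(?<!\w)\$AVG(?!\w)/;

# Hypothetical helper: true if '$AVG' occurs with no word char on either side.
sub is_standalone {
    my ($string) = @_;
    return $string =~ $standalone ? 1 : 0;
}

# Test strings taken from the question, plus the OPed path.
for my $s ('$AVG', 'A$AVG', 'A$AVGA', 'I:\$AVG\hello.log') {
    printf "%-20s %s\n", $s, is_standalone($s) ? 'match' : 'no match';
}
```

    Under these assumptions only '$AVG' and the path 'I:\$AVG\hello.log' should match; 'A$AVG' and 'A$AVGA' are rejected by the look-behind.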
Re^3: regex not matching special char
by Anonymous Monk on Dec 14, 2012 at 08:50 UTC

    Am I asking too much of Perl regex?

    ;) No, is Perl regex asking too much by asking you to know what you're asking for ? *zing*

    perlrequick

    Luckily YAPE::Regex::Explain handles these

    use YAPE::Regex::Explain;
    print YAPE::Regex::Explain->new( '\b\$AVG' )->explain;
    __END__
    The regular expression:

    (?-imsx:\b\$AVG)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      \b                       the boundary between a word char (\w) and
                               something that is not a word char
    ----------------------------------------------------------------------
      \$                       '$'
    ----------------------------------------------------------------------
      AVG                      'AVG'
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------

    use YAPE::Regex::Explain;
    print YAPE::Regex::Explain->new( '\w?\$AVG\b' )->explain;
    __END__
    The regular expression:

    (?-imsx:\w?\$AVG\b)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      \w?                      word characters (a-z, A-Z, 0-9, _)
                               (optional (matching the most amount
                               possible))
    ----------------------------------------------------------------------
      \$                       '$'
    ----------------------------------------------------------------------
      AVG                      'AVG'
    ----------------------------------------------------------------------
      \b                       the boundary between a word char (\w) and
                               something that is not a word char
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
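    To see what the two explained patterns actually do with the strings from the question, a small comparison may help. This is a sketch only; the helper matched_by and the pattern labels are invented for the example.

```perl
use strict;
use warnings;

# The two patterns explained above, plus the three strings from the question.
my %patterns = (
    'leading \b'  => qr/\b\$AVG/,
    'trailing \b' => qr/\w?\$AVG\b/,
);
my @strings = ('$AVG', 'A$AVG', 'A$AVGA');

# Hypothetical helper: list the strings a given pattern matches.
sub matched_by {
    my ($re) = @_;
    return grep { /$re/ } @strings;
}

for my $name (sort keys %patterns) {
    printf "%-12s matches: %s\n", $name, join(', ', matched_by($patterns{$name}));
}
```

    As far as I can tell, the leading \b needs a word character in front of the '$', so it matches 'A$AVG' and 'A$AVGA' but not a bare '$AVG', while the trailing \b only rules out 'A$AVGA'. Neither pattern on its own separates all three strings.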

      Thanks for the tip on YAPE::Regex::Explain.

      As kennethk pointed out - reading between the lines - the key is to know that a backslashed special character such as \$ is not a word character, so a word boundary falls between it and the word that follows. That is why the simple \b does not work.

      Armed with a definitive answer as to what is happening, a simple workaround can be constructed. In my case the strings I am looking for are actual directory names, so I can split on the directory separator.
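      A rough sketch of that workaround, assuming Windows-style paths like the one in the OP; the helper name has_component and the two extra sample paths are made up for illustration. Each path is split on the separator and the components are compared with eq, so no regex boundary assertions are involved.

```perl
use strict;
use warnings;

# Hypothetical helper: true if any path component equals $name exactly.
sub has_component {
    my ($path, $name) = @_;
    # Split on a backslash or a forward slash.
    my @parts = split m{[\\/]}, $path;
    return scalar grep { $_ eq $name } @parts;
}

for my $path ('I:\$AVG\hello.log', 'I:\A$AVG\hello.log', 'I:\A$AVGA\hello.log') {
    printf "%-24s %s\n", $path, has_component($path, '$AVG') ? 'found' : 'not found';
}
```

      Only the first path has a component that is exactly '$AVG'; the other two should report 'not found'.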

      Thanks again.
Re^3: regex not matching special char
by muba (Priest) on Dec 14, 2012 at 04:24 UTC

    What if you change \b to \w?