PerlMonks  

regex not matching special char

by mnooning (Sexton)
on Dec 13, 2012 at 15:35 UTC
mnooning has asked for the wisdom of the Perl Monks concerning the following question:

In this narrowed-down regex case, each line produces an ERR.

I cannot spot the error. Can you?

Just to be clear, the dollar sign could instead be one or more other special characters, not just a single dollar sign.

    my $line1 = 'I:\$AVG';             # Should NOT be okay.
    my $line2 = 'I:\$AVG\hello.log';   # Should NOT be okay.
    my $skip  = '$AVG';
    #------------------------------------------------
    if ($line1 =~ m!\b\Q$skip\E\b!i) {
        print __LINE__." Line 1 Okay!\n";
    }
    else {
        print __LINE__." Line 1 ERR! $line1 did not match $skip\n";
    }
    if ($line2 =~ m!\b\Q$skip\E\b!i) {
        print __LINE__." Line 2 Okay!\n";
    }
    else {
        print __LINE__." Line 2 ERR! $line2 did not match $skip\n";
    }
    __END__
    14 Line 1 ERR! I:\$AVG did not match $AVG
    19 Line 2 ERR! I:\$AVG\hello.log did not match $AVG

Re: regex not matching special char
by kennethk (Monsignor) on Dec 13, 2012 at 15:56 UTC
    I think part of your issue is your understanding of what \b means. From perlre:
    A word boundary (\b ) is a spot between two characters that has a \w on one side of it and a \W on the other side of it (in either order), counting the imaginary characters off the beginning and end of the string as matching a \W .

    This means \$AVG will never match /\b\$AVG/ because there is no word boundary between a backslash (\W) and a dollar sign (\W).
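A small sketch to make this concrete: the hypothetical helper word_boundaries below records every offset where \b holds in the OP's first test string. For 'I:\$AVG' it reports offsets 0, 1, 4 and 7; offset 3, between the backslash and the dollar sign, is absent, which is exactly why a pattern starting with \b\$ cannot match there.

```perl
use strict;
use warnings;

# Hypothetical helper: return every offset in $str at which \b holds.
sub word_boundaries {
    my ($str) = @_;
    my @positions;
    # Perl refuses a second zero-length match at the same offset, so this
    # //g loop steps through each boundary once and then terminates.
    while ($str =~ /\b/g) {
        push @positions, pos($str);
    }
    return @positions;
}

my $line1 = 'I:\$AVG';    # single quotes: the backslash is kept literally
my $skip  = '$AVG';

print join(',', word_boundaries($line1)), "\n";    # prints 0,1,4,7

# \Q...\E turns $skip into \$AVG; the leading \b would need a boundary at
# offset 3, which does not exist, so this prints "no match".
print( ($line1 =~ /\b\Q$skip\E\b/i) ? "match\n" : "no match\n" );
```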


    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

      Interesting. A backslash is a non-word character, and so is the dollar sign that follows it, so there is no word boundary between them; the only boundary is between the dollar sign and the string "AVG" that follows it. Rats!

      I need to distinguish between strings such as "$AVG", "A$AVG", and "A$AVGA". Hence my attempt to do it using \b$AVG\b.

      Am I asking too much of Perl regex?

        What if you change \b to \w?

        In addition to looking at the documentation linked by kennethk (and also at perlretut; see in particular the section titled 'Looking ahead and looking behind'), you may gain some insight into the effect of the \b (and \B) zero-width word (and non-word) boundary assertions by splitting one of the OP's example strings on each assertion:

        >perl -wMstrict -le "my $line2 = 'I:\$AVG\hello.log'; printf qq{'$_' } for split /\b/, $line2; print qq{\n}; printf qq{'$_' } for split /\B/, $line2; "
        'I' ':\$' 'AVG' '\' 'hello' '.' 'log'
        'I:' '\' '$A' 'V' 'G\h' 'e' 'l' 'l' 'o.l' 'o' 'g'
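Following the pointer to perlretut's lookahead/lookbehind section, one possible alternative (not what the thread settled on) is to replace each \b with a negative lookaround: (?<!\w) before the item and (?!\w) after it. These only assert that no word character is adjacent, so they behave the same whether the item starts with "$" or with a letter. The helper name contains_item is hypothetical, and the sketch assumes that "A$AVG" and "A$AVGA" should be rejected while "$AVG" and the path strings should be accepted; adjust if the requirement differs.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $item occurs in $str with no word character
# immediately before or after it. Unlike \b, the lookarounds (?<!\w) and
# (?!\w) do not care whether $item itself begins or ends with \w.
sub contains_item {
    my ($str, $item) = @_;
    return $str =~ /(?<!\w)\Q$item\E(?!\w)/i ? 1 : 0;
}

# Assumed requirement: the first three accepted, the last two rejected.
for my $s ('$AVG', 'I:\$AVG', 'I:\$AVG\hello.log', 'A$AVG', 'A$AVGA') {
    printf "%-20s %s\n", $s, contains_item($s, '$AVG') ? 'match' : 'no match';
}
```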

        Am I asking too much of Perl regex?

        ;) No, is Perl regex asking too much by asking you to know what you're asking for? *zing*

        perlrequick

        Luckily, YAPE::Regex::Explain handles these:

        use YAPE::Regex::Explain;
        print YAPE::Regex::Explain->new( '\b\$AVG' )->explain;
        __END__
        The regular expression:

        (?-imsx:\b\$AVG)

        matches as follows:

        NODE                     EXPLANATION
        ----------------------------------------------------------------------
        (?-imsx:                 group, but do not capture (case-sensitive)
                                 (with ^ and $ matching normally) (with . not
                                 matching \n) (matching whitespace and #
                                 normally):
        ----------------------------------------------------------------------
          \b                       the boundary between a word char (\w) and
                                   something that is not a word char
        ----------------------------------------------------------------------
          \$                       '$'
        ----------------------------------------------------------------------
          AVG                      'AVG'
        ----------------------------------------------------------------------
        )                        end of grouping
        ----------------------------------------------------------------------

        use YAPE::Regex::Explain;
        print YAPE::Regex::Explain->new( '\w?\$AVG\b' )->explain;
        __END__
        The regular expression:

        (?-imsx:\w?\$AVG\b)

        matches as follows:

        NODE                     EXPLANATION
        ----------------------------------------------------------------------
        (?-imsx:                 group, but do not capture (case-sensitive)
                                 (with ^ and $ matching normally) (with . not
                                 matching \n) (matching whitespace and #
                                 normally):
        ----------------------------------------------------------------------
          \w?                      word characters (a-z, A-Z, 0-9, _)
                                   (optional (matching the most amount
                                   possible))
        ----------------------------------------------------------------------
          \$                       '$'
        ----------------------------------------------------------------------
          AVG                      'AVG'
        ----------------------------------------------------------------------
          \b                       the boundary between a word char (\w) and
                                   something that is not a word char
        ----------------------------------------------------------------------
        )                        end of grouping
        ----------------------------------------------------------------------
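A quick check of the second explained pattern against the strings mentioned in the thread (the helper name is hypothetical). Because \w? is optional, it never blocks a match on the left: "A$AVG" matches (the \w? consumes the "A") just as "$AVG" does, and only the trailing \b rejects "A$AVGA".

```perl
use strict;
use warnings;

# Hypothetical helper: apply the pattern from the second
# YAPE::Regex::Explain example to a string.
sub explain_example_matches {
    my ($str) = @_;
    return $str =~ /\w?\$AVG\b/ ? 1 : 0;
}

# Expected: match, match, no match, match
for my $s ('$AVG', 'A$AVG', 'A$AVGA', 'I:\$AVG') {
    printf "%-10s %s\n", $s, explain_example_matches($s) ? 'match' : 'no match';
}
```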

      For completeness, the code below is my final solution. It circumvents the problematic nuance that kennethk explained.

      if ($line5 =~ m!(\\|\b)\Q$your_item\E(\\|\b)!i) { ... }

      Thanks again
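To see how the final pattern behaves, here it is wrapped in a hypothetical helper and run over the strings discussed in the thread. By the perlre definition kennethk quoted, \b before a leading "$" only holds when a word character precedes it, so a bare "$AVG" at the start of a string does not match, while "A$AVG" does (the \b falls between "A" and "$"). Whether that is acceptable depends on which strings need to be accepted; both backslash-prefixed path forms the OP tested do match.

```perl
use strict;
use warnings;

# The OP's final pattern, wrapped in a hypothetical helper: the item must be
# preceded by a backslash or a word boundary, and followed by the same.
sub final_solution_matches {
    my ($str, $item) = @_;
    return $str =~ m!(\\|\b)\Q$item\E(\\|\b)!i ? 1 : 0;
}

# Expected: no match, match, match, match, no match
for my $s ('$AVG', 'I:\$AVG', 'I:\$AVG\hello.log', 'A$AVG', 'A$AVGA') {
    printf "%-20s %s\n", $s,
        final_solution_matches($s, '$AVG') ? 'match' : 'no match';
}
```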

Node Type: perlquestion [id://1008675]
Approved by Lotus1