PerlMonks  

regex not matching special char

by mnooning (Sexton)
on Dec 13, 2012 at 15:35 UTC (#1008675=perlquestion)
mnooning has asked for the wisdom of the Perl Monks concerning the following question:

In this narrowed down regex case, each line produces an ERR.

I cannot spot the error. Can you?

Just to be clear, the dollar sign could be one or more other special chars, not just a single dollar sign.

my $line1 = 'I:\$AVG';            # Should NOT be okay.
my $line2 = 'I:\$AVG\hello.log';  # Should NOT be okay.
my $skip = '$AVG';
#------------------------------------------------
if ($line1 =~ m!\b\Q$skip\E\b!i) {
  print __LINE__." Line 1 Okay!\n";
} else {
  print __LINE__." Line 1 ERR! $line1 did not match $skip\n";
}
if ($line2 =~ m!\b\Q$skip\E\b!i) {
  print __LINE__." Line 2 Okay!\n";
} else {
  print __LINE__." Line 2 ERR! $line2 did not match $skip\n";
}
__END__
14 Line 1 ERR! I:\$AVG did not match $AVG
19 Line 2 ERR! I:\$AVG\hello.log did not match $AVG

Re: regex not matching special char
by kennethk (Monsignor) on Dec 13, 2012 at 15:56 UTC
    I think part of your issue is your understanding of what \b means. From perlre:
    A word boundary (\b) is a spot between two characters that has a \w on one side of it and a \W on the other side of it (in either order), counting the imaginary characters off the beginning and end of the string as matching a \W.

    This means \$AVG will never match /\b\$AVG/ because there is no word boundary between a backslash (\W) and a dollar sign (\W).
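The point about \b can be checked directly. The following is only a minimal sketch: the helper name matches_with_word_boundaries and the extra test strings are assumptions made for illustration, while the pattern itself is the one from the original question.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns 1 if $text contains
# $literal with a \b assertion on each side, as in the original pattern.
sub matches_with_word_boundaries {
    my ($text, $literal) = @_;
    return $text =~ /\b\Q$literal\E\b/i ? 1 : 0;
}

my $skip = '$AVG';

# Single-quoted strings keep the backslash and the dollar sign literally.
for my $text ('I:\$AVG', '$AVG', 'A$AVG', 'A$AVGA') {
    printf "%-10s %s\n", $text,
        matches_with_word_boundaries($text, $skip) ? 'match' : 'no match';
}
```

Of these four strings, only 'A$AVG' should match, because it is the only one with a word character directly before the dollar sign and a boundary after the final G.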


    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

      Interesting. A backslash is a word boundary, but a backslashed dollar sign is itself a non-word char, and hence is also a word boundary to the string "AVG" which follows it. Rats!

      I need to distinguish between strings such as "$AVG", "A$AVG", and "A$AVGA". Hence my attempt to do it using \b$AVG\b.

      Am I asking too much of Perl regex?

        What if you change \b to \w?

        In addition to looking at the documentation linked by kennethk (and also at perlretut; see in particular the section titled 'Looking ahead and looking behind'), perhaps some insight into the effect of the \b (and \B) zero-width word (and non-word) boundary assertions can be gained by splitting one of the OPed example strings on each assertion:

        >perl -wMstrict -le
        "my $line2 = 'I:\$AVG\hello.log';
         printf qq{'$_' } for split /\b/, $line2;
         print qq{\n};
         printf qq{'$_' } for split /\B/, $line2;
        "
        'I' ':\$' 'AVG' '\' 'hello' '.' 'log'

        'I:' '\' '$A' 'V' 'G\h' 'e' 'l' 'l' 'o.l' 'o' 'g'
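One way the look-ahead and look-behind assertions from perlretut might be applied here is to replace each \b with a negative look-around for a word character. This is a sketch under that assumption, not a solution anyone in the thread posted; the helper name matches_standalone is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns 1 if $literal occurs
# in $text with no word character directly before it or directly after it.
sub matches_standalone {
    my ($text, $literal) = @_;
    # (?<!\w) : the preceding character, if any, is not a word character
    # (?!\w)  : the following character, if any, is not a word character
    return $text =~ /(?<!\w)\Q$literal\E(?!\w)/i ? 1 : 0;
}

my $skip = '$AVG';

for my $text ('$AVG', 'I:\$AVG', 'I:\$AVG\hello.log', 'A$AVG', 'A$AVGA') {
    printf "%-20s %s\n", $text,
        matches_standalone($text, $skip) ? 'match' : 'no match';
}
```

Unlike \b, these assertions do not depend on whether the literal itself starts or ends with a word character, so the first three strings should match and the two with an adjacent 'A' should not.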

        Am I asking too much of Perl regex?

        ;) No, is Perl regex asking too much by asking you to know what you're asking for? *zing*

        perlrequick

        Luckily YAPE::Regex::Explain handles these

        use YAPE::Regex::Explain;
        print YAPE::Regex::Explain->new( '\b\$AVG' )->explain;
        __END__
        The regular expression:

        (?-imsx:\b\$AVG)

        matches as follows:

        NODE                     EXPLANATION
        ----------------------------------------------------------------------
        (?-imsx:                 group, but do not capture (case-sensitive)
                                 (with ^ and $ matching normally) (with . not
                                 matching \n) (matching whitespace and #
                                 normally):
        ----------------------------------------------------------------------
          \b                       the boundary between a word char (\w) and
                                   something that is not a word char
        ----------------------------------------------------------------------
          \$                       '$'
        ----------------------------------------------------------------------
          AVG                      'AVG'
        ----------------------------------------------------------------------
        )                        end of grouping
        ----------------------------------------------------------------------

        use YAPE::Regex::Explain;
        print YAPE::Regex::Explain->new( '\w?\$AVG\b' )->explain;
        __END__
        The regular expression:

        (?-imsx:\w?\$AVG\b)

        matches as follows:

        NODE                     EXPLANATION
        ----------------------------------------------------------------------
        (?-imsx:                 group, but do not capture (case-sensitive)
                                 (with ^ and $ matching normally) (with . not
                                 matching \n) (matching whitespace and #
                                 normally):
        ----------------------------------------------------------------------
          \w?                      word characters (a-z, A-Z, 0-9, _)
                                   (optional (matching the most amount
                                   possible))
        ----------------------------------------------------------------------
          \$                       '$'
        ----------------------------------------------------------------------
          AVG                      'AVG'
        ----------------------------------------------------------------------
          \b                       the boundary between a word char (\w) and
                                   something that is not a word char
        ----------------------------------------------------------------------
        )                        end of grouping
        ----------------------------------------------------------------------

      For completeness, the code below was the final solution. This solution circumvents the problematic nuance that kennethk explained.

      if ($line5 =~ m!(\\|\b)\Q$your_item\E(\\|\b)!i) { ... }

      Thanks again
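To see how the final pattern behaves, it can be run against the strings mentioned earlier in the thread. This is a rough sketch: the wrapper name matches_final_pattern and the list of sample strings are assumptions for illustration, while the regex is copied from the post above.

```perl
use strict;
use warnings;

# Hypothetical wrapper (not from the thread) around the posted pattern:
# the literal must be preceded and followed by either a backslash or a
# word boundary.
sub matches_final_pattern {
    my ($line, $your_item) = @_;
    return $line =~ m!(\\|\b)\Q$your_item\E(\\|\b)!i ? 1 : 0;
}

my $your_item = '$AVG';

for my $line ('I:\$AVG', 'I:\$AVG\hello.log', '$AVG', 'A$AVG', 'A$AVGA') {
    printf "%-20s %s\n", $line,
        matches_final_pattern($line, $your_item) ? 'match' : 'no match';
}
```

With these samples the two path-like strings match, which fits the original use case. A bare '$AVG' does not match and 'A$AVG' does, so the pattern may be worth checking against real data if those cases can occur.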
