
about word boundary in RE

by anaconda_wly (Scribe)
on Apr 02, 2013 at 07:17 UTC ( [id://1026602]=perlquestion )

anaconda_wly has asked for the wisdom of the Perl Monks concerning the following question:

my $test = "asd asd";
if ($test =~ /(.+\b)\1/) {
    print "Found $1 repeated\n";
}

Why is there no output? Shouldn't the two "asd"s both match?

Replies are listed 'Best First'.
Re: about word boundary in RE
by hdb (Monsignor) on Apr 02, 2013 at 07:24 UTC

    The whitespace between the words prevents the match.

    my $test = "asd asd";
    if ($test =~ /(.+\b)\s\1/) {
        print "Found $1 repeated\n";
    }
      Isn't there a word boundary before the whitespace? If (.+\b) already matches, why do I need the \s? I expected the output to be the first "asd", but there is none.
Re: about word boundary in RE (use re 'debug')
by Anonymous Monk on Apr 02, 2013 at 07:36 UTC
    Add use re 'debug'; to that short file, and watch the regex engine do its thing in your console.
      Good, but the output is not easy for me to read. If (.+\b) already matches, why do I need the \s?

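        \b is a zero-width assertion: it matches a position, not a character. It needs the space after "asd" to recognize the word boundary, but it does not consume that space. The space is therefore still there when \1 is tried, so you have to match it explicitly in your pattern (with \s, for example).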
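
        To see the "does not consume" part for yourself, here is a minimal sketch. It is not from the posts above; the variable names are made up for illustration, and the lazy .+? is used only so the capture stops at the first boundary. It prints where the match ended and which character comes next, then compares the pattern without and with \s.

```perl
use strict;
use warnings;

my $test = "asd asd";

# Lazy .+? stops at the first position where \b holds.
my ($captured) = $test =~ /(.+?\b)/;
my $end_offset = $+[0];                          # offset where the match ended
my $next_char  = substr($test, $end_offset, 1);  # the character \b left unconsumed

print "captured '$captured', match ended at offset $end_offset, next char is '$next_char'\n";

# Without \s, \1 has to match right where the capture ended, and every attempt fails.
# With \s, the space is consumed first and \1 can then match the second word.
my $without_space = ($test =~ /(.+\b)\1/)   ? "match" : "no match";
my $with_space    = ($test =~ /(.+\b)\s\1/) ? "match" : "no match";
print "without \\s: $without_space, with \\s: $with_space\n";
```

        The match ends at offset 3 and the next character is still the space, which is exactly what \1 runs into.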

        Good but seems not easily readable to me.

        In that case, use a shorter string, and match the node numbers listed under "Final program" with those in the right-hand column of the trace, like 1: OPEN1 (3)

        $ perl -Mre=debug -le " q/a a/ =~ /(.\b)\1/ "
        Compiling REx "(.\b)\1"
        Final program:
           1: OPEN1 (3)
           3: REG_ANY (4)
           4: BOUND (5)
           5: CLOSE1 (7)
           7: REF1 (9)
           9: END (0)
        minlen 1
        Matching REx "(.\b)\1" against "a a"
           0 <> <a a>       | 1:OPEN1(3)
           0 <> <a a>       | 3:REG_ANY(4)
           1 <a> < a>       | 4:BOUND(5)
           1 <a> < a>       | 5:CLOSE1(7)
           1 <a> < a>       | 7:REF1(9)
                              failed...
           1 <a> < a>       | 1:OPEN1(3)
           1 <a> < a>       | 3:REG_ANY(4)
           2 <a > <a>       | 4:BOUND(5)
           2 <a > <a>       | 5:CLOSE1(7)
           2 <a > <a>       | 7:REF1(9)
                              failed...
           2 <a > <a>       | 1:OPEN1(3)
           2 <a > <a>       | 3:REG_ANY(4)
           3 <a a> <>       | 4:BOUND(5)
           3 <a a> <>       | 5:CLOSE1(7)
           3 <a a> <>       | 7:REF1(9)
                              failed...
           3 <a a> <>       | 1:OPEN1(3)
           3 <a a> <>       | 3:REG_ANY(4)
                              failed...
        Match failed
        Freeing REx: "(.\b)\1"

        Compare against a simpler pattern like

        $ perl -Mre=debug -le " q/aa/ =~ /a\b/ "
        Compiling REx "a\b"
        Final program:
           1: EXACT <a> (3)
           3: BOUND (4)
           4: END (0)
        anchored "a" at 0 (checking anchored) minlen 1
        Guessing start of match in sv for REx "a\b" against "aa"
        Found anchored substr "a" at offset 0...
        Guessed: match at offset 0
        Matching REx "a\b" against "aa"
           0 <> <aa>        | 1:EXACT <a>(3)
           1 <a> <a>        | 3:BOUND(4)
                              failed...
           1 <a> <a>        | 1:EXACT <a>(3)
           2 <aa> <>        | 3:BOUND(4)
           2 <aa> <>        | 4:END(0)
        Match successful!
        Freeing REx: "a\b"

        Then check the definition of \b in perlre#Assertions and perlrequick:

        Perl defines the following zero-width assertions:
        The word anchor \b matches a boundary between a word character and a non-word character \w\W or \W\w

        $x = "Housecat catenates house and cat";
        $x =~ /\bcat/;    # matches cat in 'catenates'
        $x =~ /cat\b/;    # matches cat in 'housecat'
        $x =~ /\bcat\b/;  # matches 'cat' at end of string
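
        Applying that definition to the string from the question, a small sketch (my own illustration, not from the docs quoted above) can list every offset where \b holds. The @boundaries name is made up for this example.

```perl
use strict;
use warnings;

my $test = "asd asd";

# Collect every offset at which the zero-width assertion \b succeeds.
# With /g, Perl will not match an empty string twice at the same position,
# so the loop advances through the string and terminates.
my @boundaries;
while ($test =~ /\b/g) {
    push @boundaries, pos($test);
}

# For "asd asd" this gives 0 3 4 7: the start and end of each word.
print "word boundaries at offsets: @boundaries\n";
```

        None of the offsets falls inside "asd" itself, and each boundary sits next to the space or the string edge rather than on top of a character.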

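        Basically your pattern can never match a repeated word: \b after the capture demands a non-word character next, while \1 demands the word's first letter. It is just like this:

        perl -Mre=debug -le " q/aa/ =~ /a\ba/ "

        By definition, there can never be a word boundary within a word.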
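
        If the goal is to find a word that is repeated after whitespace, one possible way to write it is sketched below. This is an assumption about what the original poster wanted, not something stated in the thread, and find_repeated_word is a hypothetical helper name.

```perl
use strict;
use warnings;

# find_repeated_word is a hypothetical helper, not part of the thread.
# It returns the first word that is immediately repeated after whitespace,
# or undef when there is none.
sub find_repeated_word {
    my ($text) = @_;
    # \b(\w+)\b : one whole word, captured
    # \s+       : the whitespace that \b itself does not consume
    # \1\b      : the same word again, ending at a word boundary
    return $text =~ /\b(\w+)\b\s+\1\b/ ? $1 : undef;
}

for my $text ("asd asd", "asd asdf", "aa") {
    my $word = find_repeated_word($text);
    print defined $word
        ? "'$text': found '$word' repeated\n"
        : "'$text': no repeated word\n";
}
```

        The trailing \b keeps "asd asdf" from counting as a repeat, and "aa" is rejected because there is no boundary inside a word.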
