PerlMonks  

about word boundary in RE

by anaconda_wly (Scribe)
on Apr 02, 2013 at 07:17 UTC (#1026602)
anaconda_wly has asked for the wisdom of the Perl Monks concerning the following question:

my $test = "asd asd";
if ($test =~ /(.+\b)\1/) {
    print "Found $1 repeated\n";
}

Why is there no output? Shouldn't both "asd"s be matched?

Re: about word boundary in RE
by hdb (Prior) on Apr 02, 2013 at 07:24 UTC

    The whitespace between the words prevents the match.

    my $test = "asd asd";
    if ($test =~ /(.+\b)\s\1/) {
        print "Found $1 repeated\n";
    }
      Isn't there a word boundary before the whitespace? If (.+\b) already matches, why do I need the \s? I expected the output to be the first "asd", but it isn't.
Re: about word boundary in RE (use re 'debug')
by Anonymous Monk on Apr 02, 2013 at 07:36 UTC
    Add use re 'debug'; to that short file, and watch the regex engine do its thing in your console.
      Good, but it doesn't seem easily readable to me. If (.+\b) already matches, why do I need the \s?

        \b is a zero-width assertion. It needs the space next to the word to recognize a word boundary, but it does not consume that space. The \1 therefore has to start matching at the space, which is why you need to add \s to your pattern to consume it.
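
To make the zero-width point concrete, here is a minimal sketch (not from the original replies) that lists where \b matches in the question's string and shows what is left over after (.+?\b) has captured the first word. The variable names @boundaries and $rest are made up for illustration, and the non-greedy .+? is used only to keep the capture short.

```perl
use strict;
use warnings;

my $test = "asd asd";

# Collect the offset of every word boundary. Each match of \b is
# zero-length, so pos() reports the boundary position itself.
my @boundaries;
while ($test =~ /\b/g) {
    push @boundaries, pos($test);
}
print "boundaries at: @boundaries\n";    # boundaries at: 0 3 4 7

# Capture the first word up to its boundary, then look at what remains.
# $+[0] is the offset where the whole match ended.
my $rest = '';
if ($test =~ /^(.+?\b)/) {
    $rest = substr($test, $+[0]);
    print "captured '$1', remaining '$rest'\n";
}
```

The remaining text still starts with the space, so a \1 placed directly after the capture is compared against " as" rather than "asd".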

        Good, but it doesn't seem easily readable to me.

        In that case, use a shorter string, and match the node numbers listed under "Final program" against those on the right-hand side of the trace, like 1: OPEN1 (3)

        $ perl -Mre=debug -le " q/a a/ =~ /(.\b)\1/ "
        Compiling REx "(.\b)\1"
        Final program:
           1: OPEN1 (3)
           3:   REG_ANY (4)
           4:   BOUND (5)
           5: CLOSE1 (7)
           7: REF1 (9)
           9: END (0)
        minlen 1
        Matching REx "(.\b)\1" against "a a"
           0 <> <a a>                |  1:OPEN1(3)
           0 <> <a a>                |  3:REG_ANY(4)
           1 <a> < a>                |  4:BOUND(5)
           1 <a> < a>                |  5:CLOSE1(7)
           1 <a> < a>                |  7:REF1(9)
                                          failed...
           1 <a> < a>                |  1:OPEN1(3)
           1 <a> < a>                |  3:REG_ANY(4)
           2 <a > <a>                |  4:BOUND(5)
           2 <a > <a>                |  5:CLOSE1(7)
           2 <a > <a>                |  7:REF1(9)
                                          failed...
           2 <a > <a>                |  1:OPEN1(3)
           2 <a > <a>                |  3:REG_ANY(4)
           3 <a a> <>                |  4:BOUND(5)
           3 <a a> <>                |  5:CLOSE1(7)
           3 <a a> <>                |  7:REF1(9)
                                          failed...
           3 <a a> <>                |  1:OPEN1(3)
           3 <a a> <>                |  3:REG_ANY(4)
                                          failed...
        Match failed
        Freeing REx: "(.\b)\1"

        Compare against a simpler pattern like

        $ perl -Mre=debug -le " q/aa/ =~ /a\b/ "
        Compiling REx "a\b"
        Final program:
           1: EXACT <a> (3)
           3: BOUND (4)
           4: END (0)
        anchored "a" at 0 (checking anchored) minlen 1
        Guessing start of match in sv for REx "a\b" against "aa"
        Found anchored substr "a" at offset 0...
        Guessed: match at offset 0
        Matching REx "a\b" against "aa"
           0 <> <aa>                 |  1:EXACT <a>(3)
           1 <a> <a>                 |  3:BOUND(4)
                                          failed...
           1 <a> <a>                 |  1:EXACT <a>(3)
           2 <aa> <>                 |  3:BOUND(4)
           2 <aa> <>                 |  4:END(0)
        Match successful!
        Freeing REx: "a\b"

        Then check the definition of \b in perlre#Assertions and perlrequick:

        Perl defines the following zero-width assertions: The word anchor \b matches a boundary between a word character and a non-word character, \w\W or \W\w
        $x = "Housecat catenates house and cat";
        $x =~ /\bcat/;    # matches cat in 'catenates'
        $x =~ /cat\b/;    # matches cat in 'housecat'
        $x =~ /\bcat\b/;  # matches 'cat' at end of string
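
The quoted excerpt only asserts where each pattern matches. A small sketch like the following can confirm it by printing the start offset of each match. The @cases list and %offset hash are names invented for this illustration; @- is Perl's built-in array of match start offsets.

```perl
use strict;
use warnings;

my $x = "Housecat catenates house and cat";

# Each entry pairs a printable label with the compiled pattern
# taken from the quoted excerpt.
my @cases = (
    [ '\bcat',   qr/\bcat/   ],
    [ 'cat\b',   qr/cat\b/   ],
    [ '\bcat\b', qr/\bcat\b/ ],
);

my %offset;
for my $case (@cases) {
    my ($label, $re) = @$case;
    if ($x =~ $re) {
        $offset{$label} = $-[0];    # start offset of the match just made
        printf "/%s/ first matches at offset %d\n", $label, $offset{$label};
    }
}
```

Offset 9 is the start of "catenates", offset 5 is the "cat" inside "Housecat", and offset 29 is the final standalone "cat".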

        Basically, your pattern can never match that string, for the same reason this one can never match: perl -Mre=debug -le " q/aa/ =~ /a\ba/ "

        By definition, there can never be a word boundary within a word.
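
To pull the thread together, here is a short sketch comparing the original pattern, hdb's fix, and one further variant on the question's string. The third pattern, /\b(\w+)\s+\1\b/, is not from the thread. It is only an assumption about what a stricter "repeated word" check might look like, and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $test = "asd asd";

# Original pattern: \1 must begin right at the boundary, where the
# string has a space, so nothing can match.
my $original = $test =~ /(.+\b)\1/ ? "matched '$1'" : "no match";

# hdb's fix: \s consumes the space between the two words.
my $with_ws = $test =~ /(.+\b)\s\1/ ? "matched '$1'" : "no match";

# Hypothetical stricter variant: whole words only, any amount of
# whitespace between them.
my $whole_word = $test =~ /\b(\w+)\s+\1\b/ ? "matched '$1'" : "no match";

print "original:   $original\n";
print "with \\s:    $with_ws\n";
print "whole word: $whole_word\n";
```

Only the first line should report "no match". The other two capture "asd".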
