http://www.perlmonks.org?node_id=1026613


in reply to Re: about word boundary in RE (use re 'debug')
in thread about word boundary in RE

Good, but it does not seem easily readable to me. If (.+\b) already matches, why do I need the \s?

Re^3: about word boundary in RE (use re 'debug')
by hdb (Monsignor) on Apr 02, 2013 at 08:00 UTC

    \b is a zero width match. It does need the space to recognize a word boundary, but it does not consume it. And therefore you need to add a space to your pattern.
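A minimal sketch of the point above, using the short string "a a" from elsewhere in this thread rather than the original poster's data (the variable names are made up for illustration). It is only meant to show that \b checks a position without consuming the space, so something like \s still has to consume it before \1 can match:

```perl
use strict;
use warnings;

my $text = 'a a';

# Without \s: once (.\b) has captured "a", the next character is the space,
# so \1 (which needs another "a") cannot match there.
my $without_space = $text =~ /(.\b)\1/ ? 1 : 0;

# With \s: \b only asserts the boundary, \s consumes the space,
# and \1 then matches the second "a".
my ($captured) = $text =~ /(.\b)\s\1/;
my $with_space = defined $captured ? 1 : 0;

print "without whitespace in pattern: $without_space\n";
print "with whitespace in pattern:    $with_space\n";
print "captured: '$captured'\n" if $with_space;
```

Note that \1 matches the same text that group 1 captured, not the sub-pattern again.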

      Sorry, my mistake. I didn't understand that the \1 repeats the pattern; I just neglected it. So it's clear to me now. The case is from a website. Thanks!

Re^3: about word boundary in RE (use re 'debug')
by Anonymous Monk on Apr 02, 2013 at 08:21 UTC

    Good but seems not easily readable to me.

    In that case, use a shorter string, and match the node numbers under "Final program" against those on the right-hand side of the trace, like 1: OPEN1 (3)

    $ perl -Mre=debug -le " q/a a/ =~ /(.\b)\1/ "
    Compiling REx "(.\b)\1"
    Final program:
       1: OPEN1 (3)
       3:   REG_ANY (4)
       4:   BOUND (5)
       5: CLOSE1 (7)
       7: REF1 (9)
       9: END (0)
    minlen 1
    Matching REx "(.\b)\1" against "a a"
       0 <> <a a>     |  1:OPEN1(3)
       0 <> <a a>     |  3:REG_ANY(4)
       1 <a> < a>     |  4:BOUND(5)
       1 <a> < a>     |  5:CLOSE1(7)
       1 <a> < a>     |  7:REF1(9)
                         failed...
       1 <a> < a>     |  1:OPEN1(3)
       1 <a> < a>     |  3:REG_ANY(4)
       2 <a > <a>     |  4:BOUND(5)
       2 <a > <a>     |  5:CLOSE1(7)
       2 <a > <a>     |  7:REF1(9)
                         failed...
       2 <a > <a>     |  1:OPEN1(3)
       2 <a > <a>     |  3:REG_ANY(4)
       3 <a a> <>     |  4:BOUND(5)
       3 <a a> <>     |  5:CLOSE1(7)
       3 <a a> <>     |  7:REF1(9)
                         failed...
       3 <a a> <>     |  1:OPEN1(3)
       3 <a a> <>     |  3:REG_ANY(4)
                         failed...
    Match failed
    Freeing REx: "(.\b)\1"

    Compare against a simpler pattern like

    $ perl -Mre=debug -le " q/aa/ =~ /a\b/ "
    Compiling REx "a\b"
    Final program:
       1: EXACT <a> (3)
       3: BOUND (4)
       4: END (0)
    anchored "a" at 0 (checking anchored) minlen 1
    Guessing start of match in sv for REx "a\b" against "aa"
    Found anchored substr "a" at offset 0...
    Guessed: match at offset 0
    Matching REx "a\b" against "aa"
       0 <> <aa>      |  1:EXACT <a>(3)
       1 <a> <a>      |  3:BOUND(4)
                         failed...
       1 <a> <a>      |  1:EXACT <a>(3)
       2 <aa> <>      |  3:BOUND(4)
       2 <aa> <>      |  4:END(0)
    Match successful!
    Freeing REx: "a\b"

    Then check the definition of \b in perlre#Assertions, perlrequick

    Perl defines the following zero-width assertions: The word anchor \b matches a boundary between a word character and a non-word character \w\W or \W\w
    $x = "Housecat catenates house and cat";
    $x =~ /\bcat/;    # matches cat in 'catenates'
    $x =~ /cat\b/;    # matches cat in 'housecat'
    $x =~ /\bcat\b/;  # matches 'cat' at end of string

    Basically, your pattern can never match, just like this one:

    $ perl -Mre=debug -le " q/aa/ =~ /a\ba/ "

    By definition, there can never be a word boundary within a word.
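One way to see this without reading the debug trace is to list every offset at which \b matches. The helper name `boundaries` below is hypothetical, made up for this sketch; it relies only on the usual behaviour of a zero-width match under /g in a while loop:

```perl
use strict;
use warnings;

# Return every offset in $s at which the zero-width assertion \b matches.
sub boundaries {
    my ($s) = @_;
    my @offsets;
    while ($s =~ /\b/g) {
        push @offsets, pos($s);
    }
    return @offsets;
}

print "aa : ", join(',', boundaries('aa')),  "\n";   # 0,2
print "a a: ", join(',', boundaries('a a')), "\n";   # 0,1,2,3
```

For "aa" the only boundaries are at the start and the end of the string, so a\ba has no position between the two letters where \b could succeed. For "a a" there is a boundary on each side of the space.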

      Thanks for your patience. I think I need a little more time to understand the latter lines.