PerlMonks  

Re: Recursive regex

by LanX (Saint)
on Feb 05, 2016 at 18:16 UTC [id://1154502]


in reply to Recursive regex

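$np matches an opening paren \(, the matching closing paren \) and everything between them, where each piece in between is

  • either a run of non-parens, [^()]+
  • or another group within parens (including the surrounding parens)

In the latter case it descends into recursion via (??{ $np }), while the (?>...) around [^()]+ avoids backtracking into a run of non-parens once it has matched.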
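A minimal illustration of what (?>...) changes, using a toy pattern rather than $np (the strings and variable names are made up for this sketch): an ordinary a+ may give characters back when the rest of the pattern fails, an atomic (?>a+) never does. In $np this is what stops [^()]+ from being re-split across iterations of the surrounding (?:...)* when a group turns out to be unbalanced, which could otherwise take exponential time on long inputs.

```perl
use strict;
use warnings;

# Toy strings and names, chosen only to show the effect of (?>...).
# Ordinary a+ : greedy, but may give back an "a" so the trailing a can match.
my $plain  = "aaa" =~ /a+a/     ? 1 : 0;

# Atomic (?>a+) : grabs all "a"s and never gives any back, so the trailing
# a has nothing left to match at any start position.
my $atomic = "aaa" =~ /(?>a+)a/ ? 1 : 0;

print "plain: $plain, atomic: $atomic\n";
```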

Indentation helps:

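    use strict;
    use warnings;
    use re 'debug';    # print the compiled program and the match trace (to STDERR)

    my $np;            # declare first, so the pattern can refer to itself at match time
    $np = qr{
               \(
               (?:
                  (?> [^()]+ )    # Non-parens without backtracking
                |
                  (??{ $np })     # Group with matching parens
               )*
               \)
            }x;

    my $funpat = qr/\w+$np/;

    print "Matches!\n" if "fun(1,(2),5)" =~ /^$funpat$/;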
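Since Perl 5.10 the same grammar can also be written with the engine's built-in recursion instead of the postponed (??{ $np }) code block. This is a sketch of that alternative, not the pattern from the question; the capture name "group" and the test strings are made up for illustration.

```perl
use strict;
use warnings;

# Alternative to (??{ $np }): named recursion, built into the engine since 5.10.
my $funpat = qr{
    \w+
    (?<group>              # capture named "group": one balanced pair of parens
        \(
        (?:
            (?> [^()]+ )   # run of non-parens, atomic as before
          |
            (?&group)      # recurse into "group" for a nested pair
        )*
        \)
    )
}x;

my %matches;
for my $s ('fun(1,(2),5)', 'fun(1,(2,5)', 'fun((a)(b(c)))') {
    # 1 if the whole string is a name followed by one balanced group
    $matches{$s} = $s =~ /^$funpat$/ ? 1 : 0;
    print "$s: ", ($matches{$s} ? "matches" : "no match"), "\n";
}
```

With (?&group) the pattern refers to itself without a variable, so nothing has to be declared before the qr// and no Perl code runs during the match.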

And you might find this use re 'debug' output helpful (just play with the input to get more concise information):

    Compiling REx "%n \(%n (?:%n (?> [^()]+ ) "...
    Final program:
       1: EXACT <(> (3)
       3: CURLYX[0] {0,32767} (28)
       5: BRANCH (22)
       6: SUSPEND (27)
       8: PLUS (20)
       9: ANYOF[\x00-'*-\xff][{unicode_all}] (0)
      20: SUCCEED (0)
      21: TAIL (26)
      22: BRANCH (FAIL)
      23: LOGICAL[2] (24)
      24: EVAL (27)
      26: TAIL (27)
      27: WHILEM[1/1] (0)
      28: NOTHING (29)
      29: EXACT <)> (31)
      31: END (0)
    anchored "(" at 0 floating ")" at 1..2147483647 (checking floating) minlen 2 with eval
    Compiling REx "\w+(?^x:%n \(%n (?:%n (?> ["...
    Final program:
       1: PLUS (3)
       2: ALNUM (0)
       3: EXACT <(> (5)
       5: CURLYX[0] {0,32767} (30)
       7: BRANCH (24)
       8: SUSPEND (29)
      10: PLUS (22)
      11: ANYOF[\x00-'*-\xff][{unicode_all}] (0)
      22: SUCCEED (0)
      23: TAIL (28)
      24: BRANCH (FAIL)
      25: LOGICAL[2] (26)
      26: EVAL (29)
      28: TAIL (29)
      29: WHILEM[1/1] (0)
      30: NOTHING (31)
      31: EXACT <)> (33)
      33: END (0)
    floating "(" at 1..2147483647 (checking floating) stclass ALNUM minlen 3 with eval
    Compiling REx "^(?^:\w+(?^x:%n \(%n (?:%n "...
    Final program:
       1: BOL (2)
       2: PLUS (4)
       3: ALNUM (0)
       4: EXACT <(> (6)
       6: CURLYX[0] {0,32767} (31)
       8: BRANCH (25)
       9: SUSPEND (30)
      11: PLUS (23)
      12: ANYOF[\x00-'*-\xff][{unicode_all}] (0)
      23: SUCCEED (0)
      24: TAIL (29)
      25: BRANCH (FAIL)
      26: LOGICAL[2] (27)
      27: EVAL (30)
      29: TAIL (30)
      30: WHILEM[1/1] (0)
      31: NOTHING (32)
      32: EXACT <)> (34)
      34: EOL (35)
      35: END (0)
    floating ")"$ at 2..2147483647 (checking floating) anchored(BOL) minlen 3 with eval
    Guessing start of match in sv for REx "^(?^:\w+(?^x:%n \(%n (?:%n "... against "fun(1,(2),5)"
    Found floating substr ")"$ at offset 11...
    Guessed: match at offset 0
    Matching REx "^(?^:\w+(?^x:%n \(%n (?:%n "... against "fun(1,(2),5)"
       0 <> <fun(1,(2),>   |  1:BOL(2)
       0 <> <fun(1,(2),>   |  2:PLUS(4)
                                     ALNUM can match 3 times out of 2147483647...
       3 <fun> <(1,(2),5)> |  4: EXACT <(>(6)
       4 <fun(> <1,(2),5)> |  6: CURLYX[0] {0,32767}(31)
       4 <fun(> <1,(2),5)> | 30: WHILEM[1/1](0)
                                     whilem: matched 0 out of 0..32767
       4 <fun(> <1,(2),5)> |  8: BRANCH(25)
       4 <fun(> <1,(2),5)> |  9: SUSPEND(30)
       4 <fun(> <1,(2),5)> | 11: PLUS(23)
                                     ANYOF[\x00-'*-\xff][{unicode_all}] can match 2 times out of 2147483647...
       6 <fun(1,> <(2),5)> | 23: SUCCEED(0)
                                     subpattern success...
       6 <fun(1,> <(2),5)> | 30: WHILEM[1/1](0)
                                     whilem: matched 1 out of 0..32767
       6 <fun(1,> <(2),5)> |  8: BRANCH(25)
       6 <fun(1,> <(2),5)> |  9: SUSPEND(30)
       6 <fun(1,> <(2),5)> | 11: PLUS(23)
                                     ANYOF[\x00-'*-\xff][{unicode_all}] can match 0 times out of 2147483647...
                                     failed...
                                     failed...
       6 <fun(1,> <(2),5)> | 25: BRANCH(29)
       6 <fun(1,> <(2),5)> | 26: LOGICAL[2](27)
       6 <fun(1,> <(2),5)> | 27: EVAL(30)
    Matching embedded REx "%n \(%n (?:%n (?> [^()]+ ) "... against "(2),5)"
       6 <fun(1,> <(2),5)> |  1: EXACT <(>(3)
       7 <fun(1,(> <2),5)> |  3: CURLYX[0] {0,32767}(28)
       7 <fun(1,(> <2),5)> | 27: WHILEM[1/1](0)
                                     whilem: matched 0 out of 0..32767
       7 <fun(1,(> <2),5)> |  5: BRANCH(22)
       7 <fun(1,(> <2),5)> |  6: SUSPEND(27)
       7 <fun(1,(> <2),5)> |  8: PLUS(20)
                                     ANYOF[\x00-'*-\xff][{unicode_all}] can match 1 times out of 2147483647...
       8 <fun(1,(2> <),5)> | 20: SUCCEED(0)
                                     subpattern success...
       8 <fun(1,(2> <),5)> | 27: WHILEM[1/1](0)
                                     whilem: matched 1 out of 0..32767
       8 <fun(1,(2> <),5)> |  5: BRANCH(22)
       8 <fun(1,(2> <),5)> |  6: SUSPEND(27)
       8 <fun(1,(2> <),5)> |  8: PLUS(20)
                                     ANYOF[\x00-'*-\xff][{unicode_all}] can match 0 times out of 2147483647...
                                     failed...
                                     failed...
       8 <fun(1,(2> <),5)> | 22: BRANCH(26)
       8 <fun(1,(2> <),5)> | 23: LOGICAL[2](24)
       8 <fun(1,(2> <),5)> | 24: EVAL(27)
    Matching embedded REx "%n \(%n (?:%n (?> [^()]+ ) "... against "),5)"
       8 <fun(1,(2> <),5)> |  1: EXACT <(>(3)
                                     failed...
                                     BRANCH failed...
                                     whilem: failed, trying continuation...
       8 <fun(1,(2> <),5)> | 28: NOTHING(29)
       8 <fun(1,(2> <),5)> | 29: EXACT <)>(31)
       9 <fun(1,(2)> <,5)> | 31: END(0)
                                     EVAL trying tail ... 0
       9 <fun(1,(2)> <,5)> | 30: WHILEM[1/1](0)
                                     whilem: matched 2 out of 0..32767
       9 <fun(1,(2)> <,5)> |  8: BRANCH(25)
       9 <fun(1,(2)> <,5)> |  9: SUSPEND(30)
       9 <fun(1,(2)> <,5)> | 11: PLUS(23)
                                     ANYOF[\x00-'*-\xff][{unicode_all}] can match 2 times out of 2147483647...
      11 <fun(1,(2),5> <)> | 23: SUCCEED(0)
                                     subpattern success...
      11 <fun(1,(2),5> <)> | 30: WHILEM[1/1](0)
                                     whilem: matched 3 out of 0..32767
      11 <fun(1,(2),5> <)> |  8: BRANCH(25)
      11 <fun(1,(2),5> <)> |  9: SUSPEND(30)
      11 <fun(1,(2),5> <)> | 11: PLUS(23)
                                     ANYOF[\x00-'*-\xff][{unicode_all}] can match 0 times out of 2147483647...
                                     failed...
                                     failed...
      11 <fun(1,(2),5> <)> | 25: BRANCH(29)
      11 <fun(1,(2),5> <)> | 26: LOGICAL[2](27)
      11 <fun(1,(2),5> <)> | 27: EVAL(30)
    Matching embedded REx "%n \(%n (?:%n (?> [^()]+ ) "... against ")"
      11 <fun(1,(2),5> <)> |  1: EXACT <(>(3)
                                     failed...
                                     BRANCH failed...
                                     whilem: failed, trying continuation...
      11 <fun(1,(2),5> <)> | 31: NOTHING(32)
      11 <fun(1,(2),5> <)> | 32: EXACT <)>(34)
      12 <fun(1,(2),5)> <> | 34: EOL(35)
      12 <fun(1,(2),5)> <> | 35: END(0)
    Match successful!
    Freeing REx: "%n \(%n (?:%n (?> [^()]+ ) "...
    Freeing REx: "\w+(?^x:%n \(%n (?:%n (?> ["...
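If the "Final program" dumps are just noise, the re pragma also has a finer-grained Debug form. Assuming your perl's re.pm supports the EXECUTE option (see perldoc re), this sketch should print only the matching trace, and a smaller made-up input such as "f((1))" keeps that trace short.

```perl
use strict;
use warnings;
use re qw(Debug EXECUTE);   # assumption: match-time trace only, no compile dumps (on STDERR)

my $np;
$np = qr{
    \(
    (?:
        (?> [^()]+ )    # Non-parens without backtracking
      |
        (??{ $np })     # Group with matching parens
    )*
    \)
}x;
my $funpat = qr/\w+$np/;

# A shorter input means fewer steps: one name, one nested group, one argument.
my $ok = "f((1))" =~ /^$funpat$/ ? 1 : 0;
print $ok ? "Matches!\n" : "No match\n";
```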

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
Je suis Charlie!
