
Eternal question of parsing parentheses

by Anonymous Monk
on Oct 24, 2009 at 12:59 UTC ( #803034=perlquestion )
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

This must be a common question, but I can't find any answer.

In a string containing nested parentheses, I want to find the contents of the first top-level parenthesized group, so that I can evaluate it before the rest. So for $_='a(b(c(d)(e))f)g(h)((i)j)'; the result should be b(c(d)(e))f.

This can be done the sad and boring way by splitting the string and going through it, counting parenthesis signs as you go. But surely there must be some clever regex to do it?
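For comparison, the counting approach might look roughly like the sketch below. The sub name first_group_by_counting is made up for illustration, and the sketch assumes that every parenthesis in the string is a grouping parenthesis (no quoting or escaping).

```perl
use strict;
use warnings;

# Hypothetical helper: return the contents of the first balanced
# parenthesized group in $s, or nothing if there is none.
sub first_group_by_counting {
    my ($s) = @_;
    my $start = index($s, '(');
    return if $start < 0;                 # no opening parenthesis at all

    my $depth = 0;
    for my $i ($start .. length($s) - 1) {
        my $c = substr($s, $i, 1);
        if ($c eq '(') {
            $depth++;
        }
        elsif ($c eq ')') {
            $depth--;
            # Depth back at zero: this ')' closes the first '('.
            return substr($s, $start + 1, $i - $start - 1) if $depth == 0;
        }
    }
    return;                               # first group is never closed
}

print first_group_by_counting('a(b(c(d)(e))f)g(h)((i)j)'), "\n";
```

It is longer than a regex, but it runs in a single pass and works on any Perl version.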

I came up with one solution, which looks wonderfully cryptic:

    while (/^[^(]*\([^)]*\(/) { s/\(([^()]*)\)/[$1]/ }
    s/.*?\((.*?)\).*/$1/;
    tr/[]/()/;

While the first '(' is followed by another '(' before any ')', i.e. the first group still has inner parentheses, find the first '(' which is NOT followed by another '(' before its ')', i.e. the first innermost group, and rewrite it as [...]. Then extract the first remaining parenthesized group, and turn every [ ] back into ( ).

Is this a good idea? Can you improve it? Is there a better way?

Re: Eternal question of parsing parentheses
by JavaFan (Canon) on Oct 24, 2009 at 13:23 UTC
    use Regexp::Common qw /balanced/;
    /($RE{balanced}{-parens=>'()'})/ and print $1;
Re: Eternal question of parsing parentheses
by ikegami (Pope) on Oct 24, 2009 at 16:39 UTC
Re: Eternal question of parsing parentheses
by LanX (Canon) on Oct 24, 2009 at 17:44 UTC
    Eternal answer RTFM¹ ! :)

    perldoc perlre (Perl 5.10 or later), searched for "recursive", turned up the following code, where I only had to replace "foo" with \w*:

    $_='a(b(c(d)(e))f)g(h)((i)j)';
    $re = qr{ (                 # paren group 1 (full function)
                \w*
                (               # paren group 2 (parens)
                  \(
                    (           # paren group 3 (contents of parens)
                    (?:
                     (?> [^()]+ ) # Non-parens without backtracking
                    |
                     (?2)       # Recurse to start of paren group 2
                    )*
                    )
                  \)
                )
              )
            }x;
    @matches=/$re/;
    print "@matches";

    perl /tmp/
    a(b(c(d)(e))f) (b(c(d)(e))f) b(c(d)(e))f
    Compilation finished at Sat Oct 24 19:42:58

    Cheers Rolf

    (¹) SCNR 8)

    UPDATE: untabified code.

      For those still limited to 5.8 and before, see also the discussion of the
       (??{ code }) construct under Extended Patterns (just before the discussion of
      (?PARNO)) in perlre.
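A minimal sketch of that pre-5.10 approach, modelled on the balanced-parentheses example in perlre. The names $balanced and first_group_deferred are made up for illustration, and perlre describes (??{ code }) with some caveats, so treat this as a starting point rather than a recommendation.

```perl
use strict;
use warnings;

# The pattern refers to itself through a postponed subexpression, so the
# variable has to be declared before the qr{} that mentions it.
our $balanced;
$balanced = qr{
    \(
    (?:
        (?> [^()]+ )          # run of non-parens, no backtracking
      |
        (??{ $balanced })     # a nested balanced group, compiled at match time
    )*
    \)
}x;

# Hypothetical helper: contents of the first balanced group, or nothing.
sub first_group_deferred {
    my ($s) = @_;
    return unless $s =~ /($balanced)/;
    return substr($1, 1, -1);   # strip the enclosing '(' and ')'
}

print first_group_deferred('a(b(c(d)(e))f)g(h)((i)j)'), "\n";
```

The match captures the whole group including its outer parentheses, which is why the helper trims one character from each end.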

      I simplified the code, and applied tr/()/<>/ to the input to make it more readable:

      $_='a<b<c<d><e>>f>g<h><<i>j>';
      $re = qr{
          <                     # anchor at first paren as wanted
          (                     # paren group 1
            (?:
               (?> [^<>]+)      # Non-parens without backtracking
            |
               < (?1)           # Recurse to start of paren group 1
               >
            )*
          )
      }x;
      /$re/;
      print $1;

      perl /tmp/
      b<c<d><e>>f
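Translated back to real parentheses and wrapped in a sub, the simplified pattern could look like the sketch below. The name first_paren_contents is made up for illustration, the sketch requires Perl 5.10 or later for (?1), and unlike the pattern above it also insists on the closing parenthesis of the first group.

```perl
use strict;
use warnings;

# Hypothetical helper: contents of the first balanced parenthesized group,
# or undef if the string contains none.
sub first_paren_contents {
    my ($s) = @_;
    return $s =~ m{
        \(                      # an opening parenthesis
        (                       # group 1: the contents we want
          (?:
              (?> [^()]+ )      # run of non-parens, no backtracking
            |
              \( (?1) \)        # nested group: recurse into group 1
          )*
        )
        \)                      # the matching close
    }x ? $1 : undef;
}

print first_paren_contents('a(b(c(d)(e))f)g(h)((i)j)'), "\n";
```

Because the closing parenthesis is required, an unclosed first '(' does not match at that position; the regex engine then moves on and returns the first group that is actually balanced.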

      Cheers Rolf
