http://www.perlmonks.org?node_id=186467

You are all familiar with how to match nested, balanced parentheses. But that expression is easily extended to longer start and finish markers. Here's how you would match a nested begin/end block.
$re = qr /begin              # Start with 'begin',
          (?:                # Followed by
              (?>[^be]+)     # Not a b or e, many times,
             |b(?!egin)      # Or a b not starting 'begin',
             |e(?!nd)        # Or an e not starting 'end',
             |(??{ $re })    # Or a balanced begin/end block
          )*                 # zero or more times
          end/x;             # And an 'end'.

Re: Matching nested begin/ends
by PodMaster (Abbot) on Jul 31, 2002 at 12:17 UTC
    That doesn't work too well on my Win2000 machine. You need to change the delimiter to avoid
    Unmatched ( before HERE mark in regex m/begin # Start with 'begin', ( << HERE ?: # Followed by (?>[^be]+) # Not a b or e, many times, |b(?!egin) # Or a b not starting 'begin', |e(?!nd) # Or an e not starting 'end', |(??{ $re }) # Or a balanced begin/ at fudge line 6.
    A pair of {} works fine.

    ____________________________________________________
    ** The Third rule of perl club is a statement of fact: pod is sexy.

      Whoops.

      I added the comments after testing the code. The / in the comment messes things up.

      Abigail
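The delimiter fix discussed above can be sketched as follows. This is a minimal sketch, not the poster's exact code: it swaps the `/` delimiter for braces so that the slash in the `begin/end` comment no longer terminates the pattern, and it declares `$re` with `our` (an assumption added here) so the recursive `(??{ $re })` block can see it under `use strict`. The `is_balanced` helper is a hypothetical name used only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our $re;    # package variable, visible inside the (??{ $re }) block
$re = qr{
    begin                # Start with 'begin',
    (?:                  # followed by
        (?>[^be]+)       # a run of characters that are not b or e,
      | b(?!egin)        # or a b that does not start 'begin',
      | e(?!nd)          # or an e that does not start 'end',
      | (??{ $re })      # or a nested begin/end block,
    )*                   # zero or more times,
    end                  # and finally an 'end'.
}x;

# Hypothetical helper: true only if the WHOLE string is one balanced block.
sub is_balanced {
    my ($string) = @_;
    return $string =~ /\A$re\z/ ? 1 : 0;
}

for my $input ('begin end', 'begin begin end end', 'begin en', 'begin end begin end') {
    printf "%-24s %s\n", "'$input'", is_balanced($input) ? 'balanced' : 'not balanced';
}
```

Anchoring with `\A` and `\z` is a choice made for this sketch; without the anchors the pattern will happily find a balanced block inside a longer string.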

Re: Matching nested begin/ends
by I0 (Priest) on Aug 01, 2002 at 00:27 UTC
    ($re=$_)=~s/((begin)|(end)|.)/${['(','']}[!$2]\Q$1\E${[')','']}[!$3]/gs;
    $re = join'|',map quotemeta,eval{/$re/};
    $re = qr/$re/;
Re: Matching nested begin/ends
by jryan (Vicar) on Jul 31, 2002 at 22:44 UTC
    Very slick regex, but instead of using lookaheads, why not just let backtracking work its magic? :)
    $re = qr /begin (?: (?>[^be]*) |(??{ $re }) | [be] )* end/x;
    Note: Abigail's will be faster, but sorry, I just couldn't resist the joke. ;)
      Yours is wrong.
      #!/usr/bin/perl

      use strict;
      use warnings 'all';
      use vars qw /$re/;

      $re = qr /begin (?: (?>[^be])* |(??{ $re }) | [be] )* end/x;

      sub pass {local $_ = shift; print   /^$re$/ ? "ok\n" : "not ok: $_\n"}
      sub fail {local $_ = shift; print ! /^$re$/ ? "ok\n" : "not ok: $_\n"}

      pass 'begin end';
      fail 'begin en';
      fail 'begin nd';
      pass 'begin begin end end';
      pass 'beginend';
      pass 'beginbeginbeginendendend';
      pass 'begin begin end begin begin end begin end end end';
      fail 'begin begin end begin egin end begin end end end';
      fail 'begin end begin end';

      __END__
      ok
      ok
      ok
      ok
      ok
      ok
      ok
      not ok: begin begin end begin egin end begin end end end
      not ok: begin end begin end
      It matches strings that shouldn't be matched.

      Abigail
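Why the `[be]` alternative over-matches can be seen by anchoring both patterns against the same input. The following is a hedged sketch: `$loose` copies the pattern from the test script above, `$strict` is the lookahead version from the root post, and both variable names are made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our ($strict, $loose);    # hypothetical names; package variables for (??{ ... })

# Lookahead version: a b or e is only consumed when it does NOT start a marker.
$strict = qr{begin (?: (?>[^be]+) | b(?!egin) | e(?!nd) | (??{ $strict }) )* end}x;

# Backtracking version: [be] may consume the first letter of a stray marker,
# after which the rest of 'begin' or 'end' is eaten as ordinary text.
$loose  = qr{begin (?: (?>[^be])* | (??{ $loose }) | [be] )* end}x;

for my $input ('begin begin end end', 'begin end begin end') {
    printf "%-24s strict: %-3s  loose: %s\n",
        "'$input'",
        ($input =~ /\A$strict\z/ ? 'yes' : 'no'),
        ($input =~ /\A$loose\z/  ? 'yes' : 'no');
}
```

The second input is two sibling blocks, not one nested block. The loose pattern can still cover it in a single match by treating the middle `end begin` as plain characters, which is the behaviour the `not ok` lines above report.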

        It is not.

        We simply test differently; I tested with something like this (using your input):

        my$re = qr/ begin (?: (?> [^be]* ) |(??{ $re }) | [be] )* end /x;

        foreach (<DATA>)
        {
            chomp;
            my @matches = $_ =~ /($re)/g;
            print qq(For "$_":\n\t);
            print (@matches ? join("*",@matches) : "no matches", "\n");
        }

        __DATA__
        begin end
        begin en
        begin nd
        begin begin end end
        beginend
        beginbeginbeginendendend
        begin begin end begin begin end begin end end end
        begin begin end begin egin end begin end end end
        begin end begin end

        Which prints:

        For "begin end":
                begin end
        For "begin en":
                no matches
        For "begin nd":
                no matches
        For "begin begin end end":
                begin begin end end
        For "beginend":
                beginend
        For "beginbeginbeginendendend":
                beginbeginbeginendend
        For "begin begin end begin begin end begin end end end":
                begin begin end begin begin end begin end end
        For "begin begin end begin egin end begin end end end":
                begin begin end begin egin end begin end end
        For "begin end begin end":
                begin end*begin end
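Much of the disagreement in this subthread comes down to anchoring: `/^$re$/` asks whether the whole string is one balanced block, while `/($re)/g` asks which balanced blocks occur anywhere in the string. A small sketch of the difference, using the lookahead pattern from the root post rather than the backtracking one; `blocks` is a hypothetical helper name and `our $re` is an assumption made so the code runs under `use strict`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our $re;    # package variable, visible inside the (??{ $re }) block
$re = qr{begin (?: (?>[^be]+) | b(?!egin) | e(?!nd) | (??{ $re }) )* end}x;

# Hypothetical helper: every balanced block found by an unanchored global search.
sub blocks {
    my ($string) = @_;
    my @found = $string =~ /($re)/g;
    return @found;
}

for my $input ('begin begin end end', 'begin end begin end', 'begin en') {
    my @found = blocks($input);
    printf "%-24s whole string: %-3s  blocks: %s\n",
        "'$input'",
        ($input =~ /\A$re\z/ ? 'yes' : 'no'),
        (@found ? join('*', @found) : 'no matches');
}
```

With this pattern, a string of two sibling blocks fails the anchored test but yields two separate blocks in the global search, so the two styles of testing answer different questions.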