
Text::Balanced with nested / custom brackets

by Anonymous Monk
on Sep 07, 2006 at 20:35 UTC ( #571798=perlquestion )
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

I've been struggling with this for a couple of days, though I don't think the problem is very difficult. Within:

    'this is a [[link with a [nested]]]'

I would like to extract the entire parent [..]. There will be multiple instances, not all with nested brackets. What is the correct code syntax? I've tried all kinds of things without much luck. I'm assuming it's with extract_multiple and an extract ref, but can't figure out what exactly to do:

    my @data = extract_multiple( $text, ???? );

FYI, this is all toward parsing raw wiki text. Thanks in advance.

Replies are listed 'Best First'.
Re: Text::Balanced with nested / custom brackets
by ikegami (Pope) on Sep 07, 2006 at 21:48 UTC
    As far as I can tell, Text::Balanced deals with single-character delimiters, whereas your delimiter has two. You might have to resort to using a regexp.
    my $extractor;  # Must be a separate statement.
    $extractor = qr/
        \[\[
        (?:
            (?: (?! \[\[ | \]\] ) . )+
          | (??{ $extractor })
        )+
        \]\]
    /x;

    my @links = $text =~ /$extractor/g;

    Optimized (I think):

    my $extractor;  # Must be a separate statement.
    $extractor = qr/
        \[\[
        (?>
            (?:
                (?: (?> [^\[\]]+ ) | \[ (?! \[ ) | \] (?! \] ) )
              | (??{ $extractor })
            )+
        )
        \]\]
    /x;

    my @links = $text =~ /$extractor/g;
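For anyone who wants to try the optimized pattern, here is a minimal runnable sketch. The sample string is made up for illustration and is not from the question. Note that the pattern only balances double brackets: single brackets are treated as ordinary characters, so on the question's own sample, '[[link with a [nested]]]', the match appears to end at the first ]] and leave the last ] behind.

```perl
use strict;
use warnings;

# Hypothetical sample input: one nested double-bracket link, one plain link.
my $text = 'see [[outer [[inner]] link]] and [[plain]] here';

my $extractor;    # declared first so the pattern can refer to itself
$extractor = qr/
    \[\[
    (?>
        (?:
            (?> [^\[\]]+ )          # run of non-bracket characters
          | \[ (?! \[ )             # lone opening bracket
          | \] (?! \] )             # lone closing bracket
          | (??{ $extractor })      # nested double-bracket link
        )+
    )
    \]\]
/x;

# No capture groups, so list-context matching returns the whole matches.
my @links = $text =~ /$extractor/g;
print "$_\n" for @links;
```

This should print the outer link (with the inner one still inside it) followed by the plain link.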


      Thank you for those regexes! I'll play around with them to see if I can get myself moving.

      In the long run, I'd still like to know if Text::Balanced can be massaged into dealing with this situation. It does deal with <tags> and such..

        The function to extract tagged data can indeed be used.

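        use Text::Balanced qw(gen_extract_tagged);

        my @links;
        # The open and close tags are treated as regex patterns, so the brackets are escaped.
        my $extractor = gen_extract_tagged('\[\[', '\]\]', qr/(?:(?!\[\[).)*/);
        for (;;) {
            (my $link, $text) = $extractor->($text);
            last if not defined $link;
            push(@links, $link);
        }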


      It's worth noting that ikegami's code is essentially a derivation of code in perlre for matching balanced parens:
      $re = qr{
                \(
                (?:
                   (?> [^()]+ )    # Non-parens without backtracking
                 |
                   (??{ $re })     # Group with matching parens
                )*
                \)
             }x;
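On Perl 5.10 or later, the same kind of recursion can be written with named subpatterns instead of a self-referencing variable. The sketch below is one possible variant, not code from this thread: the names link_re, link and single are made up for illustration, and unlike the patterns above it also balances single brackets, so the sample from the question comes back whole.

```perl
use strict;
use warnings;
use 5.010;    # named recursion and possessive quantifiers need Perl 5.10+

my $text = 'this is a [[link with a [nested]]] and a [[plain link]]';

my $link_re = qr/
    (?<link>                        # a whole double-bracket link
        \[\[
        (?: [^\[\]]++ | (?&link) | (?&single) )*+
        \]\]
    )
    (?(DEFINE)                      # defined here, used only via recursion
        (?<single>                  # a balanced single-bracket group
            \[ (?! \[ )
            (?: [^\[\]]++ | (?&link) | (?&single) )*+
            \]
        )
    )
/x;

# The pattern has capture groups, so collect the named capture in a loop
# rather than relying on the list-context return value.
my @links;
while ($text =~ /$link_re/g) {
    push @links, $+{link};
}
say for @links;
```

The loop is deliberate: in list context a match with capture groups returns every group, including the undefined one from the DEFINE block.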
Re: Text::Balanced with nested / custom brackets
by Skeeve (Vicar) on Sep 07, 2006 at 21:03 UTC
    Did you try searching for a module on CPAN? Maybe there is something for parsing wiki text.

      Yes, I did. All of the Wiki modules I found on CPAN are terrible. There doesn't seem to be any one flexible/powerful parser.

Node Type: perlquestion [id://571798]
Approved by GrandFather