PerlMonks  

Text::Balanced with nested / custom brackets

by Anonymous Monk
on Sep 07, 2006 at 20:35 UTC (#571798=perlquestion)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks, I've been struggling with this for a couple of days, though I don't think the problem is very difficult. Within

    'this is a [[link with a [nested]]]'

I would like to extract the entire parent [..]. There will be multiple instances, not all with nested brackets.

What is the correct code syntax? I've tried all kinds of things without much luck. I'm assuming it's done with extract_multiple and an extractor reference, but I can't figure out exactly what to do:

    my @data = extract_multiple( $text, ???? );

FYI, this is all toward parsing raw wiki text. Thanks in advance.

Re: Text::Balanced with nested / custom brackets
by Skeeve (Vicar) on Sep 07, 2006 at 21:03 UTC
    Did you search CPAN for a module? Maybe there is something there for parsing wiki text.

      Yes, I did. All of the Wiki modules I found on CPAN are terrible. There doesn't seem to be any one flexible/powerful parser.
Re: Text::Balanced with nested / custom brackets
by ikegami (Pope) on Sep 07, 2006 at 21:48 UTC
    As far as I can tell, Text::Balanced deals with single-character delimiters, whereas yours are two characters long ([[ and ]]). You might have to resort to using a regexp.
    my $extractor; # Must be a separate statement, so the (??{ }) block sees this variable.
    $extractor = qr/
        \[\[
        (?:
            (?: (?! \[\[ | \]\] ) . )+
          | (??{ $extractor })
        )+
        \]\]
    /x;
    my @links = $text =~ /$extractor/g;

    Optimized (I think):

    my $extractor; # Must be a separate statement, so the (??{ }) block sees this variable.
    $extractor = qr/
        \[\[
        (?>
            (?:
                (?:
                    (?> [^\[\]]+ )
                  | \[ (?! \[ )
                  | \] (?! \] )
                )
              | (??{ $extractor })
            )+
        )
        \]\]
    /x;
    my @links = $text =~ /$extractor/g;

    Tested.
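On Perl 5.10 or later (released after this thread), the regex engine supports recursive subpatterns, so the pattern can refer to itself directly instead of going through a (??{ }) block and a separately declared variable. A minimal sketch of the same [[ ... ]] extractor using a named group; the names $link_re and link and the sample text are illustrative, not from the thread:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch (assumes Perl 5.10+): the [[ ... ]] extractor written with named
# recursion, (?&link), instead of (??{ $extractor }). No separate "my"
# statement is needed because the pattern recurses into its own group.
my $link_re = qr/
    (?<link>
        \[\[
        (?:
            [^\[\]]++          # a run of non-bracket characters (possessive)
          | \[ (?! \[ )        # a single [ that does not open a nested link
          | \] (?! \] )        # a single ] that does not close a link
          | (?&link)           # a nested [[ ... ]], matched recursively
        )*
        \]\]
    )
/x;

my $text  = 'see [[a [[b]] c]] and [[d]]';
my @links = $text =~ /$link_re/g;    # one capture group, so each match yields $1
print "$_\n" for @links;
```

Like the regexes above, it reads the delimiters as [[ and ]], so on the original example it stops at the first ]] and returns [[link with a [nested]], leaving the final ] behind.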

      Thank you for those regexes! I'll play around with them to see if I can get myself moving.

      In the long run, I'd still like to know whether Text::Balanced can be massaged into dealing with this situation. It does deal with <tags> and such.

        Its function for extracting tagged data (extract_tagged, or gen_extract_tagged to build a reusable extractor) can indeed be used.

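        use Text::Balanced qw(gen_extract_tagged);

        my @links;
        # The tag arguments are regex patterns, so the brackets must be escaped.
        # The prefix pattern skips everything up to the next [[ (/s: across newlines too).
        my $extractor = gen_extract_tagged('\[\[', '\]\]', qr/(?:(?!\[\[).)*/s);
        for (;;) {
            (my $link, $text) = $extractor->($text);
            last if not defined $link;   # undef: no further [[...]] was found
            push(@links, $link);
        }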

        Untested.
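If the original example is read with single-character brackets instead, its outer group is already balanced (three [ and three ]), which is the case Text::Balanced handles out of the box. A sketch, not tested against every Text::Balanced version, of the extract_multiple call the question asked about: extract_bracketed serves as the only extractor, and the fourth argument tells extract_multiple to discard the text between groups. The sample string is made up:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::Balanced qw(extract_multiple extract_bracketed);

# Sketch: treat [ and ] as ordinary single-character brackets, so
# extract_bracketed() returns the whole outermost balanced [...] group.
my $text = 'this is a [[link with a [nested]]] and a [plain] one';

my @groups = extract_multiple(
    $text,
    [ sub { extract_bracketed($_[0], '[]') } ],   # only recognise [...] groups
    undef,                                        # no limit on the number of fields
    1,                                            # drop text that no extractor matches
);

print "$_\n" for @groups;
```

This only fits markup whose single [ and ] pairs nest consistently; for true [[ ... ]] delimiters the tagged-extraction approach above is the closer match.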

      It's worth noting that ikegami's code is essentially derived from the perlre example for matching balanced parentheses:
      $re = qr{
          \(
          (?:
              (?> [^()]+ )    # Non-parens without backtracking
            |
              (??{ $re })     # Group with matching parens
          )*
          \)
      }x;

Approved by GrandFather