7stud
Considering your other post (which may or may not have stemmed from this one), I thought I would update this thread with the final solution I used, for you and for anyone else who might care.
I found that, since {{Infobox was not the only chunk I needed, extracting each chunk separately was costing me a huge performance hit. To avoid this I changed to a single sweep of the (long) text, as follows. I have removed the other parts that I extracted in the same sweep so as to stick to the OP topic.
use strict ;
use warnings ;
use Data::Dump qw( dump ) ;
my $text = <<'END_OF_STRING';
{{Infobox
text text text
{{text text text
text {{text text}}
text}}
{{{text {{text }}
text }}}
END}}
blah blah
blah blah
blah blah
{{Infobox
text1 text1 text1
{{text1 text1 text1
text1 {{text1 text1}}
text1}}
{{{text1 {{text1 }}
text1 }}}
}}
{{Infobox one}}
END_OF_STRING
my $box_contents = _get_info_boxes( $text ) ;
dump( $box_contents ) ;
exit;
sub _get_info_boxes {
    my $text = shift ;
    my @info_box_contents ;
    my $in_info_box ;
    my $this_info_box_content = "" ;
    my $bracket_count = 0 ;
    foreach my $line ( split( /\n/, $text ) ) {
        unless( $in_info_box ) {
            # Skip lines until the next Infobox starts. The braces are
            # escaped so the pattern is unambiguous on newer perls.
            next unless( $line =~ /\{\{Infobox/ ) ;
            $in_info_box = 1 ;
        }
        $this_info_box_content .= $line . "\n" ;
        # tr/// with an empty replacement list only counts; $line is unchanged.
        my $open_count = ( $line =~ tr/{// ) ;
        my $close_count = ( $line =~ tr/}// ) ;
        $bracket_count = $bracket_count + $open_count - $close_count ;
        # Braces balance again, so this Infobox is complete.
        if( $bracket_count == 0 ) {
            push @info_box_contents, $this_info_box_content ;
            $this_info_box_content = "" ;
            $in_info_box = 0 ;
            $bracket_count = 0 ;
        }
    }
    return \@info_box_contents ;
}
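
For anyone wanting to pull several kinds of chunk out in the same sweep, the brace-counting loop above could be generalised roughly as below. This is only a minimal sketch, assuming every wanted chunk starts with {{Name and ends where the braces balance again. The function name get_template_blocks and the Taxobox template name are illustrative assumptions, not the parts that were removed from the code above.

```perl
use strict ;
use warnings ;

# Hypothetical helper: collect several brace-delimited templates in one pass.
# Returns a hash ref mapping each template name to an array ref of blocks.
sub get_template_blocks {
    my ( $text, @names ) = @_ ;
    my %blocks = map { $_ => [] } @names ;

    # One alternation for all wanted names, e.g. \{\{(Infobox|Taxobox)\b
    my $alternation = join( '|', map { quotemeta( $_ ) } @names ) ;
    my $start_re = qr/\{\{($alternation)\b/ ;

    my ( $current, $content, $depth ) = ( undef, "", 0 ) ;
    foreach my $line ( split( /\n/, $text ) ) {
        unless( defined $current ) {
            # Outside any block: skip until a wanted template starts.
            next unless( $line =~ $start_re ) ;
            $current = $1 ;
        }
        $content .= $line . "\n" ;
        # Same counting trick as above: tr/// here counts without modifying.
        $depth += ( $line =~ tr/{// ) - ( $line =~ tr/}// ) ;
        if( $depth == 0 ) {
            # Braces balance, so file the block under its template name.
            push @{ $blocks{$current} }, $content ;
            ( $current, $content, $depth ) = ( undef, "", 0 ) ;
        }
    }
    return \%blocks ;
}

my $sample = "{{Infobox\nname = x\n}}\nprose\n{{Taxobox one}}\n" ;
my $blocks = get_template_blocks( $sample, 'Infobox', 'Taxobox' ) ;
print scalar( @{ $blocks->{Infobox} } ), "\n" ;   # prints 1
print scalar( @{ $blocks->{Taxobox} } ), "\n" ;   # prints 1
```

As with the original, a wanted template nested inside another block is kept as part of the outer block rather than reported separately, and a block whose braces never balance is silently dropped at the end of the text.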