Re: Am I using Text::Balanced correctly? Speed issues.

by renodino (Curate)
on Aug 31, 2007 at 20:16 UTC

in reply to Am I using Text::Balanced correctly? Speed issues.

I can't provide a precise answer, but the recursion that passes $page by value is probably a good place to start looking. More importantly, if your task really is as simple as stated, why not just use a progressive regex? Here's my quick hack at it, and it runs quite quickly (note that I haven't actually validated the output string, other than noting the length change reported at exit):
use strict;
use warnings;
use Carp;

# Build a large test page containing two tagged sections
my $page = join('',
    "This is some filler.\n" x 20000,
    "<SECTION[test1]>\n",
    "This is some filler.\n" x 20000,
    "</SECTION>\n",
    "This is some filler.\n" x 20000,
    "<SECTION[test2]>\n",
    "This is some filler.\n" x 20000,
    "</SECTION>\n",
    "This is some filler.\n" x 20000);

print "Calling process_section. Length of page=[" . length($page) . "]\n";
my $newpage = process_section(\$page, {test1 => 1});
print "Done. Length of newpage=[" . length($newpage) . "]\n";

sub process_section {
    my ($page, $hashref) = @_;    # $page is a reference to the page text
    my $return = '';

    # \G with /gc resumes each match where the previous one stopped
    while ($$page =~ /\G(.*?)<SECTION\[([^\]]+)\]>\n?/igcs) {
        $return .= $1;
        my $tag  = $2;
        my $post = pos($$page);
        if ($$page =~ /\G(.*?)<\/SECTION>\n?/igcs) {
            # keep the section body only if its tag is in the hash
            $return .= $1 if exists $hashref->{$tag};
            next;
        }
        my $excerpt = substr($$page, $post, 100);
        print STDERR "\n";
        Carp::carp("Warning: Unbalanced SECTION tags. Fix the template! Error near: $excerpt\n");
        pos($$page) = $post;
    }

    # Append whatever follows the last tag (pos is undef if nothing matched)
    my $pos = pos($$page) || 0;
    $return .= substr($$page, $pos) if $pos < length($$page);
    return $return;
}
Note that I've changed it to take a reference to $page, and eliminated the recursion.
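
In case the progressive-match idiom is unfamiliar, here is a minimal sketch of how \G and the /gc flags behave, using a made-up string (the names $text and @tokens, and the X tag, are purely illustrative and not part of the code above). In scalar context, each //g match resumes at pos(); the /c flag keeps pos() where it was when a match fails, so the unmatched tail can still be collected afterwards.

```perl
use strict;
use warnings;

my $text = "aa<X>bb</X>cc";    # made-up input for illustration
my @tokens;

# Scalar-context //g resumes at pos($text); \G anchors the match there.
while ($text =~ /\G(.*?)<(\/?)X>/gcs) {
    push @tokens, "text:$1", ($2 ? "close" : "open");
}

# Because of /c, the final failed match left pos() intact,
# so the remaining tail can be picked up with substr.
my $tail = substr($text, pos($text) || 0);
push @tokens, "text:$tail";

print join(", ", @tokens), "\n";
```

This should print "text:aa, open, text:bb, close, text:cc". Without the /c flag, the failed match at the end would reset pos() and the tail would have to be tracked some other way.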

Text::Balanced is a great tool for dealing with complex things like parsing Perl code, but if you can get by with a good old RE, I'd say it's probably a much better solution.

