Re: Am I using Text::Balanced correctly? Speed issues.

I can't provide a precise answer, but that recursion passing $page by value is probably a good place to start looking. More importantly, if your task really is as simple as stated, why not just use a progressive regex? Here's my quick hack at it, and it runs quite quickly (note that I haven't actually validated the output string, other than noting the length change reported at exit):

use Carp;

use strict;
use warnings;

my $page = join('',
"This is some filler.\n" x 20000,
"<SECTION[test1]>\n",
"This is some filler.\n" x 20000,
"</SECTION>\n",
"This is some filler.\n" x 20000,
"<SECTION[test2]>\n",
"This is some filler.\n" x 20000,
"</SECTION>\n",
"This is some filler.\n" x 20000);

print "Calling process_section.  Length of page=[" . length($page) ."]\n";
my $newpage = process_section(\$page, {test1 => 1});
print "Done.  Length of newpage=[" . length($newpage) ."]\n";

sub process_section {
    my ($page, $hashref) = @_;

    my $return = '';
    
    while ($$page=~/\G(.*?)<SECTION\[([^\]]+)\]>\n?/igcs) {
        $return .= $1;
        my $tag = $2;
        my $post = pos($$page);
        if ($$page=~/\G(.*?)<\/SECTION>\n?/igcs) {
            $return .= $1 
                if exists $hashref->{$tag};
            next;
        }

        my $excerpt = substr($$page, $post, 100);
        print STDERR "\n";
        Carp::carp("Warning: Unbalanced SECTION tags.  Fix the template! Error near: $excerpt\n");
        pos($$page) = $post;
    }

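    # pos() is undef if no SECTION tag ever matched; treat that as offset 0
    # so the tail copy below doesn't trigger "uninitialized" warnings.
    my $tail = pos($$page) || 0;
    $return .= substr($$page, $tail)
        if ($tail < length($$page));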
    return $return;
}

Note that I've changed it to take a ref to $page, and eliminated the recursion.
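
In case the \G and /gc combination is unfamiliar, here is a minimal sketch of the same progressive-matching idea on a toy input. The tokenize() helper and its token labels are made up for illustration and are not part of the code above; the point is only that each match resumes at pos(), and that /c keeps pos() where it was when an alternative fails.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the post above): walk a string with \G and /gc.
# $text is a reference to the string, so the caller's data is never copied.
sub tokenize {
    my ($text) = @_;
    my @tokens;
    pos($$text) = 0;                       # start scanning at the beginning
    while (pos($$text) < length $$text) {
        # \G anchors each attempt at the current pos(); /c leaves pos()
        # untouched when an attempt fails, so the next branch can try there.
        if    ($$text =~ /\G\s+/gc)       { next }
        elsif ($$text =~ /\G(\d+)/gc)     { push @tokens, "NUM:$1" }
        elsif ($$text =~ /\G([a-z]+)/gci) { push @tokens, "WORD:$1" }
        else  { $$text =~ /\G(.)/gcs;       push @tokens, "OTHER:$1" }
    }
    return @tokens;
}

my $input = "abc 42 !x";
print join(' ', tokenize(\$input)), "\n";   # WORD:abc NUM:42 OTHER:! WORD:x
```

Without the /c flag, the first failed attempt would reset pos() and the scan would start over from the beginning of the string, which is why both regexes in process_section carry it.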

Text::Balanced is a great tool for dealing with complex things like parsing Perl code, but if you can get by with a good old RE, I'd say it's probably a much better solution.

Perl Contrarian & SQL fanboy
