pelagic has asked for the wisdom of the Perl Monks concerning the following question:
Hello Monks!
I have a case of inefficient regexp handling when matching strings in a large file:
To look for ONE string takes 2 seconds
while looking for TWO strings takes 79 seconds. Here is the code:
    use strict;
    use Benchmark;

    my $file = shift || 'no_file';

    timethese( 1, {
        'one_string' => sub { one_string() },
        'two_string' => sub { two_string() },
    } );

    sub one_string {
        my $filter = '00901808';
        my $re = qr/$filter/o;
        my @matched;
        open (my $FH, "<$file");
        while (my $rec = <$FH>) {
            if ( $rec =~ $re) {
                push @matched, $rec;
            }
        }
        close $FH;
    }

    sub two_string {
        my $filter = '00901808|87654321';
        my $re = qr/$filter/o;
        my @matched;
        open (my $FH, "<$file");
        while (my $rec = <$FH>) {
            if ( $rec =~ $re) {
                push @matched, $rec;
            }
        }
        close $FH;
    }
    __END__

The result says:

    # perl bench_regexp 100000lines.92MB.file
    Benchmark: timing 1 iterations of one_string, two_string...
    one_string:  2 wallclock secs ( 1.68 usr +  0.42 sys =  2.10 CPU) @  0.48/s (n=1)
                (warning: too few iterations for a reliable count)
    two_string: 77 wallclock secs (76.13 usr +  0.59 sys = 76.72 CPU) @  0.01/s (n=1)
                (warning: too few iterations for a reliable count)
pelagic