in reply to
Comparing Regular Expressions
Thanks to everyone.
I did my own benchmark based on your suggestions, and found the following:
#!perl -w
use strict;
use warnings;
use Benchmark 'cmpthese';
my $test = join '', map { ' ' x (0 == int rand 9) . ',' . ' ' x ((0 == int rand 3) * int rand 3) . $_ } 0 .. 1_000;
cmpthese(-10,{
'test1' => sub { my $f = $test; $f =~ s/\s+,\s+|\s+,|,\s+/,/g; },
'testA' => sub { my $f = $test; $f =~ s/\s+,\s+/,/g; $f =~ s/\s+,/,/g; $f =~ s/,\s+/,/g; },
'test2' => sub { my $f = $test; $f =~ s/\s+,\s*|,\s+/,/g; },
'testB' => sub { my $f = $test; $f =~ s/\s+,\s*/,/g; $f =~ s/,\s+/,/g; },
'test3' => sub { my $f = $test; $f =~ s/\s*,\s*/,/g; },
});
Results using ActivePerl 5.10.0 on WinXP-SP3:
1_000:
        Rate test1 test2 test3 testA testB
test1  974/s    --  -22%  -56%  -66%  -71%
test2 1256/s   29%    --  -43%  -56%  -63%
test3 2221/s  128%   77%    --  -22%  -35%
testA 2852/s  193%  127%   28%    --  -16%
testB 3412/s  250%  172%   54%   20%    --
10_000:
        Rate test1 test2 test3 testA testB
test1   79/s    --  -24%  -62%  -72%  -77%
test2  103/s   31%    --  -50%  -63%  -70%
test3  205/s  161%   99%    --  -26%  -39%
testA  277/s  252%  169%   35%    --  -18%
testB  339/s  331%  228%   65%   22%    --
100_000:
        Rate test1 test2 test3 testA testB
test1  6.5/s    --  -26%  -67%  -76%  -81%
test2  8.9/s   36%    --  -55%  -68%  -74%
test3 19.6/s  200%  122%    --  -28%  -42%
testA 27.3/s  319%  209%   39%    --  -19%
testB 33.6/s  414%  279%   71%   23%    --
Results are listed from the slowest to the fastest regexp(s).
Tests 1 and 3 are the regexps from my original question, and the intermediate regexp in test 2 was supplied by ig. A and B are 1 and 2 respectively, but split at the alternations into separate substitutions applied one after another (replacing in phases).
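Before trusting the timings, it seems worth checking that all five variants really produce the same output. Below is a minimal sketch of such a check. The sample string and the normalize() helper are my own assumptions for illustration, not part of the benchmark above; only the patterns are taken from it.

```perl
#!perl
use strict;
use warnings;

# Hypothetical sample input (not from the benchmark): whitespace before,
# after, and on both sides of a comma, plus one bare comma.
my $sample = "a ,b,  c , d,e\t,\tf";

# The five variants from the benchmark, each as a list of patterns
# whose substitutions are applied in order.
my %variants = (
    test1 => [ qr/\s+,\s+|\s+,|,\s+/ ],
    testA => [ qr/\s+,\s+/, qr/\s+,/, qr/,\s+/ ],
    test2 => [ qr/\s+,\s*|,\s+/ ],
    testB => [ qr/\s+,\s*/, qr/,\s+/ ],
    test3 => [ qr/\s*,\s*/ ],
);

# Hypothetical helper: apply each pattern in turn, replacing every
# match with a single comma.
sub normalize {
    my ($text, @patterns) = @_;
    $text =~ s/$_/,/g for @patterns;
    return $text;
}

my %result;
for my $name (sort keys %variants) {
    $result{$name} = normalize($sample, @{ $variants{$name} });
    printf "%-5s => %s\n", $name, $result{$name};
}

# Collect the distinct outputs; exactly one means all variants agree.
my %distinct;
$distinct{$_} = 1 for values %result;
print keys(%distinct) == 1 ? "all variants agree\n" : "variants differ\n";
```

On this sample every variant should yield the same comma-separated string, so the benchmark is comparing equivalent transformations (at least for input of this shape).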
Possible quick conclusions:
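- It's better to use a single pattern that relies on backtracking (test 3) than one with alternations (tests 1 and 2).
- It's better to split a regexp with alternations into separate substitutions whenever possible (A beats 1, and B beats 2).
- It's better to let a match be replaced by the same thing (a bare comma by a comma, as in test 3) than to complicate the pattern in order to avoid that.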
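To make the last point concrete, here is a small sketch that counts how many substitutions are actually performed. It relies on s///g returning the number of replacements made. The sample string and variable names are assumptions of mine; the two patterns are those of test 2 and test 3.

```perl
#!perl
use strict;
use warnings;

# Hypothetical sample: five commas, one of them with no whitespace around it.
my $sample = "a ,b,  c , d,e\t,\tf";

# test2's pattern only matches commas that have adjacent whitespace.
my $selective   = $sample;
my $n_selective = ($selective =~ s/\s+,\s*|,\s+/,/g) || 0;

# test3's pattern matches every comma, so a bare comma is replaced by itself.
my $catch_all   = $sample;
my $n_catch_all = ($catch_all =~ s/\s*,\s*/,/g) || 0;

printf "test2 pattern: %d substitutions => %s\n", $n_selective, $selective;
printf "test3 pattern: %d substitutions => %s\n", $n_catch_all, $catch_all;
```

On this sample the test 3 pattern performs one substitution more than the test 2 pattern (the bare comma between d and e), yet both end up with the same string. In the benchmark, that extra no-op work still came out cheaper than the alternation.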
Comments?