Beefy Boxes and Bandwidth Generously Provided by pair Networks vroom
Perl: the Markov chain saw
 
PerlMonks  

Re: Comparing Regular Expressions

by vitoco (Pilgrim)
on Sep 01, 2009 at 15:44 UTC ( #792692=note: print w/ replies, xml ) Need Help??


in reply to Comparing Regular Expressions

Thanks to everyone.

I did my own benchmark based on your suggestions, and found the following:

#!perl -w use strict; use warnings; use Benchmark 'cmpthese'; my $test = join '', map { ' ' x (0 == int rand 9) . ',' . ' ' x ((0 = += int rand 3) * int rand 3) . $_ } 0 .. 1_000; cmpthese(-10,{ 'test1' => sub { my $f = $test; $f =~ s/\s+,\s+|\s+,|,\s+/,/g; }, 'testA' => sub { my $f = $test; $f =~ s/\s+,\s+/,/g; $f =~ s/\s+,/,/ +g; $f =~ s/,\s+/,/g; }, 'test2' => sub { my $f = $test; $f =~ s/\s+,\s*|,\s+/,/g; }, 'testB' => sub { my $f = $test; $f =~ s/\s+,\s*/,/g; $f =~ s/,\s+/,/ +g; }, 'test3' => sub { my $f = $test; $f =~ s/\s*,\s*/,/g; }, });

Results using ActivePerl 5.10.0 on WinXP-SP3:

1_000: Rate test1 test2 test3 testA testB test1 974/s -- -22% -56% -66% -71% test2 1256/s 29% -- -43% -56% -63% test3 2221/s 128% 77% -- -22% -35% testA 2852/s 193% 127% 28% -- -16% testB 3412/s 250% 172% 54% 20% -- 10_000: Rate test1 test2 test3 testA testB test1 79/s -- -24% -62% -72% -77% test2 103/s 31% -- -50% -63% -70% test3 205/s 161% 99% -- -26% -39% testA 277/s 252% 169% 35% -- -18% testB 339/s 331% 228% 65% 22% -- 100_000: Rate test1 test2 test3 testA testB test1 6.5/s -- -26% -67% -76% -81% test2 8.9/s 36% -- -55% -68% -74% test3 19.6/s 200% 122% -- -28% -42% testA 27.3/s 319% 209% 39% -- -19% testB 33.6/s 414% 279% 71% 23% --

Results are listed from slower to faster regexp(s).

Tests 1 and 3 are regexps from my original question, and the intermediate 2's regexp was supplied by ig. A and B are 1 and 2 respectively, but splitted on alternations (replace in phases).

Possible quick conclusions:

  1. It's better to use backtracking than alternations.
  2. It's better to split a regexp with an alternation whenever is possible.
  3. Itīs better to replace a pattern by the same thing instead of doing some effort to avoid that.

Comments?


Comment on Re: Comparing Regular Expressions
Select or Download Code
Re^2: Comparing Regular Expressions
by JavaFan (Canon) on Sep 01, 2009 at 15:55 UTC
    I wonder why testA consists of three substitutions. Don't the latter two cover all cases already?

      Good point! I just wanted to compare equivalent substitutions.

      Adding testC:

      'testC' => sub { my $f = $test; $f =~ s/\s+,/,/g; $f =~ s/,\s+/,/g; +}, 100_000: Rate test1 test2 test3 testA testC testB test1 6.5/s -- -28% -66% -76% -80% -81% test2 9.0/s 39% -- -53% -67% -73% -73% test3 19.1/s 196% 113% -- -29% -42% -43% testA 27.0/s 318% 201% 41% -- -18% -19% testC 33.1/s 412% 269% 73% 23% -- -1% testB 33.3/s 416% 272% 74% 23% 1% --

      C is 23% faster than A, even running twice over some of the commas in the test string. B doesn't touch a comma twice, but is only 1% faster than C.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://792692]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others imbibing at the Monastery: (16)
As of 2014-04-17 14:45 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    April first is:







    Results (450 votes), past polls