Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl Monk, Perl Meditation
 
PerlMonks  

Re: Comparing Regular Expressions

by vitoco (Friar)
on Sep 01, 2009 at 15:44 UTC ( #792692=note: print w/ replies, xml ) Need Help??


in reply to Comparing Regular Expressions

Thanks to everyone.

I did my own benchmark based on your suggestions, and found the following:

#!perl -w use strict; use warnings; use Benchmark 'cmpthese'; my $test = join '', map { ' ' x (0 == int rand 9) . ',' . ' ' x ((0 = += int rand 3) * int rand 3) . $_ } 0 .. 1_000; cmpthese(-10,{ 'test1' => sub { my $f = $test; $f =~ s/\s+,\s+|\s+,|,\s+/,/g; }, 'testA' => sub { my $f = $test; $f =~ s/\s+,\s+/,/g; $f =~ s/\s+,/,/ +g; $f =~ s/,\s+/,/g; }, 'test2' => sub { my $f = $test; $f =~ s/\s+,\s*|,\s+/,/g; }, 'testB' => sub { my $f = $test; $f =~ s/\s+,\s*/,/g; $f =~ s/,\s+/,/ +g; }, 'test3' => sub { my $f = $test; $f =~ s/\s*,\s*/,/g; }, });

Results using ActivePerl 5.10.0 on WinXP-SP3:

1_000: Rate test1 test2 test3 testA testB test1 974/s -- -22% -56% -66% -71% test2 1256/s 29% -- -43% -56% -63% test3 2221/s 128% 77% -- -22% -35% testA 2852/s 193% 127% 28% -- -16% testB 3412/s 250% 172% 54% 20% -- 10_000: Rate test1 test2 test3 testA testB test1 79/s -- -24% -62% -72% -77% test2 103/s 31% -- -50% -63% -70% test3 205/s 161% 99% -- -26% -39% testA 277/s 252% 169% 35% -- -18% testB 339/s 331% 228% 65% 22% -- 100_000: Rate test1 test2 test3 testA testB test1 6.5/s -- -26% -67% -76% -81% test2 8.9/s 36% -- -55% -68% -74% test3 19.6/s 200% 122% -- -28% -42% testA 27.3/s 319% 209% 39% -- -19% testB 33.6/s 414% 279% 71% 23% --

Results are listed from slower to faster regexp(s).

Tests 1 and 3 are regexps from my original question, and the intermediate 2's regexp was supplied by ig. A and B are 1 and 2 respectively, but splitted on alternations (replace in phases).

Possible quick conclusions:

  1. It's better to use backtracking than alternations.
  2. It's better to split a regexp with an alternation whenever is possible.
  3. Itīs better to replace a pattern by the same thing instead of doing some effort to avoid that.

Comments?


Comment on Re: Comparing Regular Expressions
Select or Download Code
Re^2: Comparing Regular Expressions
by JavaFan (Canon) on Sep 01, 2009 at 15:55 UTC
    I wonder why testA consists of three substitutions. Don't the latter two cover all cases already?

      Good point! I just wanted to compare equivalent substitutions.

      Adding testC:

      'testC' => sub { my $f = $test; $f =~ s/\s+,/,/g; $f =~ s/,\s+/,/g; +}, 100_000: Rate test1 test2 test3 testA testC testB test1 6.5/s -- -28% -66% -76% -80% -81% test2 9.0/s 39% -- -53% -67% -73% -73% test3 19.1/s 196% 113% -- -29% -42% -43% testA 27.0/s 318% 201% 41% -- -18% -19% testC 33.1/s 412% 269% 73% 23% -- -1% testB 33.3/s 416% 272% 74% 23% 1% --

      C is 23% faster than A, even running twice over some of the commas in the test string. B doesn't touch a comma twice, but is only 1% faster than C.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://792692]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (9)
As of 2015-07-07 06:40 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (87 votes), past polls