Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

Re: Unefficient Regexp 'Matching this or that'

by moritz (Cardinal)
on May 19, 2010 at 08:35 UTC ( #840637=note: print w/ replies, xml ) Need Help??


in reply to Unefficient Regexp 'Matching this or that'

Try also to run them in reversed order - maybe reading the file is slow the first time, and much faster then second time because the file is then in some cache from the operating system.

Also please compare the number of matches found in both cases - is it comparable?

Perl 6 - links to (nearly) everything that is Perl 6.


Comment on Re: Unefficient Regexp 'Matching this or that'
Re^2: Unefficient Regexp 'Matching this or that'
by pelagic (Curate) on May 19, 2010 at 08:47 UTC
    I actually tried it in reverse order also by calling the tests:
    01_two_string 02_one_string
    And the number of matches is the same, I just stripped this away to make the code smaller.

    pelagic
      Another idea: The regex engine has a heavy optimization for looking for literal substrings, using the same algorithm that index uses.

      In the case of one literal that can be used, but I don't think it's used in alternations.

      You can test this hypothesis by comparing regexes of two and three alternatives - if I'm right, then both should be roughly equal in speed, and much slower than a single, literal string.

        Thanks for your hints!
        I did some more testing now:
        one_string      /00901808/
        two_string      /00901808|87654321/
        four_string     /00901808|87654321|12345678|29586741/
        2_grep_string   programmed loop over list of 2 regexps
        4_grep_string   programmed loop over list of 4 regexps
        
        > perl bench_regexp 100000lines.92MB.file Benchmark: timing 1 iterations of 2_grep_string, 4_grep_string, four_s +tring, one_string, two_string... Matched records: 1 2_grep_string: 3 wallclock secs ( 2.91 usr + 0.44 sys = 3.35 CPU) @ + 0.30/s (n=1) (warning: too few iterations for a reliable count) Matched records: 1 4_grep_string: 4 wallclock secs ( 3.56 usr + 0.42 sys = 3.98 CPU) @ + 0.25/s (n=1) (warning: too few iterations for a reliable count) Matched records: 1 four_string: 100 wallclock secs (98.83 usr + 0.56 sys = 99.39 CPU) @ + 0.01/s (n=1) (warning: too few iterations for a reliable count) Matched records: 1 one_string: 2 wallclock secs ( 1.62 usr + 0.40 sys = 2.02 CPU) @ 0 +.50/s (n=1) (warning: too few iterations for a reliable count) Matched records: 1 two_string: 75 wallclock secs (73.87 usr + 0.53 sys = 74.40 CPU) @ 0 +.01/s (n=1) (warning: too few iterations for a reliable count)

        pelagic

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://840637]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others having an uproarious good time at the Monastery: (16)
As of 2014-09-02 12:15 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite cookbook is:










    Results (22 votes), past polls