Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl-Sensitive Sunglasses
 
PerlMonks  

Re^4: Matching against list of patterns

by Random_Walk (Parson)
on Sep 17, 2004 at 08:40 UTC ( #391710=note: print w/ replies, xml ) Need Help??


in reply to Re^3: Matching against list of patterns
in thread Matching against list of patterns

When you have code like this

# we read a line from somewhere to $line # and @regex is our array of patterns foreach (@regex) { print "Match!\n" if $line=/$_/; }
Each time the regex is run it has a different pattern so has to compile the pattern again. I belive compiling the pattern can be quite costly for complex regexen. However what I have done is first to expand all the patterns into a code block
# { # my @matches; # push @matches 1 if /first_regex/; # push @matches 2 is /second_regex/; # . # . # push @matches n if /nth_regex/; # @matches; # }
This code block is stored in a scalar and eval'd. It runs against the line stored in $_ and returns a list of indices telling us which regex was matched. Now Perl can see that it has a series of invariant regexen and after the first eval caches the compiled regexen and uses this version on subsequent itterations. the problem is still n*m omplexity but we have stopped perl doing rather a large amount of work in each itteration. Certainly for my problem here comparing over 400 lines of log to over 600 reasonably complex regexen the speed up was >10 times.

Cheers,
R.


Comment on Re^4: Matching against list of patterns
Select or Download Code
Re^5: Matching against list of patterns
by Eyck (Priest) on Sep 17, 2004 at 09:40 UTC

    Hmm, well, I still don't understand how unwinding array of regexpes can give 10fold improvement.

    I re-run your code against a bit different body of data (20 regexp, 5k lines), and results are a bit different:

    Array of regexpes unwinded, with /o:
    28.97s user 0.09s system 79% cpu 36.674 total
    Array of regexpes unwinded, without /o:
    29.95s user 0.04s system 95% cpu 31.481 total
    
    foreach loop, without /o:
    2.61s user 0.00s system 100% cpu 2.595 total
    foreach loop, with /o:
    0.33s user 0.01s system 17% cpu 1.957 total
    

      I must say I have been puzzled too..... I have now run a test with an adaption of the short code I posted and got these results (I'll stick my test code at the bottom, its rough as a badgers arse)

      root@tivpre-master:/home/robinp # ./regexps with eval "optimisation" timethis 5000: 56 wallclock secs (52.42 usr + 0.00 sys = 52.42 CPU) naive timethis 5000: 16 wallclock secs (15.63 usr + 0.00 sys = 15.63 CPU)
      however the other results I posted were for real life code where I do get a magnitude better performance using the eval method. Unfortunately I have written the code for a client and can post neither it or the regex I am using (perhaps I can sanitise a couple of example regex). I suspect that the more complex regex I use and the large number make the saving in re-compilation overcome the cost of running eval. For simpler cases and less regex doing an eval probably costs more than the recompiling overhead.

      Cheers,
      R.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://391710]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others musing on the Monastery: (9)
As of 2015-07-07 08:37 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (87 votes), past polls