PerlMonks  

Is regcomp slower in 5.16.2 than in 5.8.8? How to speed things up?

by PerlBhikkhuni (Initiate)
on May 29, 2013 at 09:47 UTC (#1035800=perlquestion)
PerlBhikkhuni has asked for the wisdom of the Perl Monks concerning the following question:

Running the following script on two versions of perl (5.8.8 and 5.16.2) shows that 5.16.2 is slower than 5.8.8 at regex operations. Why is that so? And is there a way I can speed things up?

    use Time::HiRes 'time';

    for my $regex (
        q{^a$|^b$},
        q{^(a|b)$},
        q{(a|b)},
        q{^a$|^b$|^c$|^d$|^e$|^f$},
        q{^(a|b|c|d|e|f)$},
        q{a|b|c|d|e|f},
    ) {
        my $start = time();
        for my $i (1 .. 100_000) {
            'SOMEBIGSTRINGHERE' =~ m{$regex};
        }
        my $runtime = time() - $start;
        printf("%50s: %f\n", $regex, $runtime);
    }

with perl 5.8.8 -

                    ^a$|^b$: 0.101017
                    ^(a|b)$: 0.017527
                      (a|b): 0.107669
    ^a$|^b$|^c$|^d$|^e$|^f$: 0.163687
            ^(a|b|c|d|e|f)$: 0.022244
                a|b|c|d|e|f: 0.171675

with perl 5.16.2 -

                    ^a$|^b$: 0.254984
                    ^(a|b)$: 0.031507
                      (a|b): 0.045713
    ^a$|^b$|^c$|^d$|^e$|^f$: 0.443303
            ^(a|b|c|d|e|f)$: 0.031506
                a|b|c|d|e|f: 0.043478

Also, how do we know that a regex is precompiled? From http://perldoc.perl.org/perlop.html#Regexp-Quote-Like-Operators:

    $rex = qr/my.STRING/is;
    print $rex;    # prints (?si-xm:my.STRING)
    s/$rex/foo/;

This prints (?si-xm:my.STRING) in 5.8.8 and (?^si:my.STRING) in 5.16.2.

Re: Is regcomp slower in 5.16.2 than in 5.8.8? How to speed things up?
by vsespb (Hermit) on May 29, 2013 at 10:27 UTC
    Just a small observation: it seems that perl 5.8 prints regexps differently from 5.10.
    In some of the tests for my code I observe the following (only in 5.8.x):

        # Compared m/$data->[1]{"re"}/
        #    got : (?-xism:(^|/)dir\/)
        # expect : (?-xism:(^|/)dir/)

      Is there a way to find out when perl precompiles a regex and when it does not?
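
One way to check, as a minimal sketch: a pattern built with qr// is compiled at that point and comes back as a Regexp object, so ref() can tell it apart from a plain string that gets compiled when it is interpolated into a match. The variable names and test strings below are made up for illustration. For more detail, the re pragma's debug mode (use re 'debug';) reports compilation on STDERR.

```perl
use strict;
use warnings;

my $pattern = q{^(a|b|c|d|e|f)$};

# A plain string is only compiled when it is interpolated into a match.
# qr// compiles it once, here, and returns a Regexp object.
my $rex = qr/$pattern/;

print ref($pattern) || 'plain string', "\n";    # plain string
print ref($rex), "\n";                          # Regexp

# Reuse the precompiled object inside the loop.
my $hits = 0;
for my $candidate (qw(a g f SOMEBIGSTRINGHERE)) {
    $hits++ if $candidate =~ $rex;
}
print "$hits\n";                                # 2
```

Hoisting the qr// out of the loop keeps the pattern as a single compiled object instead of a string that has to be interpolated on every iteration. How much time that saves in a given benchmark is something to measure rather than assume.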

Re: Is regcomp slower in 5.16.2 than in 5.8.8? How to speed things up?
by choroba (Abbot) on May 29, 2013 at 10:43 UTC
    5.16.2 is slower than 5.8.8 with regex-operations
    You are missing the word some: 0.171675 > 0.043478 and also 0.107669 > 0.045713.
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Is regcomp slower in 5.16.2 than in 5.8.8? How to speed things up?
by dave_the_m (Parson) on May 29, 2013 at 12:51 UTC
    5.10.0 introduced the TRIE mechanism which allows alternations ('|') to be matched more efficiently (especially with many alternatives), at the cost of greater set-up time at the start of the alternation.

    The examples you've shown have tended to be simple alternations that emphasise start-up time rather than fail-and-try-another-alternative time, which is why most (but not all) are faster in 5.8.x.

    Also, micro-benchmarks like these tend to be very sensitive to particular optimisations: change the pattern slightly, and you get very different results. Sometimes perl can tell a pattern will fail even without running the alternation. Etc.
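
To see how sensitive such timings are, the core Benchmark module can compare several pattern variants side by side. This is only a sketch: the three patterns are taken from the original script, while the hash keys and the one-CPU-second duration are arbitrary choices for illustration.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $subject  = 'SOMEBIGSTRINGHERE';
my %patterns = (
    anchored_alt   => qr/^a$|^b$/,
    anchored_group => qr/^(a|b)$/,
    plain_group    => qr/(a|b)/,
);

# One closure per pattern, each matching against the same subject string.
my %tests = map {
    my $rex = $patterns{$_};
    ( $_ => sub { my $matched = $subject =~ $rex; return } );
} keys %patterns;

# A negative count means "run each test for at least that many CPU seconds".
cmpthese(-1, \%tests);
```

Running the same script under each perl and then changing one pattern slightly shows how far the relative rates can move.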

    Having said that, I do wonder whether the TRIE compilation code should skip creating a trie when the alternation is simple with few branches.

    (A quick technical overview for those interested: in something like (to|be|or|not|to|be), 5.8.x would try to match each word in turn, which is slow when there is a big list of words. A TRIE, on the other hand, pre-computes a tree, so it knows the first letter must match one of b, n, o, t, and if the first letter matches t, the second must be o, etc. So the whole alternation is matched in a single pass, a character at a time, rather than going back and trying each word in turn.)
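
The difference between the two strategies can be sketched in plain Perl. This is a toy illustration with made-up function names, not how the regex engine implements it: it only answers whether any word starts at a given position, and it ignores the rule that a real alternation prefers the earliest listed alternative.

```perl
use strict;
use warnings;

# Build a nested-hash trie; the empty-string key marks "a word ends here".
sub build_trie {
    my @words = @_;
    my %trie;
    for my $word (@words) {
        my $node = \%trie;
        for my $char (split //, $word) {
            $node->{$char} ||= {};
            $node = $node->{$char};
        }
        $node->{''} = 1;
    }
    return \%trie;
}

# Single pass: walk the string one character at a time through the trie.
sub trie_matches_at {
    my ($trie, $string, $pos) = @_;
    my $node = $trie;
    for my $i ($pos .. length($string) - 1) {
        return 1 if $node->{''};
        $node = $node->{ substr($string, $i, 1) } or return 0;
    }
    return $node->{''} ? 1 : 0;
}

# 5.8.x-style: try every word in turn at the same position.
sub naive_matches_at {
    my ($words, $string, $pos) = @_;
    for my $word (@$words) {
        return 1 if substr($string, $pos, length $word) eq $word;
    }
    return 0;
}

my @words = qw(to be or not to be);
my $trie  = build_trie(@words);

print trie_matches_at($trie, 'knot', 1), "\n";       # 1
print trie_matches_at($trie, 'knot', 0), "\n";       # 0
print naive_matches_at(\@words, 'knot', 1), "\n";    # 1
```

Building the trie is the extra set-up cost. It pays off when there are many words, and is pure overhead for a two-branch alternation that fails on the first character.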

    Dave.

      interesting..
