PerlMonks  

Is regcomp slower in 5.16.2 than in 5.8.8. How to speed things up ?

by PerlBhikkhuni (Initiate)
on May 29, 2013 at 09:47 UTC (#1035800=perlquestion)
PerlBhikkhuni has asked for the wisdom of the Perl Monks concerning the following question:

Running the following script on two versions of perl (5.8.8 and 5.16.2) shows that 5.16.2 is slower than 5.8.8 with regex operations. Why is that so? And is there a way I can speed things up?

    use Time::HiRes 'time';

    for my $regex (
        q{^a$|^b$},
        q{^(a|b)$},
        q{(a|b)},
        q{^a$|^b$|^c$|^d$|^e$|^f$},
        q{^(a|b|c|d|e|f)$},
        q{a|b|c|d|e|f},
    ) {
        my $start = time();
        for my $i (1 .. 100_000) {
            'SOMEBIGSTRINGHERE' =~ m{$regex};
        }
        my $runtime = time() - $start;
        printf("%50s: %f\n", $regex, $runtime);
    }

with perl 5.8.8 -

                     ^a$|^b$: 0.101017
                    ^(a|b)$: 0.017527
                      (a|b): 0.107669
    ^a$|^b$|^c$|^d$|^e$|^f$: 0.163687
            ^(a|b|c|d|e|f)$: 0.022244
                a|b|c|d|e|f: 0.171675

with perl 5.16.2 -

                     ^a$|^b$: 0.254984
                    ^(a|b)$: 0.031507
                      (a|b): 0.045713
    ^a$|^b$|^c$|^d$|^e$|^f$: 0.443303
            ^(a|b|c|d|e|f)$: 0.031506
                a|b|c|d|e|f: 0.043478

Also, how do we know whether a regex is precompiled? From http://perldoc.perl.org/perlop.html#Regexp-Quote-Like-Operators:

    $rex = qr/my.STRING/is;
    print $rex;                 # prints (?si-xm:my.STRING)
    s/$rex/foo/;

This prints (?si-xm:my.STRING) in 5.8.8 and (?^si:my.STRING) in 5.16.2.
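One way to check whether a value already holds a compiled regex, sketched under the assumption of perl 5.10 or later (the `re::is_regexp` and `re::regexp_pattern` functions used here are not available on 5.8.8). A `qr//` value is a regex object, while a plain pattern string is only compiled when it is used in a match. The helper name `describe_pattern` is made up for illustration.

```perl
use strict;
use warnings;
use re qw(is_regexp regexp_pattern);   # exported on request, perl 5.10+

# Hypothetical helper: report whether a value is a compiled regex object.
sub describe_pattern {
    my ($value) = @_;
    return 'plain string (compiled when used in a match)'
        unless is_regexp($value);
    # In list context regexp_pattern returns the pattern and its modifiers.
    my ($pattern, $flags) = regexp_pattern($value);
    return "compiled regex: pattern=$pattern flags=$flags";
}

my $rex = qr/my.STRING/is;
print describe_pattern($rex), "\n";
print describe_pattern('my.STRING'), "\n";
```

The different stringified forms in 5.8.8 and 5.16.2 are only a change in how the flags are printed; both values are compiled regex objects.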

Re: Is regcomp slower in 5.16.2 than in 5.8.8. How to speed things up ?
by vsespb (Hermit) on May 29, 2013 at 10:27 UTC
    Just a small observation: it seems that under perl 5.8 regexps are printed differently than under 5.10.
    I am observing the following in some of the tests for my code (only in 5.8.x):
    # Compared m/$data->[1]{"re"}/
    #      got : (?-xism:(^|/)dir\/)
    #   expect : (?-xism:(^|/)dir/)

      Is there a way to find out when perl precompiles a regex and when it does not?
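One possible approach, offered as a sketch rather than a definitive answer: under `use re 'debug'` the regex engine prints a `Compiling REx` line to STDERR each time it compiles a pattern, so counting those lines shows how often compilation happens. The helper `count_compilations` and the three snippets are made up for illustration; the helper runs each snippet in a child perl so the debug output can be captured.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: run a snippet in a child perl under "use re 'debug'"
# and count the "Compiling REx" lines the regex engine reports.
sub count_compilations {
    my ($snippet) = @_;
    my ($fh, $file) = tempfile(SUFFIX => '.pl', UNLINK => 1);
    # Send the child's STDERR (where the debug output goes) to STDOUT.
    print {$fh} "BEGIN { open STDERR, '>&', \\*STDOUT or die \$! }\n";
    print {$fh} "use re 'debug';\n", $snippet, "\n";
    close $fh or die "close: $!";
    open(my $child, '-|', $^X, $file) or die "cannot run $^X: $!";
    my $count = grep { /^Compiling REx/ } <$child>;
    close $child;
    return $count;
}

# Same pattern string on every iteration.
print count_compilations(q{my $p = 'a|b'; 'x' =~ /$p/ for 1 .. 3;}), "\n";
# A different pattern string on every iteration.
print count_compilations(q{'x' =~ /$_/ for 'a|b', 'c|d', 'e|f';}), "\n";
# One qr// object reused on every iteration.
print count_compilations(q{my $r = qr/a|b/; 'x' =~ $r for 1 .. 3;}), "\n";
```

For a quick look without the helper, running a script with `perl -Mre=debug script.pl` prints the same lines directly.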

Re: Is regcomp slower in 5.16.2 than in 5.8.8. How to speed things up ?
by choroba (Abbot) on May 29, 2013 at 10:43 UTC
    "5.16.2 is slower than 5.8.8 with regex-operations"
    You are missing the word "some": two of the patterns are faster in 5.16.2 (0.171675 > 0.043478 for a|b|c|d|e|f, and 0.107669 > 0.045713 for (a|b)).
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Is regcomp slower in 5.16.2 than in 5.8.8. How to speed things up ?
by dave_the_m (Parson) on May 29, 2013 at 12:51 UTC
    5.10.0 introduced the TRIE mechanism which allows alternations ('|') to be matched more efficiently (especially with many alternatives), at the cost of greater set-up time at the start of the alternation.

    The examples you've shown tend to be simple alternations that emphasise start-up time rather than fail-and-try-another-alternative time, which is why most (but not all) are faster in 5.8.x.
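To look at the fail-and-try-another-alternative side rather than start-up time, a benchmark along these lines might help. It is only a sketch: the helpers `build_alternation` and `time_match`, the made-up words, and the iteration counts are all assumptions, and the timings will vary by machine and perl version.

```perl
use strict;
use warnings;
use Time::HiRes 'time';

# Hypothetical helper: an alternation of $n made-up words, word1|word2|...
sub build_alternation {
    my ($n) = @_;
    return join '|', map { "word$_" } 1 .. $n;
}

# Hypothetical helper: seconds taken to run $iterations matches.
sub time_match {
    my ($pattern, $string, $iterations) = @_;
    my $re = qr/$pattern/;          # compile once, outside the timed loop
    my $start = time();
    for (1 .. $iterations) {
        my $matched = $string =~ $re;
    }
    return time() - $start;
}

# A subject that never matches but keeps offering the shared prefix "word".
my $subject = 'word0 ' x 20;
for my $n (2, 20, 200) {
    printf "%4d alternatives: %f\n", $n,
        time_match(build_alternation($n), $subject, 10_000);
}
```

Running the same script under both perl versions shows how the cost grows with the number of alternatives on each.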

    Also, micro-benchmarks like these tend to be very sensitive to particular optimisations: change the pattern slightly, and you get very different results. Sometimes perl can tell a pattern will fail even without running the alternation. Etc.

    Having said that, I do wonder whether the TRIE compilation code should skip creating a trie when the alternation is simple with few branches.

    (A quick technical overview for those interested: in something like (to|be|or|not|to|be), 5.8.x would try to match each word in turn, which is slow when there is a big list of words. A TRIE, on the other hand, pre-computes a tree, so it knows the first letter must match one of b, n, o, t, and if the first letter matches t, the second must be o, etc. So the whole alternation is matched in a single pass, a character at a time, rather than going back and trying each word in turn.)
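The idea in that overview can be sketched in plain Perl with nested hashes. This is an illustration of the concept only, not how the regex engine implements it; the helper names and the end-of-word marker are made up, and this version returns the longest listed word, which is a simplification of real alternation semantics.

```perl
use strict;
use warnings;

# Build a nested-hash trie from a word list; the key "\0end" marks a
# complete word (a convention made up for this sketch).
sub build_trie {
    my @words = @_;
    my %trie;
    for my $word (@words) {
        my $node = \%trie;
        $node = $node->{$_} ||= {} for split //, $word;
        $node->{"\0end"} = 1;
    }
    return \%trie;
}

# Walk the trie one character at a time from position $pos of $string.
# Returns the longest listed word that starts there, or undef.
sub match_at {
    my ($trie, $string, $pos) = @_;
    my ($node, $best) = ($trie, undef);
    for my $i ($pos .. length($string) - 1) {
        $node = $node->{ substr($string, $i, 1) } or last;
        $best = substr($string, $pos, $i - $pos + 1) if $node->{"\0end"};
    }
    return $best;
}

my $trie = build_trie(qw(to be or not to be));
print join(',', sort grep { $_ ne "\0end" } keys %$trie), "\n";   # b,n,o,t
print match_at($trie, 'not to be', 0), "\n";                       # not
```

Each character of the subject is examined once per starting position, however many words are in the list, which is where the saving over trying each word in turn comes from.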

    Dave.

      interesting..
