
Re: Regexp optimization - /o option better than precompiled regexp? (analysis)

by tye (Sage)
on Jun 28, 2010 at 19:31 UTC (#846966)

in reply to Regexp optimization - /o option better than precompiled regexp?

Wow. Those results were very hard to read and understand.

First, none of your cases seem to be recompiling a regex each time through the loop. (It appears to me that) the worst case you've included does a string compare to determine that the regex doesn't need to be recompiled (and does this each time through the loop). Clearly, CORE::regcomp() doesn't unconditionally recompile a regex (based on parsing your results, it checks some things to determine if it even needs to do a string compare, then optionally does a string compare, and then only recompiles the regex if the string compare finds a difference).

Let's look at your results cleaned up so the interesting numbers are much easier to compare:

    Readonly my $REGEXP_READONLY => '999986';
    if( $l_line =~ m/$REGEXP_READONLY/ ) {
    # 5.78s - CORE:regcomp
    # 0.91s - CORE:match
    # 1.83s - Readonly::Scalar::FETCH

    use constant REGEXP_CONSTANT => '999986';
    if( $l_line =~ m/${\REGEXP_CONSTANT}/ ) {
    # 0.74s - CORE:regcomp
    # 0.75s - CORE:match

    if( $l_line =~ m/999986/ ) {
    # 0.84s - CORE:match

    if( $l_line =~ m/$REGEXP_READONLY/o ) {
    # 0.73s - CORE:regcomp
    # 0.75s - CORE:match

    my $l_search_r = qr/$REGEXP_READONLY/;
    if( $l_line =~ $l_search_r ) {
    # 1.33s - CORE:regcomp
    # 0.78s - CORE:match

    my $l_search = $REGEXP_READONLY;
    if( $l_line =~ m/$l_search/ ) {
    # 0.74s - CORE:regcomp
    # 0.76s - CORE:match

    $l_search = $REGEXP_READONLY;
    if( $l_line =~ m/$l_search/o ) {
    # 0.69s - CORE:regcomp
    # 0.76s - CORE:match

Second, let's take care of the least interesting bit:

    # 0.91s - CORE:match
    if( $l_line =~ m/$REGEXP_READONLY/ ) {
    # 0.75s - CORE:match
    if( $l_line =~ m/${\REGEXP_CONSTANT}/ ) {
    # 0.84s - CORE:match
    if( $l_line =~ m/999986/ ) {
    # 0.75s - CORE:match
    if( $l_line =~ m/$REGEXP_READONLY/o ) { # +/o
    # 0.78s - CORE:match
    if( $l_line =~ $l_search_r ) { # +qr//
    # 0.76s - CORE:match
    if( $l_line =~ m/$l_search/ ) {
    # 0.76s - CORE:match
    if( $l_line =~ m/$l_search/o ) { # +/o

We can see that the difference in speed of the regex matching is "in the noise". Indeed, I can think of no reason why the speeds would be any different in practice, and I suspect that the reported differences are just noise. You might want to change the order of the cases and re-run to see whether the noise follows the order of execution or just moves randomly. There might be an insignificant difference that isn't noise in one of those cases, but I won't waste time chasing that until I see better evidence that it isn't noise.
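A minimal sketch of that kind of re-run, using the core Benchmark module rather than the profiler used in the original post. The input data, the search string, and the case names are all made up for illustration; only the idea of shuffling the execution order comes from the paragraph above.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);
use List::Util qw(shuffle);

# Hypothetical input: 10,000 short lines, only one of which contains the needle.
my @lines  = map { "line $_" } 1 .. 10_000;
my $search = '9986';
my $qr     = qr/$search/;

# Each case counts matching lines using a different way of supplying the regex.
my %cases = (
    literal      => sub { my $n = 0; for (@lines) { $n++ if /9986/ }    $n },
    interpolated => sub { my $n = 0; for (@lines) { $n++ if /$search/ } $n },
    qr_object    => sub { my $n = 0; for (@lines) { $n++ if $_ =~ $qr } $n },
);

# Run every case once per pass, in a fresh random order each time.
for my $pass (1 .. 3) {
    print "--- pass $pass ---\n";
    for my $name (shuffle keys %cases) {
        timethese(50, { $name => $cases{$name} });
    }
}
```

If the ranking of the cases changes from pass to pass, the differences are most likely noise.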

Now for the more interesting part:

    # 5.78s - CORE:regcomp
    if( $l_line =~ m/$REGEXP_READONLY/ ) {
    # 0.74s - CORE:regcomp
    if( $l_line =~ m/${\REGEXP_CONSTANT}/ ) {
    # 0.00s
    if( $l_line =~ m/999986/ ) {
    # 0.73s - CORE:regcomp
    if( $l_line =~ m/$REGEXP_READONLY/o ) { # +/o
    # 1.33s - CORE:regcomp
    if( $l_line =~ $l_search_r ) { # +qr//
    # 0.74s - CORE:regcomp
    if( $l_line =~ m/$l_search/ ) {
    # 0.69s - CORE:regcomp
    if( $l_line =~ m/$l_search/o ) { # +/o

We see that the first case takes about 8x longer in regcomp() than most of the others. My theory is that, since magic is involved and FETCH() is re-called each time through the loop, a fresh copy of the read-only value is handed to regcomp() each time, which forces it to do the string comparison. It looks to me like none of the other cases even need to compare strings.
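The FETCH() part of that theory is easy to observe. The sketch below uses a small hand-written tied class (CountingScalar, a hypothetical stand-in, not the real Readonly::Scalar) to count how often FETCH() runs when a tied scalar is interpolated into a pattern inside a loop. It shows only that FETCH() is re-called; it does not show what regcomp() does with the value.

```perl
use strict;
use warnings;

package CountingScalar;   # hypothetical stand-in for a Readonly-style tied scalar

sub TIESCALAR {
    my ($class, $value) = @_;
    return bless { value => $value, fetches => 0 }, $class;
}

# Called by perl every time the tied variable is read.
sub FETCH {
    my ($self) = @_;
    $self->{fetches}++;
    return $self->{value};
}

sub STORE { die "Attempt to modify a read-only value\n" }

package main;

tie my $pattern, 'CountingScalar', '999986';

# Ten made-up input lines, one of which contains the pattern.
my @lines   = map { "record $_" } 999_980 .. 999_989;
my $matches = 0;
for my $line (@lines) {
    $matches++ if $line =~ m/$pattern/;   # interpolation reads the tied scalar
}

my $fetches = tied($pattern)->{fetches};
print "lines: ", scalar(@lines), ", matches: $matches, FETCH calls: $fetches\n";
```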

This means that the differences between most of the other cases are so very, very tiny as to be extremely unlikely to be noticed in any real-world situation. They are differences between relatively short paths through some C code. In a Perl script, such minuscule run-times will be completely dwarfed by rather mundane stuff and so won't add up to anything more than a tiny fraction of a real script's overall run time.

The m/999986/ is moderately interesting in that it demonstrates that the regex is actually compiled when the Perl code is compiled and Perl can completely avoid checking whether it needs to compile it again.

The other cases show only differences that are, again, "in the noise".

So there is no appreciable speed advantage to using /o. There are, however, significant disadvantages with regard to clarity of code and likelihood of introducing bugs.
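One well-known example of the kind of bug /o invites: the pattern is compiled the first time the match is executed and never again, so a subroutine that takes its pattern as an argument silently keeps using the first one. The helper names and data below are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: counts the lines matching $pattern, using /o.
sub count_with_o {
    my ($pattern, @lines) = @_;
    my $n = 0;
    for (@lines) { $n++ if m/$pattern/o }   # pattern is frozen at first use
    return $n;
}

# Same helper without /o.
sub count_plain {
    my ($pattern, @lines) = @_;
    my $n = 0;
    for (@lines) { $n++ if m/$pattern/ }
    return $n;
}

my @lines = qw(apple banana cherry);

my $o_first      = count_with_o('apple', @lines);   # compiles /apple/
my $o_second     = count_with_o('zzz',   @lines);   # still matches against /apple/
my $plain_first  = count_plain('apple',  @lines);
my $plain_second = count_plain('zzz',    @lines);   # sees the new pattern

print "with /o:    $o_first, $o_second\n";
print "without /o: $plain_first, $plain_second\n";
```

The second call with /o reports a match for 'zzz' even though no line contains it, because the match operator is still using the pattern from the first call.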

It is unfortunate that you have shown that the use of qr// can approximately double the time taken in regcomp(). Of course, this time still adds up to a very tiny amount that is very unlikely to add up to anything that would be noticed in a real-world situation.

Let's look at the source code (p5git://pp_ctl.c.) and see why. Search for the pp_regcomp function. And there we see the extra work that is required in the case of qr// including a link to why this extra work is unfortunate and will likely go away at some point in the future:

But, again, the slight speed penalty is very unlikely to be noticed outside of a benchmark and the benefit to code clarity and maintainability (of using qr//) makes this a very easy call for me to make for myself. I use qr//. I never use /o.
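For comparison, a small sketch of the qr// style. The variable names and input lines are made up; the point is only that the compiled regex is an ordinary value that can be stored, passed around, and embedded in a larger pattern.

```perl
use strict;
use warnings;

my $needle = '999986';           # hypothetical search string
my $search = qr/\Q$needle\E/;    # compiled here, once, and visibly so

# Made-up input lines.
my @lines = map { "id=$_" } 999_984 .. 999_988;

# Use the compiled regex directly.
my @hits = grep { $_ =~ $search } @lines;

# A qr// object can also be interpolated into a larger pattern.
my @anchored = grep { /^id=${search}\z/ } @lines;

print "hits: @hits\n";
print "anchored: @anchored\n";
```

Unlike /o, assigning a new qr// to $search later behaves the way a reader would expect: subsequent matches use the new regex.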

(Updated first two sentences of 2nd paragraph to not make my theory sound like something I have verified completely.)

- tye        
