PerlMonks  

Wow. Those results were very hard to read and understand.

First, none of your cases seem to be recompiling a regex each time through the loop. (It appears to me that) the worst case you've included does a string compare to determine that the regex doesn't need to be recompiled (and does this each time through the loop). Clearly, CORE::regcomp() doesn't unconditionally recompile a regex (based on parsing your results, it checks some things to determine if it even needs to do a string compare, then optionally does a string compare, and then only recompiles the regex if the string compare finds a difference).

Let's look at your results cleaned up so the interesting numbers are much easier to compare:

Readonly my $REGEXP_READONLY => '999986';
if( $l_line =~ m/$REGEXP_READONLY/ ) {
# 5.78s - CORE:regcomp
# 0.91s - CORE:match
# 1.83s - Readonly::Scalar::FETCH

use constant REGEXP_CONSTANT => '999986';
if( $l_line =~ m/${\REGEXP_CONSTANT}/ ) {
# 0.74s - CORE:regcomp
# 0.75s - CORE:match

if( $l_line =~ m/999986/ ) {
# 0.84s - CORE:match

if( $l_line =~ m/$REGEXP_READONLY/o ) {
# 0.73s - CORE:regcomp
# 0.75s - CORE:match

my $l_search_r = qr/$REGEXP_READONLY/;
if( $l_line =~ $l_search_r ) {
# 1.33s - CORE:regcomp
# 0.78s - CORE:match

my $l_search = $REGEXP_READONLY;
if( $l_line =~ m/$l_search/ ) {
# 0.74s - CORE:regcomp
# 0.76s - CORE:match

$l_search = $REGEXP_READONLY;
if( $l_line =~ m/$l_search/o ) {
# 0.69s - CORE:regcomp
# 0.76s - CORE:match

Second, let's take care of the least interesting bit:

# 0.91s - CORE:match
if( $l_line =~ m/$REGEXP_READONLY/ ) {
# 0.75s - CORE:match
if( $l_line =~ m/${\REGEXP_CONSTANT}/ ) {
# 0.84s - CORE:match
if( $l_line =~ m/999986/ ) {
# 0.75s - CORE:match
if( $l_line =~ m/$REGEXP_READONLY/o ) { # +/o
# 0.78s - CORE:match
if( $l_line =~ $l_search_r ) { # +qr//
# 0.76s - CORE:match
if( $l_line =~ m/$l_search/ ) {
# 0.76s - CORE:match
if( $l_line =~ m/$l_search/o ) { # +/o

We can see that the differences in regex matching speed are "in the noise". Indeed, I can think of no reason why the speeds would differ in practice, and I suspect that the reported differences are just noise. You might want to change the order of the cases and re-run to see whether the noise follows the order of execution or just moves randomly. There might be an insignificant difference that isn't noise in one of those cases, but I won't waste time chasing that until I see better evidence that it isn't noise.
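Here is a minimal sketch of that kind of re-run, in case it helps. It is not the original benchmark: the input data, the three case names, and the helper time_cases_shuffled() are all made up for illustration, and it uses plain wall-clock timing rather than a profiler. It just runs the same cases in a fresh random order on each round so you can see how much the numbers wander.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);
use Time::HiRes qw(time);

# Hypothetical test data: the original benchmark's input is not shown.
my @lines = map { "line $_" } 900_000 .. 1_000_000;

my $pattern = '999986';
my $qr      = qr/$pattern/;

# Each case counts the matching lines using one style of match.
my %cases = (
    literal      => sub { my $n = 0; for (@lines) { $n++ if /999986/ }   $n },
    interpolated => sub { my $n = 0; for (@lines) { $n++ if /$pattern/ } $n },
    precompiled  => sub { my $n = 0; for (@lines) { $n++ if $_ =~ $qr }  $n },
);

# Run every case in a fresh random order on each round and record
# the elapsed wall-clock seconds per case per round.
sub time_cases_shuffled {
    my ($rounds) = @_;
    my %elapsed;
    for my $round ( 1 .. $rounds ) {
        for my $name ( shuffle keys %cases ) {
            my $start = time;
            $cases{$name}->();
            push @{ $elapsed{$name} }, time - $start;
        }
    }
    return \%elapsed;
}

my $elapsed = time_cases_shuffled(5);
for my $name ( sort keys %$elapsed ) {
    printf "%-12s %s\n", $name,
        join ' ', map { sprintf '%.4fs', $_ } @{ $elapsed->{$name} };
}
```

If the spread between rounds of one case is as large as the spread between cases, the differences are noise.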

Now for the more interesting part:

# 5.78s - CORE:regcomp
if( $l_line =~ m/$REGEXP_READONLY/ ) {
# 0.74s - CORE:regcomp
if( $l_line =~ m/${\REGEXP_CONSTANT}/ ) {
# 0.00s
if( $l_line =~ m/999986/ ) {
# 0.73s - CORE:regcomp
if( $l_line =~ m/$REGEXP_READONLY/o ) { # +/o
# 1.33s - CORE:regcomp
if( $l_line =~ $l_search_r ) { # +qr//
# 0.74s - CORE:regcomp
if( $l_line =~ m/$l_search/ ) {
# 0.69s - CORE:regcomp
if( $l_line =~ m/$l_search/o ) { # +/o

We see that the first case takes about 8x longer in regcomp() than most of the others. My theory is that, since magic is involved and FETCH() is re-called each time through the loop, a fresh copy of the read-only value is handed to regcomp() each time, and so it is forced to do the string comparison. It looks to me like none of the other cases even need to compare strings.
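The part of that theory that is easy to check from Perl is that FETCH() really is called on every pass. The sketch below uses a hand-rolled tied class, CountingScalar, as a hypothetical stand-in for Readonly's tied scalar (it is not Readonly's implementation) and counts the FETCH calls. It says nothing about what regcomp does with the fetched value; that part remains a theory.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Readonly's tied scalar: it returns a fixed
# value and counts how many times FETCH is called.
package CountingScalar;

our $fetches = 0;

sub TIESCALAR { my ( $class, $value ) = @_; return bless \$value, $class }
sub FETCH     { my ($self) = @_; $fetches++; return $$self }
sub STORE     { die "read-only value\n" }

package main;

tie my $tied_pattern, 'CountingScalar', '999986';
my $plain_pattern = '999986';

my @lines = map { "line $_" } 999_980 .. 999_989;

# Count matching lines with the pattern interpolated from a tied scalar.
sub count_with_tied {
    my $n = 0;
    for my $line (@lines) { $n++ if $line =~ m/$tied_pattern/ }
    return $n;
}

# The same loop with an ordinary (non-magical) scalar.
sub count_with_plain {
    my $n = 0;
    for my $line (@lines) { $n++ if $line =~ m/$plain_pattern/ }
    return $n;
}

my $tied_matches  = count_with_tied();
my $plain_matches = count_with_plain();

print "matches: tied=$tied_matches plain=$plain_matches\n";
print "FETCH calls for ", scalar(@lines), " match attempts: ",
    "$CountingScalar::fetches\n";
```

The exact FETCH count may differ between Perl versions, but it should be at least one per match attempt.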

This means that the differences between most of the other cases are so very, very tiny as to be extremely unlikely to be noticed in any real-world situation. They are differences between relatively short paths through some C code. In a Perl script, such minuscule run-times will be completely dwarfed by rather mundane stuff and so won't add up to anything more than a tiny fraction of a real script's overall run time.

The m/999986/ is moderately interesting in that it demonstrates that the regex is actually compiled when the Perl code is compiled and Perl can completely avoid checking whether it needs to compile it again.

The other cases show only differences that are, again, "in the noise".

So there is no appreciable speed advantage to using /o. There are, however, significant disadvantages with regard to clarity of code and likelihood of introducing bugs.
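To illustrate the kind of bug I mean, here is a small sketch (the sub names matches_once() and matches_each() are made up for this example). With /o, the pattern is compiled from whatever the variable holds the first time the match runs, and later values are silently ignored.

```perl
use strict;
use warnings;

# With /o, the pattern is compiled from whatever $pattern holds on the
# first call; later values are silently ignored.
sub matches_once {
    my ( $text, $pattern ) = @_;
    return $text =~ /$pattern/o ? 1 : 0;
}

# Without /o, the current value of $pattern is used on every call.
sub matches_each {
    my ( $text, $pattern ) = @_;
    return $text =~ /$pattern/ ? 1 : 0;
}

my @results_o     = map { matches_once( 'abc', $_ ) } qw(a z);
my @results_plain = map { matches_each( 'abc', $_ ) } qw(a z);

print "with /o:    @results_o\n";        # prints "1 1": second answer is stale
print "without /o: @results_plain\n";    # prints "1 0"
```

The second call to matches_once() still matches against "a" even though it was asked about "z", and nothing warns you about it.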

It is unfortunate that you have shown that the use of qr// can approximately double the time taken in regcomp(). Of course, this time still adds up to a very tiny amount that is very unlikely to add up to anything that would be noticed in a real-world situation.

Let's look at the source code (pp_ctl.c in the Perl source) and see why. Search for the pp_regcomp function. There we see the extra work that is required in the case of qr//, including a link to why this extra work is unfortunate and will likely go away at some point in the future: http://www.nntp.perl.org/group/perl.perl5.porters/2007/03/msg122415.html.

But, again, the slight speed penalty is very unlikely to be noticed outside of a benchmark and the benefit to code clarity and maintainability (of using qr//) makes this a very easy call for me to make for myself. I use qr//. I never use /o.
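For what it's worth, this is roughly the style I mean, as a minimal sketch with made-up names ($needle_re, $field_re, count_matches()) and made-up data: compile once with qr//, pass the compiled regex around like any other value, and embed it in a larger pattern when needed.

```perl
use strict;
use warnings;

my $needle = '999986';

# Compile once, outside any loop.
my $needle_re = qr/$needle/;

# A qr// object can also be embedded in a larger pattern.
my $field_re = qr/^id=($needle_re)\b/;

# Count the lines that match a precompiled regex.
sub count_matches {
    my ( $re, @lines ) = @_;
    return scalar grep { $_ =~ $re } @lines;
}

my @lines = ( 'id=999986 ok', 'id=123 ok', 'ref 999986' );

printf "needle matches: %d\n", count_matches( $needle_re, @lines );
printf "field matches:  %d\n", count_matches( $field_re,  @lines );
```

The regex is a first-class value here, so it is obvious where it was built and which one each match uses.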

(Updated first two sentences of 2nd paragraph to not make my theory sound like something I have verified completely.)

- tye        


In reply to Re: Regexp optimization - /o option better than precompiled regexp? (analysis) by tye
in thread Regexp optimization - /o option better than precompiled regexp? by Hessu
