http://www.perlmonks.org?node_id=256053

shemp has asked for the wisdom of the Perl Monks concerning the following question:

I was playing with the "o" modifier for regexes and was wondering how this switch relates to performance. It would seem that if the regex itself never changes, it is always better to use the "o" modifier. For instance, code like this should run faster over a lot of iterations if we include the "o" modifier:
my @parts = $field =~ /([A-Za-z]+|\d+)/go;
So I'm really looking for situations where one would not want to include the "o" modifier (for constant regexes).

thanks

Replies are listed 'Best First'.
Re: regex "o" modifier (NEVER!)
by tye (Sage) on May 07, 2003 at 00:00 UTC

    The best heuristic is:

    Never use /o
    Using /o made some sense in Perl 4. These days, it mostly doesn't provide much performance improvement (regexes are only recompiled when they need to be, so the best you can get from /o is avoiding a string comparison) and it is easy to introduce bugs by using /o.

    If you really need a performance boost and /o actually applies, then you'll get the same performance boost by using qr// instead of /o and you'll have code that makes sense.

                    - tye
Re: regex "o" modifier
by diotalevi (Canon) on May 06, 2003 at 21:34 UTC

    It has no effect in that context. /o only applies when your expression uses interpolation. It causes interpolation to occur only on the first execution instead of every time. You only obfuscate your code by adding it onto invariant expressions.

Re: regex "o" modifier
by pzbagel (Chaplain) on May 06, 2003 at 21:45 UTC
    To add to what diotalevi said, Perl already performs this optimization if your regex does not contain variables which must be interpolated. Perl compiles the regex into an internal format and stashes it for use during the program. When Perl encounters a variable in a regex it can't be sure that the variable will never change and so it compiles it each time the regex is used. If you, as the programmer, know that the variables in the regex will never change, you can use the o option to tell perl that it is okay to compile this regex once and reuse it over and over.

    Bye

      When Perl encounters a variable in a regex it can't be sure that the variable will never change and so it compiles it each time the regex is used.
      That used to be true a long time ago, but it is no longer so. Even if your regexp contains variables and you don't use /o, perl may still not compile it again, simply because these days perl contains a check to see whether the scalar has changed since the last time the regexp was compiled.

      Benchmarking will show you that, when the scalar doesn't change, not using /o is comparable in speed, or might even appear slightly faster (difference < 1%) than using /o, but that is most likely just inaccuracy in the benchmark.
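One way to check this yourself is a minimal sketch using the core Benchmark module. The pattern, the test string, and the iteration count are arbitrary choices made for illustration, not values from this thread, and the timings will vary by machine and perl version.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

# Hypothetical inputs chosen only for this sketch.
my $pat  = 'yes|no';
my $text = 'maybe yes';
my $re   = qr/$pat/;    # precompiled regex object for comparison

# Time each variant for a fixed number of iterations; 'none' suppresses
# the per-test report so that only the comparison chart is printed.
my $results = timethese(500_000, {
    with_o    => sub { my $m = $text =~ /$pat/o },
    without_o => sub { my $m = $text =~ /$pat/ },
    qr_object => sub { my $m = $text =~ $re },
}, 'none');

# Print a table comparing the rates of the three variants.
cmpthese($results);
```

Run it several times before drawing conclusions; small differences between the rows are usually within the noise of the measurement.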

      /o does prevent recompiling the regexp, even if the scalar changes. Likely this is not what you want.

      Therefore, use of /o in modern perl is largely obsolete.
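The point about /o ignoring later changes to the scalar can be seen in a small sketch. The variable names and the word list here are made up for illustration.

```perl
use strict;
use warnings;

my (@with_o, @without_o);

for my $word (qw(foo bar)) {
    # With /o the pattern is frozen after the first interpolation,
    # so it stays /^foo$/ even when $word becomes 'bar'.
    push @with_o, ('bar' =~ /^$word$/o ? 1 : 0);

    # Without /o the pattern follows the current value of $word.
    push @without_o, ('bar' =~ /^$word$/ ? 1 : 0);
}

print "with /o:    @with_o\n";       # prints "with /o:    0 0"
print "without /o: @without_o\n";    # prints "without /o: 0 1"
```

The second iteration is where the two differ: the /o match still tests against "foo", which is the kind of silent bug the replies above warn about.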

        You should attribute the difference between /constant/o and /constant/ to noise. I've checked the source code, followed the execution path with gdb and verified that there is no C code difference between the two. Consider it like comparing the performance benefits between "\40", " " and "\x20".

      Just a quick question on something here.

      If I had the following:

      my $regex = qr!yes|no!;
      my $response = "yes";
      print "Not maybe" if $response =~ /$regex/o

      Is that a case where the /o would come into play, since that isn't going to change, or does the qr compile the regex before the if statement?

      Thanks!

      There is no emoticon for what I'm feeling now.

        Nope, you wouldn't benefit from an /o in this case.

        qr quotes and compiles the pattern as a regular expression. To see what it's doing, try this:

        perl -e 'my $re = qr!yes|no!; print "$re\n";'

        prints: (?-xism:yes|no)

        (Perl 5.14 and later print (?^:yes|no) instead.)

        The result of qr, in this case $re, can be used on its own, or within another regex.

        my $re = qr!yes|no!;
        print "yes or no!\n" if "yes" =~ $re;
        print "yes or no only" if "maybe" !~ $re;
        print "polite yes or no\n" if "yes thankyou" =~ /$re thankyou/;

        The first 2 uses in this code will not trigger the recompiling of any regex. The 3rd will need to be compiled, although I believe you could do...

        my $re = qr!yes|no!;
        my $re2 = qr!\sthankyou!;
        print "yes or no!\n" if "yes" =~ $re;
        print "yes or no only" if "maybe" !~ $re;
        print "polite yes or no\n" if "yes thankyou" =~ /$re$re2/;

        ..and avoid any recompilation after the initial qr's

        cheers,

        J

        You cannot apply modifiers to a regex object after it has been defined. Rewrite this as qr!yes|no!o. And the answer to your question is that yes the o modifier comes into play with qr// and locks in the regex the first time it is used.

      Well, technically, yes it would come into play since your regex has a variable in it that needs to be interpolated. Adding the o would ensure that it is compiled only once. However, since you are only using the regex once, it doesn't matter that much. If you were checking that regex 100,000 times, it would make a big difference.

      Cheers

Re: regex "o" modifier
by broquaint (Abbot) on May 07, 2003 at 09:11 UTC
Re: regex "o" modifier
by nite_man (Deacon) on May 07, 2003 at 07:24 UTC
    You can use eval and /o together to improve performance:

    for my $day_week ( qw(Mon Tue Wed Thu Fri Sat Sun) ) {
        my $regexp = "^$day_week";
        eval 'for my $day (qw(Mon Mon Wed Fri Sat Sun Fri Wed Tue Wed)) {
                  if ($day =~ m/$regexp/o) {
                      # do something
                  }
              }';
    }

    In this case, the regexp is recompiled on each iteration of the outer loop, which is where the pattern changes, but it is not recompiled on every pass through the inner loop. That does the job.
          
    --------------------------------
    SV* sv_bless(SV* sv, HV* stash);
    
      That's a perfect example of what qr// was made for.
      for my $day_week ( qw(Mon Tue Wed Thu Fri Sat Sun) ) {
          my $regexp = qr/^$day_week/;
          for my $day (qw(Mon Mon Wed Fri Sat Sun Fri Wed Tue Wed)) {
              if ($day =~ m/$regexp/) {
                  # do something
              }
          }
      }
      Much clearer and more readable and, yes, more efficient too. tye is right: Never use /o.

      Makeshifts last the longest.

Re: regex "o" modifier
by shemp (Deacon) on May 06, 2003 at 21:55 UTC
    Ok, that makes sense that perl automatically deals with this if there is nothing that can change.

    Thanks much