http://www.perlmonks.org?node_id=462160


in reply to Pimp My RegEx

I would have said the same thing that dragonchild pointed out, i.e. that since you are matching the beginning of a line, you can apply your regexp line by line.
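A minimal sketch of the line-by-line approach. It assumes a log in which the interesting records are lines starting with a hypothetical "ERROR" tag, because the original script's pattern is not reproduced here. Reading one line at a time with the default $/ lets a ^-anchored regex work without the /m modifier and without holding the whole file in memory. The error_messages name is illustrative.

```perl
use strict;
use warnings;

# Hypothetical example: collect the messages of log lines that start with "ERROR".
# Lines are read one at a time with the default input record separator ("\n"),
# so ^ anchors at the start of each line without needing the /m modifier.
sub error_messages {
    my ($fh) = @_;
    my @messages;
    while ( my $line = <$fh> ) {
        chomp $line;
        # Optional colon after the tag, then whitespace; capture the rest.
        if ( $line =~ /^ERROR:?\s+(.*)/ ) {
            push @messages, $1;
        }
    }
    return @messages;
}

# Usage with an in-memory filehandle standing in for a real log file.
my $log = "INFO start\nERROR: disk full\nWARN slow\nERROR timeout\n";
open my $fh, '<', \$log or die "Cannot open in-memory log: $!";
my @errors = error_messages($fh);
close $fh;
print "$_\n" for @errors;
```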

However, I'll tell you something more that will save you 30 to 50% of execution time:

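Remove "use English;" and replace $INPUT_RECORD_SEPARATOR with $/. A plain "use English;" also imports $PREMATCH, $MATCH and $POSTMATCH, which are aliases for the so-called evil variables ($`, $&, $'), and just having them in the program makes Perl pay extra for every pattern match.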

I have actually benchmarked your script with a few hundred lines of logs, and this change saves about 40% of execution time on my laptop.

Re^2: Pimp My RegEx
by heathen (Acolyte) on May 31, 2005 at 18:56 UTC
    I'll take your advice and pull out the "use English" line. I thought "use English" would be a kindness to my coworkers: they cringe when they see a complex regex, and a "$/" will have them running to the Camel book! Please elaborate on the "evil variables". I'm not really familiar with them or their implications.

      Sometimes, a comment will work just as well, without the overhead. Consider the following:

      local $/; # INPUT_RECORD_SEPARATOR

      As another option, the documentation for 'English' suggests using:

      use English qw( -no_match_vars );

      to prevent the problems mentioned.
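A minimal sketch of the -no_match_vars option. It keeps the readable $INPUT_RECORD_SEPARATOR name for coworkers while leaving out the $PREMATCH, $MATCH and $POSTMATCH aliases that trigger the penalty. The slurp helper and the in-memory input are illustrative, not from the original script. As far as I know, perl 5.20 and later no longer charge this penalty at all, so the option matters mainly on older interpreters.

```perl
use strict;
use warnings;
# Import English names such as $INPUT_RECORD_SEPARATOR and $OS_ERROR, but not
# the $PREMATCH/$MATCH/$POSTMATCH aliases for $`, $& and $'.
use English qw( -no_match_vars );

# Hypothetical helper: read the entire contents of a filehandle at once.
sub slurp {
    my ($fh) = @_;
    local $INPUT_RECORD_SEPARATOR;    # same as: local $/;  (undef = slurp mode)
    return scalar <$fh>;
}

my $text = "line one\nline two\n";
open my $in, '<', \$text or die "Cannot open in-memory input: $OS_ERROR";
my $contents = slurp($in);
close $in;
print length($contents), "\n";    # prints 18
```

Because the separator is localized inside slurp, $/ goes back to "\n" as soon as the sub returns, so line-oriented reads elsewhere in the script are unaffected.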

      As for the 'evil variables', the following note is in 'perldoc perlre':

      WARNING: Once Perl sees that you need one of $&, $`, or $' anywhere in the program, it has to provide them for every pattern match. This may substantially slow your program. Perl uses the same mechanism to produce $1, $2, etc, so you also pay a price for each pattern that contains capturing parentheses. (To avoid this cost while retaining the grouping behaviour, use the extended regular expression "(?: ... )" instead.) But if you never use $&, $` or $', then patterns without capturing parentheses will not be penalized. So avoid $&, $', and $` if you can, but if you can't (and some algorithms really appreciate them), once you've used them once, use them at will, because you've already paid the price. As of 5.005, $& is not so costly as the other two.
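When an algorithm really does need the text before, inside or after a match, a sketch like the following rebuilds all three from the @- and @+ offset arrays ($-[0] is where the whole match starts, $+[0] where it ends), so $`, $& and $' never appear in the program. The split_around_match name and the sample log line are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return (prematch, match, postmatch) for the first
# match of $re in $str, using the @- and @+ offset arrays instead of $`, $&, $'.
# Returns an empty list when there is no match.
sub split_around_match {
    my ( $str, $re ) = @_;
    return unless $str =~ $re;
    my ( $start, $end ) = ( $-[0], $+[0] );    # offsets of the whole match
    return (
        substr( $str, 0, $start ),             # like $`
        substr( $str, $start, $end - $start ), # like $&
        substr( $str, $end ),                  # like $'
    );
}

my @parts = split_around_match( "2005-05-31 ERROR disk full", qr/ERROR/ );
print join( '|', @parts ), "\n";    # prints "2005-05-31 |ERROR| disk full"
```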

        Thanks for the advice, jhourcle!
        I've got a few scripts out there that I'm going to have to apply "-no_match_vars" to. I didn't realize "use English" came with baggage.

      They cringe when they see a complex regex and a "$/" will have them running to the camel book!

      Add a comment?

      local $/ = $my_input_sep; # Custom input separator (instead of \n)
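A sketch of the custom-separator idea, assuming records separated by a hypothetical "%%" line. Localizing $/ inside the sub restores the normal newline separator as soon as it returns, and chomp removes whatever $/ currently holds rather than just a newline. The read_records name and the record format are illustrative.

```perl
use strict;
use warnings;

# Hypothetical record format: entries separated by a line containing "%%".
sub read_records {
    my ( $fh, $sep ) = @_;
    local $/ = $sep;    # Custom input separator (instead of \n)
    my @records;
    while ( my $rec = <$fh> ) {
        chomp $rec;     # chomp strips the current value of $/, i.e. $sep
        push @records, $rec;
    }
    return @records;
}

my $data = "first entry\n%%\nsecond entry\n%%\n";
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";
my @records = read_records( $fh, "\n%%\n" );
close $fh;
print scalar(@records), "\n";    # prints 2
```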


      Further to the other excellent responses, you can check whether your script is infected with any of the evil variables by using the Devel::SawAmpersand module.