PerlMonks  

Impact of special variables on regex match performance

by roubi (Hermit)
on Dec 09, 2010 at 19:53 UTC ( [id://876308]=perlquestion )

roubi has asked for the wisdom of the Perl Monks concerning the following question:

The following code takes 250x longer on my machine when I uncomment any one line in the 'uncomment_one' subroutine. I am running perl 5.8.8 on 64-bit Linux with a 2.6.9 kernel. A test on a different machine with a newer kernel (same perl version) showed a 50x degradation, as did testing with 5.10.1. I have not tried a newer version.

    use strict;
    use warnings;
    use Time::HiRes qw(time);

    sub uncomment_one {
        #my $test = $`;
        #my $test = $';
        #my $test = $&;
    }

    my $in = "abcdefghijklmnopqrstuvwxyz\n" x 20_000;
    my @a;
    my $r = '.*';
    my $start = time;
    while ( $in =~ /^($r)$/mg ) {
        push @a, $1;
    }
    print "Took " . (time - $start) . " seconds";

I am aware that perlvar says the following about these three special variables:
The use of this variable anywhere in a program imposes a considerable performance penalty on all regular expression matches
Since it is possible to rewrite the above in a way that doesn't suffer from such a large slowdown, I wonder if I have hit something worthy of a bug report? Or is this expected behavior? (Splitting the input into multiple lines and doing a match on each line, rather than using /mg, is much, much faster in this instance.)
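
The multi-line approach mentioned in the parenthesis was not posted. A minimal sketch of what it might look like, assuming the same $in and $r as in the code above (the loop shape is a guess, not the code from the real project):

```perl
use strict;
use warnings;

# Same input and pattern as in the question.
my $in = "abcdefghijklmnopqrstuvwxyz\n" x 20_000;
my $r  = '.*';

my @a;

# Match each line on its own, so that every match runs against a short
# string instead of the whole 540,000-character buffer.
for my $line ( split /\n/, $in ) {
    push @a, $1 if $line =~ /^($r)$/;
}

print scalar(@a), " lines captured\n";
```

Each match here only ever sees a 26-character string, so whatever perl copies per match stays small even when one of the special variables is present in the program.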

Update: Astute readers will notice that the special variables are being used in a sub that's never called. This isn't a typo. The code above is fully functional and demonstrates the effect I am talking about.

Replies are listed 'Best First'.
Re: Impact of special variables on regex match performance
by JavaFan (Canon) on Dec 09, 2010 at 20:22 UTC
    When you use $`, $' or $&, for each match, Perl copies the pre- and postmatch parts of your match. Considering you have a very large string, and you hardly do anything else in your program, the additional copying dominates the runtime.

    Considering you aren't using $`, $& and $', it seems the obvious thing to do is to keep not using them.

      Oh, that explains the difference in behavior between the code posted above and the multi-line approach I describe at the bottom of my post.
      "Considering you aren't using $`, $& and $', it seems the obvious thing to do is to keep not using them"
      As per the rules of this forum I posted the smallest amount of code reproducing the issue. Not using those special variables (or rather, not having them seen by perl) isn't an option in the real project...
        "Not using those special variables (or rather, not having them seen by perl) isn't an option in the real project"
        Of course it is.

        It doesn't mean you can do better. If you are using $` and $' on every match you make, it doesn't matter: any alternative will make you pay the price.

        But if you're using $` and friends on only some matches, then /p, @- and @+, and adding more captures to your patterns are the classical alternatives.
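
A minimal sketch of two of those alternatives, /p and @-/@+, assuming a made-up sample string and pattern (neither comes from the thread). Note that /p needs perl 5.10 or later, so it would not help on the 5.8.8 install mentioned in the question:

```perl
use strict;
use warnings;

# Hypothetical sample data, for illustration only.
my $str = 'foo=bar;baz';

# Option 1 (perl 5.10 or later): /p makes ${^PREMATCH}, ${^MATCH} and
# ${^POSTMATCH} available for this one match.
my ( $pre_p, $match_p, $post_p );
if ( $str =~ /=\w+/p ) {
    ( $pre_p, $match_p, $post_p ) = ( ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} );
}

# Option 2: @- and @+ hold the start and end offsets of the last
# successful match, so substr can extract the same three pieces.
my ( $pre_o, $match_o, $post_o );
if ( $str =~ /=\w+/ ) {
    $pre_o   = substr( $str, 0, $-[0] );
    $match_o = substr( $str, $-[0], $+[0] - $-[0] );
    $post_o  = substr( $str, $+[0] );
}

print join( '|', $pre_p, $match_p, $post_p ), "\n";
print join( '|', $pre_o, $match_o, $post_o ), "\n";
```

Either way, the cost is paid only on the matches that ask for the extra pieces.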

        I'm sure it's possible to pose a challenge ("real project") in which those variables would be needed... but it's hard to come up with one. Consider:
        $`  The text before the match -- in your code, that's $in
        $'  The text which comes after the match in an input string -- alt: lookarounds
        $&  The text of the match itself -- use captures instead
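
As a rough illustration of the "use captures instead" route, here is a sketch that recovers all three pieces with explicit capture groups. The sample string and pattern are made up for the example, and the anchored non-greedy prefix is just one way to do it:

```perl
use strict;
use warnings;

# Hypothetical sample data, for illustration only.
my $str = 'foo=bar;baz';

# Wrap the pattern of interest (=\w+) in a capture and add explicit
# groups for what comes before and after it. The non-greedy (.*?)
# anchored at \A keeps the leftmost-match behaviour.
my ( $before, $matched, $after ) = $str =~ /\A(.*?)(=\w+)(.*)\z/s;

print "before:  $before\n";
print "matched: $matched\n";
print "after:   $after\n";
```

This still copies the captured text, but only for the matches written this way.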

        As mentioned, the cost (= slowdown) is well documented in the standard regex docs; and in many books, tutorials and nodes devoted to regular expressions.

        Concerning your code: given that sub uncomment_one is never explicitly called in what you posted, you may have over-reached in your diligence to follow the guidance (not exactly "rules") of this forum. /me suspects that profiling what you show would be informative; certainly, as is, the slowdown in your code is largely caused by the copying, which is expensive, as JavaFan points out above.

Re: Impact of special variables on regex match performance
by BrowserUk (Patriarch) on Dec 10, 2010 at 03:48 UTC

    FWIW: I agree with you that the effects of just the presence of these variables, even when buried in some module you are unknowingly loading through n levels of indirection, can be both dramatic and profoundly frustrating. Documentation notwithstanding, the fact that you can be bitten by it despite studiously avoiding them is a pretty bad indictment.

    One tip that I didn't see referenced is to check your code with Devel::SawAmpersand. The module POD also offers some good advice and workarounds.
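
A minimal sketch of such a check, assuming Devel::SawAmpersand is installed from CPAN (it is not a core module, so it is loaded defensively here). The helper name saw_ampersand_report is made up for this example:

```perl
use strict;
use warnings;

# saw_ampersand_report() is a hypothetical helper, not part of the module.
# It returns a human-readable verdict, or a note if the module is missing.
sub saw_ampersand_report {
    return "Devel::SawAmpersand is not installed"
        unless eval { require Devel::SawAmpersand; 1 };

    # sawampersand() is true once perl has seen one of the three variables.
    return Devel::SawAmpersand::sawampersand()
        ? q{Something has referenced $`, $& or $'}
        : q{No reference to $`, $& or $' seen so far};
}

# Load the modules you want to audit first, then ask.
print saw_ampersand_report(), "\n";
```

Calling it after the `use` lines of a suspect program gives a quick yes/no before digging through nested dependencies.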

    Personally, I think that this should issue a mandatory (compiler) warning that would require specific action to suppress. And that suppression should be overridden by -W to allow for quick checks of heavily nested modules.

    It would certainly be more useful to me than many of the existing warnings. This particularly useless one for instance:

    $x = 123;;
    $y = '123';;
    print $x | 'fred';;
    Argument "fred" isn't numeric in bitwise or (|) at (eval 8)
    123
    print $y | 'fred';;
    wrwd

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Impact of special variables on regex match performance
by sundialsvc4 (Abbot) on Dec 09, 2010 at 20:32 UTC

    There are much better ways to do what these variables do. They were, I think, "an idea that seemed to be a good idea to somebody at some time." They aren't unsupported, but I would consider them effectively deprecated.

      $' and friends aren't deprecated. Deprecation means it's marked as "may disappear from the language".

      Furthermore, $' and friends aren't bad. They are very convenient. They come with a price - there's a performance impact. But we're willing to pay a huge performance impact on picking Perl over C, because Perl is much more convenient. It's the same with $' and friends. I use them a lot. Not for long-running programs that possibly do thousands of matches on long strings, but I write a lot of programs whose running time is I/O bound, and which do just a handful of matches against short strings. I use $' and $` in those.

      It's just silly to consider $' and friends as evil. As a child, everything is black and white. Things are good, or evil. As a programmer, you do not have that luxury. Only a few constructs or techniques are really evil or very good. Everything else is a tradeoff. Good programmers aren't judged by what they know - but by how they make their trade-offs.

Re: Impact of special variables on regex match performance
by markhh (Novice) on Dec 10, 2010 at 19:41 UTC
    It looks like the thing that surprises you is that you are paying the price for the special vars when you haven't been using them. But Perl can't tell for sure that you won't have something like:
        my $one = 'one';
        eval "uncomment_$one()";
    So it has to play safe. Note that the following code goes wrong, because Perl can't see the magic var when the match runs:
        'abc' =~ m/b/;
        print( eval( '$`' ), "\n" );
      What surprised me was the magnitude of the performance degradation here, compared to other code doing a similar number of matches on the exact same input (the multi-line approach I briefly mention at the bottom of my post). JavaFan explained why that is the case.
Re: Impact of special variables on regex match performance
by anonymized user 468275 (Curate) on Dec 09, 2010 at 22:19 UTC
    I wonder if there are any loopholes, such as putting the code that needs these variables in a required file or perhaps in an eval.

    update: I never find I need these variables in practice. In the unlikely event of there being a real conflict of requirements and all else failing, I'd use a separate process for the exceptional need for the variables and pick an IPC solution from perlipc.

    One world, one people
