Re: Impact of special variables on regex match performance
by JavaFan (Canon) on Dec 09, 2010 at 20:22 UTC
|
When $`, $' or $& appears in your program, Perl has to keep a copy of the string on every successful match so that the pre- and postmatch parts stay available. Since your string is very large and your program does hardly anything else, the additional copying dominates the runtime.
Considering you aren't using $`, $& and $', it seems the obvious thing to do is to keep not using them.
|
Oh, that explains the difference in behavior between the code posted above and the multi-line approach I describe at the bottom of my post.
"Considering you aren't using $`, $& and $', it seems the obvious thing to do is to keep not using them"
As per the rules of this forum, I posted the smallest amount of code reproducing the issue. Not using those special variables (or rather, not letting perl see them) isn't an option in the real project...
|
"Not using those special variables (or rather not have them seen by perl ) isn't an option in the real project"
Of course it is.
It doesn't mean you can do better. If you are using $` and $' on every match you make, it doesn't matter: any alternative will make you pay the price.
But if you're using $` and friends on only some matches, then /p, @- and @+, and adding more captures to your patterns are the classical alternatives.
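A minimal sketch of the first two alternatives, assuming perl 5.10 or later for /p. The sub names, the pattern and the sample line are made up for illustration; they are not taken from the original poster's code.

```perl
use strict;
use warnings;
use 5.010;    # /p and ${^PREMATCH} and friends need perl 5.10 or later

# Alternative 1: the /p flag. Only matches carrying /p keep the pieces,
# which are then read from ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH}.
sub split_with_p {
    my ($text) = @_;
    return unless $text =~ /\s*#/p;
    return (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
}

# Alternative 2: @- and @+ hold the start and end offsets of the last
# successful match, so the pieces can be cut out with substr on demand.
sub split_with_offsets {
    my ($text) = @_;
    return unless $text =~ /\s*#/;
    my ($start, $end) = ($-[0], $+[0]);
    return (
        substr($text, 0, $start),                # what $` would hold
        substr($text, $start, $end - $start),    # what $& would hold
        substr($text, $end),                     # what $' would hold
    );
}

my $line = 'key = value  # trailing comment';    # made-up sample input
say join '|', split_with_p($line);
say join '|', split_with_offsets($line);
```

Both calls print the same three pieces. Either way, only the matches that actually need the pre- and postmatch text pay for them.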
|
I'm sure it's possible to pose a challenge ("real project") in which those variables would be needed... but it's hard to come up with one. Consider:
$`  The text before the match -- In your code, that's $in
$'  The text which comes after the match in an input string -- Alt: Lookarounds
$&  The text of the match itself -- Use captures instead
As mentioned, the cost (= slowdown) is well documented in the standard regex docs, and in many books, tutorials and nodes devoted to regular expressions.
Concerning your code: given that sub uncomment_one is never explicitly called in what you posted, you may have over-reached in your diligence to follow the guidance (not exactly "rules") of this forum. /me suspects that profiling what you show would be informative; certainly, as is, the slowdown in your code is largely caused by the expensive copying, as JavaFan points out above.
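To make the replacements listed above concrete, here is a rough sketch using captures and a lookahead. The sub names, patterns and sample strings are hypothetical and only stand in for whatever the real project matches.

```perl
use strict;
use warnings;

# Captures in place of $`, $& and $': one pattern, three explicit groups.
sub split_around_comment {
    my ($in) = @_;
    my ($before, $comment, $after) = $in =~ m{\A(.*?)(/\*.*?\*/)(.*)\z}s
        or return;    # empty list when there is no comment
    return ($before, $comment, $after);
}

# Lookahead in place of $': test what follows the match without making
# it part of the match.
sub key_before_equals {
    my ($line) = @_;
    my ($key) = $line =~ /(\w+)(?=\s*=)/;
    return $key;      # undef when no "word =" is found
}

print join('|', split_around_comment('foo /* note */ bar')), "\n";
print key_before_equals('name = value'), "\n";
```

Note that the three-capture version still copies all three pieces, so it only helps when just a few of the matches need them.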
|
Re: Impact of special variables on regex match performance
by BrowserUk (Patriarch) on Dec 10, 2010 at 03:48 UTC
|
FWIW: I agree with you that the effects of the mere presence of these variables, even when buried in some module you are unknowingly loading through n levels of indirection, can be both dramatic and profoundly frustrating. Documentation notwithstanding, the fact that you can be bitten by it despite studiously avoiding them is a pretty bad indictment.
One tip that I didn't see referenced, is to check your code with Devel::SawAmpersand. The module POD also offers some good advice and workarounds.
Personally, I think that using these variables should issue a mandatory (compiler) warning that would require specific action to suppress. And that suppression should be overridden by -W to allow for quick checks of heavily nested modules.
It would certainly be more useful to me than many of the existing warnings. This particularly useless one for instance:
$x = 123;;
$y = '123';;
print $x | 'fred';;
Argument "fred" isn't numeric in bitwise or (|) at (eval 8)
123
print $y | 'fred';;
wrwd
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Impact of special variables on regex match performance
by sundialsvc4 (Abbot) on Dec 09, 2010 at 20:32 UTC
|
There are much better ways to do what these variables do. They were, I think, “an idea that seemed to be a good idea to somebody at some time.” They aren’t unsupported, but I would consider them effectively deprecated.
|
$' and friends aren't deprecated. Deprecation means it's marked as "may disappear from the language".
Furthermore, $' and friends aren't bad. They are very convenient. They come with a price - there's a performance impact. But we're willing to pay a huge performance price for picking Perl over C, because Perl is much more convenient. It's the same with $' and friends. I use them a lot. Not for long-running programs that possibly do thousands of matches on long strings, but I write a lot of programs whose running time is I/O bound, and which do just a handful of matches against short strings. I use $' and $` in those.
It's just silly to consider $' and friends as evil. To a child, everything is black and white. Things are good, or evil. As a programmer, you do not have that luxury. Only a few constructs or techniques are really evil or very good. Everything else is a tradeoff. Good programmers aren't judged by what they know - but by how they make their trade-offs.
Re: Impact of special variables on regex match performance
by markhh (Novice) on Dec 10, 2010 at 19:41 UTC
|
It looks like the thing that surprises you is that you are paying the price for the special vars when you haven't been using them. But perl can't tell for sure that you won't have something like:
my $one = 'one';
eval "uncomment_$one()";
So it has to play safe.
Note that the following code goes wrong, because perl can't see the magic var when it compiles the match.
'abc' =~ m/b/;
print( eval( '$`') , "\n")
|
What surprised me was the magnitude of the performance degradation here, compared to other code doing a similar number of matches on the exact same output (the multi-line approach I briefly mention at the bottom of my post). JavaFan explained why that is the case.
Re: Impact of special variables on regex match performance
by anonymized user 468275 (Curate) on Dec 09, 2010 at 22:19 UTC
|
I wonder if there are any loopholes, such as putting the code that needs these variables in a required file or perhaps in an eval.
update: I never find I need these variables in practice. In the unlikely event of there being a real conflict of requirements and all else failing, I'd use an alternate process for the exceptional need for the variables and pick an IPC solution from perlipc.