PerlMonks

Naughty match variables in CPAN?

by tall_man (Parson)

on Jul 22, 2003 at 00:15 UTC (#276549=perlquestion)
tall_man has asked for the wisdom of the Perl Monks concerning the following question:

A co-worker recently added some modules to a large perl program that used $&, $' and $` (a.k.a. the "naughty match variables"). I know these add a large performance penalty for all regular expressions in the program, so I removed all the uses. Then I tried all three of the methods in Mastering Regular Expressions, second edition, p. 358, "How to Check Whether Your Code is Tainted by $&".

Only the last method on that page works for perl 5.8.0. Running with "-Mre=debug" no longer shows either 'Enabling $`, $&, $' support' or 'Omitting $`, $&, $' support'.

Devel::SawAmpersand doesn't work either. It gives false positives on trivial programs that don't use the "naughty match variables".

Here is the subroutine that actually worked:

use strict;
use Time::HiRes;

sub CheckNaughtiness () {
    my $text = 'x' x 10_000;

    # Time an empty loop to measure the loop overhead.
    my $start = Time::HiRes::time();
    for (my $i = 0; $i < 5_000; $i++) { }
    my $overhead = Time::HiRes::time() - $start;

    # Same loop with a trivial match against a 10,000-char string.
    # If $&, $` or $' was seen anywhere, every match copies the string.
    $start = Time::HiRes::time();
    for (my $i = 0; $i < 5_000; $i++) { $text =~ m/^/ }
    my $delta = Time::HiRes::time() - $start;

    printf "It seems your code is %s (overhead=%.2f, delta=%.2f)\n",
        ($delta > $overhead*5) ? "naughty" : "clean", $overhead, $delta;
}
To my great surprise, when I traced it out I found two CPAN modules (so far) that we use are also tainted in this way: Printer and Math::MatrixReal. I have sent mail to the maintainers of these modules pointing out the issue.

It makes me wonder how many other CPAN modules are tainted with "naughty match variables". Another way to get tainted is to do:

# Don't do this:
use English;

# Do this instead:
use English qw( -no_match_vars );
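A minimal sketch of what the safer form still gives you, assuming the English module's documented behaviour that -no_match_vars omits only $MATCH, $PREMATCH and $POSTMATCH (the aliases for $&, $` and $'). The helper names split_pair and try_die are hypothetical, made up for this illustration.

```perl
use strict;
use warnings;
use English qw( -no_match_vars );   # long names, but no $&, $` or $' aliases

# Hypothetical helper: capture groups work as usual and carry no global penalty.
sub split_pair {
    my ($line) = @_;
    return $line =~ /^(\w+)=(\w+)$/ ? ($1, $2) : ();
}

# Hypothetical helper: non-match names such as $EVAL_ERROR ($@) are still exported.
sub try_die {
    eval { die "oops\n" };
    return $EVAL_ERROR;
}

my ($k, $v) = split_pair('key=value');
print "$k -> $v\n";             # key -> value
print 'caught: ', try_die();    # caught: oops
```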
Has anyone else noticed this problem? Should there be a general check for "naughty match variables" for code submitted to CPAN?

Re: Naughty match variables in CPAN?
by Dog and Pony (Priest) on Jul 22, 2003 at 02:30 UTC
    Curious question: exactly how big an impact do those variables actually have? I never use them, since it's been hammered into me that I shouldn't because of performance issues. This makes sense, and one rarely needs them anyway. But I'm just a bit curious what size of performance hit we are talking about here. Microseconds, seconds, minutes?

    Also, perlre says: once you've used them once, use them at will, because you've already paid the price. If I read that right, it means that the performance hit is only triggered once, the first time one uses them.

    I could make a point here about coding for simplicity instead of (unnecessary) performance, but mainly, I am just curious. Is this advice something that is given as a knee-jerk response, and because we all want to have great performance, or is the impact really so large that it matters in usual cases?

    All that aside, I do agree that it is a bad idea to use them in any module that might be used by someone else - there is no telling what the performance considerations might be for that script. If you are using English, performance is probably not what you are looking for. But there are probably other examples.


    You have moved into a dark place.
    It is pitch black. You are likely to be eaten by a grue.
      If I read that right, it means that the performance hit is only triggered once, the first time one uses them.

      No. What that means is that once you use them, all matches will incur the overhead of using them whether or not you actually do. It's not a one-time hit but it is all-or-nothing.

      (This is my 1000th post! :-)

      -sauoq
      "My two cents aren't worth a dime.";
      
        Gotcha! I guess it is too late over here to read documentation. :)

        But I still wonder how much of a penalty there is.


      How big is the impact? Better than 10x in a simple test (5.8.0 on Win2K).

      Just for fun, here's the benchmark. It took some guesswork to get it to run the subs in the right order - clean first, then use English;, then naughty. If anyone uncomments the print statements to test the order, use 1 as an argument so you don't have to wait forever. Here are the results:

      use strict;
      use Benchmark qw/cmpthese/;

      my $time = shift || -5;
      my $text = 'x' x 10_000;

      sub clean {
          # print "clean";
          $text =~ m/^x/;
      }

      sub make_dirty {
          # print "md";
          eval "use English;";
      }

      sub naughty {
          # print "naughty";
          $text =~ m/^x/;
      }

      my %hash = (
          clean     => 'clean',
          naughtify => 'make_dirty',
          sawamp    => 'naughty',
      );

      cmpthese ( $time, {
          clean     => 'clean',
          naughtify => 'make_dirty',
          sawamp    => 'naughty',
      });

      __END__
      results:
      C:\s\pldir>naughty.pl -5
                    Rate naughtify  sawamp   clean
      naughtify    433/s        --    -98%   -100%
      sawamp     24153/s     5481%      --    -92%
      clean     300603/s    69366%   1145%      --

      Someone with more benchmark-fu may correct me on this, but it looks right to me.
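To see where "better than 10x" comes from, the cmpthese percentages can be recomputed from the rates. This sketch assumes each cell is 100 * (row rate / column rate) - 100, rounded to a whole percent; the displayed rates are themselves rounded, so some cells (e.g. 5481%) may come out a few points off. The helper pct_faster is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper, assuming cmpthese reports 100 * row/col - 100.
sub pct_faster {
    my ($row_rate, $col_rate) = @_;
    return sprintf '%.0f', 100 * $row_rate / $col_rate - 100;
}

# Rates (per second) copied from the results table in the post.
my %rate = (naughtify => 433, sawamp => 24_153, clean => 300_603);

printf "clean vs sawamp: %s%%\n", pct_faster($rate{clean}, $rate{sawamp});  # 1145%
printf "sawamp vs clean: %s%%\n", pct_faster($rate{sawamp}, $rate{clean});  # -92%
printf "speed-up: %.1fx\n", $rate{clean} / $rate{sawamp};                   # 12.4x
```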

Re: Naughty match variables in CPAN?
by waswas-fng (Curate) on Jul 22, 2003 at 04:34 UTC
    What does this buy you over a quick recursive egrep of the modules you are using? For instance, grepping Math::MatrixReal turns up a line such as:
    $string = $';

    If there is a match for one of those variables, you know you will see the problem.

    -Waswas

      You can't rely on that, because code such as $money = '$'.$money if ($currency eq 'USD'); also matches. (Only perl can parse Perl.)


      Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).
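A small sketch of that false positive, assuming a grep-style scan that just looks for the characters $&, $` or $' on a line. The scanner looks_naughty is hypothetical; it flags the genuine $' from Math::MatrixReal and the harmless '$' string literal alike, because it sees text rather than parsed Perl.

```perl
use strict;
use warnings;

# Hypothetical, deliberately naive scanner in the spirit of a recursive egrep.
sub looks_naughty {
    my ($line) = @_;
    return $line =~ /\$[&`']/ ? 1 : 0;
}

my @samples = (
    q{$string = $';},                                  # real use of $'
    q{$money = '$'.$money if ($currency eq 'USD');},   # only a '$' string
    q{my ($pre, $post) = ($1, $2);},                   # clean
);

printf "%d  %s\n", looks_naughty($_), $_ for @samples;
```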

Re: Naughty match variables in CPAN?
by TomDLux (Priest) on Jul 22, 2003 at 07:46 UTC

    They no longer invoke the horrible penalty they used to, but in any case I prefer using parentheses to isolate subexpressions I want to remember.

    --
    TTTATCGGTCGTTATATAGATGTTTGCA
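A minimal sketch of the parentheses approach: capture the text before, at and after the match explicitly instead of reading $`, $& and $'. The helper split_around is hypothetical. The cost of copying the captures is paid only by this match, not by every regex in the program.

```perl
use strict;
use warnings;

# Hypothetical helper: explicit captures stand in for $`, $& and $'.
sub split_around {
    my ($text, $word) = @_;
    return unless $text =~ /\A(.*?)(\Q$word\E)(.*)\z/s;
    return ($1, $2, $3);    # (before, match, after)
}

my ($before, $match, $after) = split_around('foo=bar', '=');
print "[$before] [$match] [$after]\n";    # [foo] [=] [bar]
```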

Re: Naughty match variables in CPAN?
by zakzebrowski (Curate) on Jul 22, 2003 at 12:20 UTC
    Should there be a general check for "naughty match variables" for code submitted to CPAN?
    Maybe. But there may be cases where the author chooses to make life simpler by using those variables. A simple example is the following code, which extracts the surrounding context from a full-text match.
    while (<DATA>){
        while ($_ =~/(^|\W)CPAN(\W)/gi){
            print substr($`,length($`)-10) . $& . substr($',0,10) . "\n";
        }
    }
    __DATA__
    To my great surprise, when I traced it out I found two CPAN modules (so far) that we use are also tainted in this way: Printer and Math::MatrixReal. I have sent mail to the maintainers of these modules pointing out the issue. It makes me wonder how many other CPAN modules are tainted with "naughty match variables".

    Output:
    ZAZ@localhost ~
    $ perl sample.pl
     found two CPAN modules
    
    many other CPAN modules ar
    
    Standard untested code caveat...

    ----
    Zak
    Pluralitas non est ponenda sine necessitate - mysql's philosophy
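The same context extraction can be done without the naughty variables by slicing the string at the match offsets in @- and @+ (available since perl 5.6), which do not trigger the global penalty. This is a sketch, not zakzebrowski's code: the helper keyword_context and its $width parameter are hypothetical, and it clamps the start offset at 0 for matches near the beginning of a line.

```perl
use strict;
use warnings;

# Hypothetical helper: $`, $& and $' rebuilt from the match offsets.
sub keyword_context {
    my ($line, $width) = @_;
    my @hits;
    while ($line =~ /(^|\W)CPAN(\W)/gi) {
        my ($start, $end) = ($-[0], $+[0]);    # offsets of the whole match
        my $from = $start > $width ? $start - $width : 0;
        push @hits, substr($line, $from, $start - $from)    # up to $width chars before
                  . substr($line, $start, $end - $start)    # the match itself
                  . substr($line, $end, $width);             # up to $width chars after
    }
    return @hits;
}

my $text = 'It makes me wonder how many other CPAN modules are tainted.';
print "$_\n" for keyword_context($text, 10);    # many other CPAN modules ar
```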

Node Type: perlquestion [id://276549]
Front-paged by diotalevi