Regex Capturing: Is this a bug or a feature?

by shotgunefx (Parson)
on Sep 28, 2002 at 14:35 UTC (#201446=perlquestion)
shotgunefx has asked for the wisdom of the Perl Monks concerning the following question:

After many hours of trying to track down an anomaly in some test code, I ran into what appears to be a bug in regex capturing.

In perlre it states "The numbered variables ($1, $2, $3, etc.) and the related punctuation set ($+, $&, $`, $', and $^N) are all dynamically scoped until the end of the enclosing block or until the next successful match, whichever comes first. "

I've always taken this to mean they are "local" scoped to a block, just as if you had said local($1). Apparently this is not the case, or at least not on my build of Perl v5.6.1.

In the example below, after the first match, $1 and $2 are still populated every iteration after. Shouldn't they be reset?
#!/usr/bin/perl
# 5.6.1
use strict;
use warnings;
use vars qw ($test);

my @values = qw ( one var.1 test);
print join ("\t->\t",getsymbolval(@values) ),"\n";

#################################################
sub getsymbolval{
    no strict qw (refs);
    my @syms = @_;
    foreach my $symbol (@syms){
        local $test;
        ##############################
        #           Bug?             #
        ##############################
        $symbol=~m/(\w+)\.(\d+)/;       # Shouldn't $1,$2... get reset each time through?
        print "symbol: $symbol\t\$1: $1\t\$2:$2\n";
        print "test is ",$test++,"\n";  # Never incremented more than once
        my ($ts,$te) = ($1,$2);
    }
    wantarray ? @syms : $syms[0];
}


-Lee

"To be civilized is to deny one's nature."

Re: Regex Capturing: Is this a bug or a feature?
by fruiture (Curate) on Sep 28, 2002 at 14:54 UTC
    I've always taken this to mean they are "local" scoped to a block, just as if you had said local($1). Apparently this is not the case, or at least not on my build of Perl v5.6.1.

    Yes, it's not "local", for local also localizes a variable for one loop iteration. Maybe this behaviour is not perfectly documented, but in fact I've never had any problems with that so far. You should not work with these variables anyway, but use your own for better control.

    our $x = 0;
    foreach(@values){
        local $x;
        m/(\w+)\.(\d+)/ and my ($f,$s) = ($1,$2);
        print "-- $_: $1;$s; ",$x++,"\n"
    }
    --
    http://fruiture.de
      The code above is stripped down to an example. I do find this behaviour quite surprising and very NWIM. I would go so far as to say that it's documented incorrectly (at the very least, poorly). Both are described as dynamic scoping.

      Looking at perlsub and local(): "This is known as dynamic scoping. Lexical scoping is ..."

      -Lee

      "To be civilized is to deny one's nature."

        Well, the documentation doesn't say they're scoped like "local" would do :-)

        In a way you're right, but imho there is no problem arising from this issue if you always use your own variables instead of $1 .. $n. (Which means you assign your own vars immediately after the match).

        --
        http://fruiture.de
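A minimal sketch of the "use your own variables" advice, using the sample values from the original question. It relies on the fact that a match in list context returns its captures, or an empty list on failure, so nothing is ever read from $1 or $2 directly. The names @report, $name and $num are made up for illustration.

```perl
use strict;
use warnings;

my @values = qw(one var.1 test);
my @report;

foreach my $symbol (@values) {
    # In list context a match returns its captures, or an empty list on
    # failure, so $name and $num are undef unless THIS match succeeded.
    my ($name, $num) = $symbol =~ /(\w+)\.(\d+)/;

    push @report, defined $name
        ? "$symbol: name=$name num=$num"
        : "$symbol: no match";
}

print "$_\n" for @report;
```

Because the lexicals are declared fresh on every iteration, a failed match cannot pick up values left over from an earlier one.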
Re: Regex Capturing: Is this a bug or a feature?
by merlyn (Sage) on Sep 28, 2002 at 15:34 UTC
    It isn't really like "local" scoping, because they inherit inward into inner blocks. Although I couldn't have predicted that they'd retain the same value each time, you have triggered one of my pet peeves: namely using $1 without testing whether or not the match succeeds. Please write your code so that you avoid $1 unless you are sure of a match, and problems like this will go away.

    As for the "localization", it's more like "scope-limited copy-on-write". If you're in an inner scope and you change $1, the effect doesn't propagate to an outer scope. However, if you merely access it, you get the inherited outer value.

    -- Randal L. Schwartz, Perl hacker
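A small sketch of the "scope-limited copy-on-write" description, assuming a bare block as the inner scope. The strings and the @seen array are invented for the example; each match is checked with `or die` so that $1 is only read after a known success.

```perl
use strict;
use warnings;

my @seen;

"outer.1" =~ /(\w+)\.(\d+)/ or die "outer match failed";
push @seen, "before block: $1";

{
    # No match has happened inside this block yet, so $1 is inherited.
    push @seen, "inner, inherited: $1";

    "inner.2" =~ /(\w+)\.(\d+)/ or die "inner match failed";
    push @seen, "inner, after match: $1";
}

# The inner match did not propagate out of its block.
push @seen, "after block: $1";

print "$_\n" for @seen;
```

The value read inside the block before the inner match is the outer one, and the value read after the block is the outer one again.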

      Avoiding the problem is fine now that I know about it.
      Normally I do say if ( m/(whatever)/ ){ assign} which is why I never noticed this before. But I've never heard anyone or read anything that equates dynamic scoping with anything other than local() in perl and I've seen the terms local and dynamic used interchangeably.

      In my example (or at least the code it was reduced from) I did a match, saved the value and checked later for a defined value. If this was a "regular" dynamically scoped variable then I would be testing if it passed or failed, just later on.

      I own and have read all the ORA Perl books (with the exception of Mastering Regular Expressions) and I don't recall it ever being mentioned. I think it should be clarified in perlre is all.

      -Lee

      "To be civilized is to deny one's nature."
        It has bitten me too, and I bet some other people too. I know about it now, and it is OK, but what is really lacking is documentation about it.

        I seem to recall that it triggers some awkwardness in solutions too, at times, as just because a regexp matches, it may not actually fill all of the $n variables, and old values may still be there, and that makes checking harder. I would have to get back on that with a real example though, in case it is my memory that fools me. :)


        You have moved into a dark place.
        It is pitch black. You are likely to be eaten by a grue.
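One way to check which groups actually took part in a successful match is to test each capture with `defined`. This is only a sketch with a made-up alternation pattern and a made-up @notes array. In this simple case, the group that did not participate comes back undef rather than holding an older value.

```perl
use strict;
use warnings;

my @notes;

for my $input (qw(a b)) {
    if ($input =~ /(a)|(b)/) {
        # Only one branch of the alternation can take part in the match.
        push @notes, sprintf "%s: \$1 %s, \$2 %s", $input,
            defined $1 ? "set" : "undef",
            defined $2 ? "set" : "undef";
    }
}

print "$_\n" for @notes;
```

Checking `defined` costs little and avoids assuming that a successful match filled every numbered variable.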
Re: Regex Capturing: Is this a bug or a feature?
by Anonymous Monk on Sep 28, 2002 at 16:21 UTC
    I've always interpreted it like this. Since the documentation says it's localized to the current scope -- and not that it's localized upon a match -- then it's the same variable in the whole block. My Perl interpretation of this is:
    {
        # localize the variables
        # the block
    }
    E.g.
    {
        local ($1, $2) = ($1, $2);  # OK, so you can't actually write this.
        # The hacky can write local (*1, *2) = \($1, $2); instead.

        # Your loop:
        foreach my $symbol (@syms){
            local $test;
            $symbol=~m/(\w+)\.(\d+)/;
            print "symbol: $symbol\t\$1: $1\t\$2:$2\n";
            print "test is ",$test++,"\n";
            my ($ts,$te) = ($1,$2);
        }
    }
    Cheers,
    -Anomo
      While your interpretation happens to match the way it works, why would you interpret it being scoped to the outer block? It's declared in the foreach's block.

      If you go by perlre saying "are all dynamically scoped until the end of the enclosing block ", I interpret that as the foreach's block as that is the enclosing block in this example.

      I just wish it was better documented so I could have slept 10 hours instead of banging my head against the wall.

      -Lee

      "To be civilized is to deny one's nature."
        Why would you interpret it being scoped to the outer block?

        I'd like to change "the outer block" to "an outer (imaginary) block". The outer block I used above isn't written by the coder.

        It's declared in the foreach's block

        What's declared in that block? Something is declared for the block rather than in it.

        I interpret that as the foreach's block as that is the enclosing block in this example.

        Effectively that's the same thing as the end of the imaginary block.

        I thought it would be clear that everything but your code was "added" by Perl, especially since I wrote my own $DIGIT localizer.

        Hope this makes it more clear what I meant.

        Cheers,
        -Anomo
Re: Regex Capturing: Is this a bug or a feature?
by Elian (Parson) on Sep 30, 2002 at 04:45 UTC
    $1 and friends are neither global nor lexical variables. They're Weird Magical Things, and they're tied to the optree, not to any scratchpad or symbol table.

    The optree is the parsed and processed form of your program that the perl interpreter actually executes. When you have a regex, perl reserves space for the potential match variables and hangs a pointer to them off the optree (sort of a mini-scratchpad. Sort of) which the rest of the code in the lexical scope (and any subsequent inner scopes, at least some times) will access. (The compiler handles the visibility to inner lexical scopes thing--there's still no scratchpads involved)

    Because it's just some odd, regex-engine-private memory, it doesn't really behave the same way that other variables do. They don't get explicitly reset--they're just set when a match happens. So if you skip a match, they retain their old values.
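The "not reset, only set on a match" point can be seen with two matches in the same scope. A minimal sketch, with made-up variable names:

```perl
use strict;
use warnings;

"var.1" =~ /(\w+)\.(\d+)/ or die "first match failed";
my $after_success = $1;                      # "var"

my $matched = "test" =~ /(\w+)\.(\d+)/;      # fails: no ".digits" part
my $after_failure = $1;                      # untouched by the failed match

print "matched? ", ($matched ? "yes" : "no"),
      "; \$1 is still '$after_failure'\n";
```

The failed match returns false and leaves $1 holding the value from the earlier success, which is why the return value of the match is the thing to test.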

    This semi-sorta-global behavior also makes for fun with threads--you must mutex-protect any regex with capturing parens in a threaded program when using 5.005-style threads, or when using ithreads with a 5.6.x perl. (5.8.0 unshares them so it's safe.) This includes ActivePerl's fork-emulation, though since that doesn't actually expose mutexes to do so it's kind of tricky. (Recent ActivePerl releases might have fixed this--check the release notes.)

      Thanks for revealing under the hood. I don't have a problem with them working different, I just didn't expect it though perhaps I should have :)

      Still think perlre could be clearer though.

      -Lee

      "To be civilized is to deny one's nature."
        perlre could definitely be clearer. Part of the problem is very few people understand it properly, so the documentation's often done by people who don't quite get it. (As the people who do understand the regex engine well enough to notice the problems are generally too heavily medicated to do anything... :)
Re: Regex Capturing: Is this a bug or a feature?
by hossman (Prior) on Sep 30, 2002 at 22:59 UTC

    This node has pretty much been hashed to death, but I just noticed it and wanted to share my own little bit of nostalgia from back when I first realized how magical $1 really was. I was working with the File::PathConvert module, and discovered that in some strange instances, the filenames one method returned contained weird strings from other parts of my program that had no earthly business being there. That led to a lengthy investigation, and to the consummate workaround for dealing with any module/method that doesn't do a good job of testing $1, $2, ... before using them...

    Date: Thu, 8 Apr 1999 12:32:44 -0700 (PDT)
    From: {me}
    To: {the module authors}
    Subject: BUG in File::PathConvert::realpath
    
    there appears to be a bug in realpath that causes it to have problems with
    paths that have spaces in them.  the bug causes the value of $1 prior to
    the call of realpath to get appended to the result, the bug does not
    manifest itself if $1 is undefined prior to calling realpath.
    
    a script which demonstrates this bug (as well as a fix i found to ensure
    it doesn't cause problems) is attached... run this program on a file name
    with spaces in it to see the bug demonstrated.
    
    here is the output of the program on my machine...
    
    guido:~/code> touch "a file with spaces"
    guido:~/code> test.pl --file "a file with spaces" 
    File::PathConvert::realpath test
    File::PathConvert version#: 0.85
    
              $FILE == a file with spaces
    REALPATH($FILE) == /export/home/hossman/code/a file with spaces
    EXECUTING CODE ==> "some string" =~ /(\S*)/;
    REALPATH($FILE) == /export/home/hossman/code/a file with spacessome
    EXECUTING CODE ==> "" =~ /((((((((()))))))))/;
    REALPATH($FILE) == /export/home/hossman/code/a file with spaces
    guido:~/code> 
    guido:~/code> test.pl --file a\ file\ with\ spaces 
    File::PathConvert::realpath test
    File::PathConvert version#: 0.85
    
              $FILE == a file with spaces
    REALPATH($FILE) == /export/home/hossman/code/a file with spaces
    EXECUTING CODE ==> "some string" =~ /(\S*)/;
    REALPATH($FILE) == /export/home/hossman/code/a file with spacessome
    EXECUTING CODE ==> "" =~ /((((((((()))))))))/;
    REALPATH($FILE) == /export/home/hossman/code/a file with spaces
    guido:~/code> 
    

    this was the attachment...

    #!/usr/bin/perl -w

    use strict;
    use Getopt::Long;
    use File::PathConvert;

    my $file;
    GetOptions('file=s' => \$file);
    die "need --file arg" unless defined $file;

    print "File::PathConvert::realpath test\n";
    print "File::PathConvert version#: " . $File::PathConvert::VERSION . "\n\n";

    print "          \$FILE == $file\n";
    print "REALPATH(\$FILE) == " . File::PathConvert::realpath($file) . "\n";

    "some string" =~ /(\S*)/;
    print "EXECUTING CODE ==> \"some string\" =~ /(\\S*)/;\n";
    print "REALPATH(\$FILE) == " . File::PathConvert::realpath($file) . "\n";

    # executing this regexp prior to any call of realpath ensures that it
    # won't fall prey to the $n bug
    "" =~ /((((((((()))))))))/;
    print "EXECUTING CODE ==> \"\" =~ /((((((((()))))))))/;\n";
    print "REALPATH(\$FILE) == " . File::PathConvert::realpath($file) . "\n";
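The reset trick from the attachment can be tried on its own, without File::PathConvert. This is a sketch only, and $stale and @after are names invented here. The nine nested empty groups always match the empty string, so $1 through $9 are overwritten with empty but defined strings.

```perl
use strict;
use warnings;

"some string" =~ /(\S*)/ or die "setup match failed";
my $stale = $1;                                  # "some"

# Nine nested empty groups: each one captures the empty string.
"" =~ /((((((((()))))))))/ or die "reset match failed";
my @after = ($1, $2, $3, $4, $5, $6, $7, $8, $9);

my $all_empty = !grep { !defined($_) || length($_) } @after;
print "before reset: '$stale'; all nine captures empty afterwards: ",
      ($all_empty ? "yes" : "no"), "\n";
```

This only papers over the problem for code called afterwards from the same scope. The more robust fix is for the called code to test its own matches before reading $1.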
      I noticed a similar oddity. Oh well, learn something new everyday.

      -Lee

      "To be civilized is to deny one's nature."

Node Type: perlquestion [id://201446]
Approved by broquaint
Front-paged by wil