PerlMonks  

[Perl 5.14, regex]: Problems with /g, \G and pos()

by Darkwing (Novice)
on Oct 19, 2013 at 12:40 UTC ( #1058911=perlquestion )
Darkwing has asked for the wisdom of the Perl Monks concerning the following question:

Hi community, this is my very first question here. I have some problems with /g, \G and pos(), and I do not understand what's going on (possibly a bug in my version of perl?). I hope someone can explain the following behaviour of perl:

# File: defect1.pl
use strict;
use warnings;

my $str = "  xyz";

$str =~ /^(\s*)/g or die;
print("match1: pos=", pos($str), "\n");

$str =~ /\G(\s*)/g or die;
print("match2: pos=", pos($str), "\n");

$str =~ /\G(\s*)/g or die;
print("match3: pos=", pos($str), "\n");

Output:

match1: pos=2
match2: pos=2
Died at defect1.pl line 13.

Why does the third match fail? I'd expect that after the first regex I could repeat $str=~/\G(\s*)/g again and again, and it should always succeed (and never change pos()), since \s* also matches zero occurrences of spaces.

Even worse is this:

use strict;
use warnings;

my $str = "  xyz";

$str =~ /^(\s*)/g;
my ($x) = ($str =~ /\G(x)/g) or die;
defined(pos($str)) or warn("pos() is undefined!\n");
print("this should be x:$x\n");

Output:

pos() is undefined!
this should be x:x

Even though the regex succeeded and $x has the correct value, pos($str) is undefined. Same result from this variant:

use strict;
use warnings;

my $str = "  xyz";

$str =~ /^(\s*)/g;
my $x = ($str =~ /\G(x)/g)[0] or die;
defined(pos($str)) or warn("pos() is undefined!\n");
print("this should be x:$x\n");

Again:

pos() is undefined!
this should be x:x

In contrast, the following works fine:

use strict;
use warnings;

my $str = "  xyz";

$str =~ /^(\s*)/g;
$str =~ /\G(x)/g or die;
my $x = $1;
defined(pos($str)) or warn("pos() is undefined!\n");
print("this should be x:$x\n");

Output:

this should be x:x

Obviously, perl clobbers the position associated with $str as soon as I treat the regex's result as a list - but why? I'm running perl 5.14 on Suse Linux 12.1. perl -v says:

This is perl 5, version 14, subversion 2 (v5.14.2) built for x86_64-linux-thread-multi

Re: [Perl 5.14, regex]: Problems with /g, \G and pos()
by roboticus (Canon) on Oct 19, 2013 at 13:05 UTC

    Darkwing:

    Interesting! I'm going to have to play with some of the dusty corners of regexes. Anyway, while trying to figure it out, I stumbled across this bit in perldoc perlre under the heading "Assertions":

    The "\G" assertion can be used to chain global matches (using "m//g"), as described in "Regexp Quote-Like Operators" in perlop. It is also useful when writing "lex"-like scanners, when you have several patterns that you want to match against consequent substrings of your string, see the previous reference. The actual location where "\G" will match can also be influenced by using "pos()" as an lvalue: see "pos" in perlfunc. Note that the rule for zero-length matches is modified somewhat, in that contents to the left of "\G" is not counted when determining the length of the match. Thus the following will not match forever:

    $str = 'ABC';
    pos($str) = 1;
    while (/.\G/g) {
        print $&;
    }

    It will print 'A' and then terminate, as it considers the match to be zero-width, and thus will not match at the same position twice in a row.

    I don't really grok exactly how and why \G and m//g work that way, so I'll have to monkey around with it to get a feel for things. But I'm thinking that that bit of documentation holds the key to your problem.

    As for the second part of your question, think of m//g in list context as an iterator where pos() is the (user accessible) interface to the iterator. Before you do a match with m//g, pos is undefined, as there is no active iterator for the string. As you consume each match, the iterator is updated so it can start at the next location. Once you've consumed all the matches, the iterator ends and pos() goes back to being undefined. Since you're executing it in a list context, it's doing *all* the iterations at once, and when you get to your next line of code, the iterator is already exhausted.
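
A minimal sketch of that iterator behaviour, assuming the original string started with two whitespace characters (which the pos=2 output suggests). The variable names are made up for illustration, and the last part uses the /c modifier, which goes beyond the explanation above:

```perl
use strict;
use warnings;

my $str = "  xyz";   # assumption: two leading spaces

# Scalar context: one match per call, pos() survives a successful match.
$str =~ /^(\s*)/g;
my $ok         = $str =~ /\G(x)/g;
my $pos_scalar = pos($str);                 # 3

# List context: m//g keeps matching until an attempt fails,
# and that final failed attempt resets pos() to undef.
pos($str) = 2;
my ($x)      = $str =~ /\G(x)/g;
my $pos_list = pos($str);                   # undef

# With /c the failed attempt leaves pos() where the last match ended.
pos($str) = 2;
my ($x_c)      = $str =~ /\G(x)/gc;
my $pos_list_c = pos($str);                 # 3

print "scalar context: pos=$pos_scalar\n";
print "list context:   pos=", (defined $pos_list ? $pos_list : 'undef'), "\n";
print "list with /gc:  pos=$pos_list_c\n";
```

So capturing with $1 after a scalar-context match, or adding /c to the list-context match, should both leave pos() usable on the next line.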

    ...roboticus

    When your only tool is a hammer regex, all problems look like your thumb HTML.

    Post Script: That's a rather fine first posting to the site.

Re: [Perl 5.14, regex]: Problems with /g, \G and pos()
by wjw (Hermit) on Oct 19, 2013 at 13:41 UTC
    Was curious about this because I am laughably ignorant when it comes to regexes. So I looked here, under the "\G Magic with Perl" section.

    Subsequently, I Googled "perl position of last match" which led me to this which led me to try the following:


    my $str = "  xyz";
    print "$str\n";

    $str =~ /^(\s*)/g or die;
    print("match1: pos=", pos($str), "\n");
    print "$str\n";
    print "@-\n";   # start offsets of the last successful match
    print "@+\n";   # end offsets of the last successful match

    $str =~ /\G(\s*)/g or die;
    print("match2: pos=", pos($str), "\n");
    print "$str\n";
    print "@-\n";
    print "@+\n";

    $str =~ /\G(\s*)/g or warn;
    print("match3: pos=", pos($str), "\n");
    print "$str\n";
    print "@-\n";
    print "@+\n";

    Which outputs

      xyz
    match1: pos=2
      xyz
    0 0
    2 2
    match2: pos=2
      xyz
    2 2
    2 2
    Warning: something's wrong at /tmp/Perl-1.pl line 20.
    Use of uninitialized value in print at /tmp/Perl-1.pl line 21.
    match3: pos=
      xyz
    2 2
    2 2

    which I think might help describe what happens...(of course I could be wrong)

    When \G goes to the position of the previous match to find the next white space, it is already beyond the last position of white space in the string. The warning is sort of uninformative though... . I gotta wonder if the last position and first position being equal leave the \G wondering where to go next?

    Am looking forward to other responses to this by those who know...

    (yet another opportunity to display my ignorance! Hot Damn!)


    UPDATE: looks like the tool mentioned in Regex analysis might be helpful with this. Hope that is helpful....
    ...the majority is always wrong, and always the last to know about it...
    Insanity: Doing the same thing over and over again and expecting different results.
Re: [Perl 5.14, regex]: Problems with /g, \G and pos() ("bug")
by tye (Cardinal) on Oct 19, 2013 at 14:54 UTC
Re: [Perl 5.14, regex]: Problems with /g, \G and pos()
by moritz (Cardinal) on Oct 19, 2013 at 16:15 UTC

    The problem is that your regex potentially matches zero characters, so a loop like

    while (/\s*/g) { ... }

    would loop infinitely. To prevent that, perl has some extra magic associated with zero-width matches and /g, which is what you observe here. The obvious solution is to avoid a regex that can match zero characters. You can also prevent the resetting of pos() on a failed match by using the /gc modifiers.
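
A small sketch of both points, assuming a string with two leading spaces as in the original post; the variable names are only for illustration:

```perl
use strict;
use warnings;

my $str = "  xyz";   # assumption: two leading spaces

# Zero-width guard: an empty match at a given position is allowed once,
# but not twice in a row at the same position.
$str =~ /^(\s*)/g;                                # eats both spaces, pos = 2
my $first  = $str =~ /\G(\s*)/g ? 1 : 0;          # empty match at 2: succeeds
my $second = $str =~ /\G(\s*)/g ? 1 : 0;          # empty again at 2: fails
my $pos_after_fail = pos($str);                   # undef, the failure reset it

# A pattern that cannot match zero characters, combined with /gc:
pos($str) = 0;
$str =~ /\G\s+/gc;                                # pos = 2
my $miss     = $str =~ /\G\s+/gc ? 1 : 0;         # no whitespace at 2: fails
my $pos_kept = pos($str);                         # still 2 because of /c

print "first=$first second=$second pos=",
      (defined $pos_after_fail ? $pos_after_fail : 'undef'), "\n";
print "miss=$miss pos_kept=$pos_kept\n";
```

With \s+ and /gc a failed attempt simply means "nothing to consume here", and the next pattern in a lexer-style chain can pick up from the same pos().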

      Ah, well, that's reasonable. Thanks to you and to all others for help!
