PerlMonks  

m//g behaves strange...

by Anonymous Monk
on Nov 09, 2003 at 21:15 UTC ( #305715=perlquestion )
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

During my first steps in perlgolf I've encountered a strange m//g behaviour (perl 5.8.1).

When I run the following code:

    #!/usr/bin/perl -l
    $_="123"; @a=/./g; print "#1: ", /./g;
    $_="123"; $a=/./g; print "#2: ", /./g;
    $_="123"; /./g; print "#3: ", /./g;
    $_="123"; /./g; print "#4: ", /\G./g;
    $_="123"; /./g; undef pos; print "#5: ", /./g;
    $_="123"; /./g; undef pos; print "#6: ", /\G./g;
    $_="123"; /./g; pos = 2; print "#7: ", /./g;
    $_="123"; /./g; pos = 2; print "#8: ", /\G./g;

I get this output:

    #1: 123
    #2: 23
    #3: 23
    #4: 23
    #5: 123
    #6: 123
    #7: 3
    #8: 3

This test shows that:

  1. In scalar context m//g doesn't return size of the list it would give back when called in list context,
  2. The \G anchor is useless - perlre states, that it should be used only at the beginning of pattern,
  3. and most important - m//g in scalar context doesn't reset pos (matches only once).

I ask you if it's proper behaviour / undocumented feature / bug? Or maybe I have missed something?

PS. My username is kokr but somehow the email with my passwd can't find its way to my mailbox :>

Re: m//g behaves strange...
by Anonymous Monk on Nov 09, 2003 at 21:28 UTC

    Your test output is entirely expected behavior according to the documentation. Perhaps it would be better if you indicated what you expected the output to be.

    1. m//g returns true or false in scalar context.
    2. \G is not useless.
    3. m//g won't reset pos() until the match fails.
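
Points 1 and 3 can be checked with a minimal sketch. The helper name trace_scalar_matches is hypothetical and not part of the thread. It calls m//g in scalar context one more time than the string has characters, and records the return value and pos() after each attempt.

```perl
use strict;
use warnings;

# Hypothetical helper: attempt a scalar-context m//g match length+1 times
# and record [matched?, pos] after every attempt.
sub trace_scalar_matches {
    my ($str) = @_;
    my @trace;
    for my $attempt (1 .. length($str) + 1) {
        my $ok  = ($str =~ /./g) ? 1 : 0;   # boolean context: true or false only
        my $pos = pos($str);                # undef again once a match has failed
        push @trace, [ $ok, defined $pos ? $pos : 'undef' ];
    }
    return @trace;
}

print join(' ', map { "[$_->[0],$_->[1]]" } trace_scalar_matches("123")), "\n";
```

For "123" this prints [1,1] [1,2] [1,3] [0,undef]: each successful match advances pos by one character, and only the failed fourth attempt resets it.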
Re: m//g behaves strange...
by antirice (Priest) on Nov 09, 2003 at 21:41 UTC

    I think you've missed something.

    1. This is somewhat more of a feature. If you want to extract all matches from a regex, you can do @a=/regex/g and get all the strings that match. If you want the count, you could do $a=()=/regex/g;.
    2. From perlre:
      Perl defines the following zero-width assertions:
      \G - Match only at pos() (e.g. at the end-of-match position of prior m//g)
      In other words, if the regex starts with \G the match has to start at pos or the regex doesn't match.
    3. m//g shouldn't reset in scalar context until it fails. Otherwise you couldn't do loops such as while ($string =~ /regex/g) { ...
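
The three points above can be put side by side in a short sketch. The input string and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $text = "a1b22c333";   # hypothetical input

# List context: every match is returned at once.
my @all = $text =~ /\d+/g;

# Count idiom: a list assignment evaluated in scalar context
# yields the number of elements on its right-hand side.
my $count = () = $text =~ /\d+/g;

# Scalar context: one match per iteration, pos() remembers where to resume.
my @seen;
while ($text =~ /(\d+)/g) {
    push @seen, $1 . '@' . pos($text);
}

print "all:   @all\n";     # all:   1 22 333
print "count: $count\n";   # count: 3
print "seen:  @seen\n";    # seen:  1@2 22@5 333@9
```

The while loop only terminates because the failing match at the end of the string resets pos; that is the behaviour point 3 describes.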

    Perl Idioms Explained - @ary = $str =~ m/(stuff)/g by tachyon should help with regexes in list context.

    Hope this helps.

    antirice    
    The first rule of Perl club is - use Perl
    The ith rule of Perl club is - follow rule i - 1 for i > 1

Re: m//g behaves strange...
by converter (Priest) on Nov 09, 2003 at 21:54 UTC

    In scalar context m//g doesn't return size of the list it would give back when called in list context,

    If you want the number of elements in the list returned by a pattern with the /g modifier when evaluated in list context, you can use a list assignment in scalar context, which produces the number of its elements. In this case, we assign to an empty list:

        $_ = "456";
        $count = () = /./g;
        print $count; # prints 3

Re: m//g behaves strange...
by pg (Canon) on Nov 10, 2003 at 02:59 UTC

    I recently answered a post, Re: An Insane Typo Bug, that relates to your question in an interesting way. The original post in that thread looks quite different from your question, but both come down to the same fact: in scalar context, m// returns either 1 or 0.

    For pos(), try this. It gives you 1 and 9, so pos() does work:

        use strict;
        use warnings;

        $_ = "0123456789";
        my $ret = m/0/g;
        print "ret = $ret, pos = " . pos() . "\n";
        $ret = m/8/g;
        print "ret = $ret, pos = " . pos() . "\n";

      <pedantic> actually, m// in scalar context returns either 1 or "" (empty string). </pedantic>.

        You are very close to 100% right. However I do observe something else, and I don't let things escape easily.

        If I do this:

            use strict;
            use warnings;

            {
                $_ = "1234";
                my $ret = /2/g;
                print "($ret)\n"
            }
            {
                $_ = "1234";
                my $ret = /9/g;
                print "($ret)\n"
            }

        The outputs are 1 and "empty string", which indicates that you are right.

        However, try this:

            use strict;
            use warnings;

            {
                my $a = 0;
                print "(" . ~$a . ")\n";
            }
            {
                my $a = 1;
                print "(" . ~$a . ")\n";
            }
            {
                my $a = "";
                print "(" . ~$a . ")\n";
            }

        It returns:

            (4294967295)
            (4294967294)
            ()

        Remember the return values for zero and empty string, and then try this:

            use strict;
            use warnings;

            {
                $_ = "1234";
                my $ret;
                print ~ m/2/, "\n";
            }
            {
                $_ = "1234";
                my $ret;
                print ~ m/9/, "\n";
            }

        It gives you:

            4294967294
            4294967295

        This indicates that the "~" operator does receive 0, not "empty string". Remember that in the case where we explicitly pass "~" an empty string, it is not converted to 0.

        However, if we do this:

            use strict;
            use warnings;

            {
                $_ = "1234";
                my $ret;
                print ~ ($ret = m/2/), "\n";
                print "($ret)\n";
            }
            {
                $_ = "1234";
                my $ret;
                print ~ ($ret = m/9/), "\n";
                print "($ret)\n";
            }

        You get:

            4294967294
            (1)
            4294967295
            ()

        It seems that although $ret receives "empty string", the "~" operator receives 0. Again, remember that we didn't see this kind of auto-conversion in the explicitly-passing-empty-string case.
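
One possible reading of these observations, offered as a sketch rather than a statement about perl internals: the false value returned by a failed match behaves as both the empty string and the number 0, and "~" treats an operand that carries a numeric value as a number. The variable names below are illustrative, the exact value of ~0 depends on the integer width of the perl build, and the sketch assumes the "bitwise" feature is not enabled.

```perl
use strict;
use warnings;

my $false = scalar("1234" =~ /9/);   # the special false value from a failed match
my $empty = "";                      # an ordinary empty string

my $as_string     = "($false)";      # "()" : it stringifies as the empty string
my $as_number     = $false + 0;      # 0, with no "isn't numeric" warning
my $negated_false = ~$false;         # numeric complement, the same value as ~0
my $negated_empty = ~$empty;         # string complement of "" is still ""

print "as string:  $as_string\n";
print "as number:  $as_number\n";
print "~false:     $negated_false\n";
print "~'':        ($negated_empty)\n";
print "~false == ~0? ", ($negated_false == ~0 ? "yes" : "no"), "\n";
```

Under that reading, $ret prints as "()" because printing uses the string side of the value, while "~" picks up the numeric side.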

Re: m//g behaves strange...
by Dominus (Parson) on Nov 10, 2003 at 20:01 UTC
    Says kokr:
    In scalar context m//g doesn't return size of the list it would give back when called in list context,
    It's not supposed to do that. m//g in scalar context has a very interesting result:
        my $s = "123 45 6 789";
        while ($s =~ m/\d+/g) {
          print "> $&\n";
        }
    This prints:
        > 123
        > 45
        > 6
        > 789
    2.The \G anchor is useless - perlre states, that it should be used only at the beginning of pattern,
    It's not useless. If you change the pattern in the example above to /\G\d+/g you get a different result. But here's a typical example of how one might use \G:
        my $s = "123 carrots 45 6 bananas 789";
        while (1) {
          $s =~ /\G(\d+)/gc    and print "NUMBER $1\n" and next;
          $s =~ /\G\s+/gc      and print "SPACE\n"     and next;
          $s =~ /\G([a-z]+)/gc and print "WORD $1\n"   and next;
          $s =~ /\G$/gc        and last;
        }
    This prints:
        NUMBER 123
        SPACE
        WORD carrots
        SPACE
        NUMBER 45
        SPACE
        NUMBER 6
        SPACE
        WORD bananas
        SPACE
        NUMBER 789
    What happens if you remove the 'useless' \G's? You get a very different result:
        NUMBER 123
        NUMBER 45
        NUMBER 6
        NUMBER 789
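
The earlier remark that changing the first loop's pattern to /\G\d+/g gives a different result can be tried with a minimal sketch. The array names are illustrative.

```perl
use strict;
use warnings;

my $s = "123 45 6 789";
my (@plain, @anchored);

# Without \G each attempt may skip ahead from pos to the next run of digits.
while ($s =~ /\d+/g) {
    push @plain, $&;
}

# With \G each attempt must start exactly at pos. After "123" pos is 3,
# where a space sits, so the second attempt fails and the loop ends.
while ($s =~ /\G\d+/g) {
    push @anchored, $&;
}

print "plain:    @plain\n";      # plain:    123 45 6 789
print "anchored: @anchored\n";   # anchored: 123
```

The second loop starts from the beginning of the string because the failed match that ended the first loop reset pos.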
    3.and most important - m//g in scalar context doesn't reset pos (matches only once).
    Here you have some confusion, but I can't tell what it is amongst the other confusions in your articles. Did you realize that the /./g in print "#7: ",   /./g; was in list context, not scalar context? Did you realize that assigning $_ = "123" will reset pos($_)? Did you realize that m//g isn't supposed to 'reset' pos unless the match fails? In fact, the whole point of /g is that it does not reset pos. Ordinary matches, without /g, reset pos before matching begins.
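
The first two questions can be checked with a minimal sketch. The variable names are illustrative, and the list assignment stands in for the list context that print imposes on its arguments.

```perl
use strict;
use warnings;

$_ = "123";
/./g;                                # one match only: consumes "1"
my $pos_after_match = pos($_);       # 1

my @rest = /./g;                     # list context: continues from pos
my $pos_after_list = pos($_);        # undef, list-context //g runs until it fails

$_ = "123";
/./g;                                # pos is 1 again
$_ = "123";                          # assigning to the variable resets pos
my $pos_after_assign = pos($_);      # undef

print "after one match:  $pos_after_match\n";
print "rest in list ctx: @rest\n";
print "after list match: ", defined $pos_after_list   ? $pos_after_list   : 'undef', "\n";
print "after assignment: ", defined $pos_after_assign ? $pos_after_assign : 'undef', "\n";
```

This matches test #3 in the question: the bare /./g consumes "1", and the /./g inside print then returns "2" and "3".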

    Consider this:

        my $s = "123 carrots 45 6 bananas 789";
        while ($s =~ /(\d+)/g) {
          print "'$1' at position ", pos($s)-length($1), "\n";
        }
    The output is:
        '123' at position 0
        '45' at position 15
        '6' at position 18
        '789' at position 29
    So clearly pos is doing something. Now let's reset pos:
        my $s = "123 carrots 45 6 bananas 789";
        while ($s =~ /(\d+)/g) {
          print "'$1' at position ", pos($s)-length($1), "\n";
          pos($s) += 13;
        }
    Now the output is different:
        '123' at position 0
        '5' at position 16
        '89' at position 30
    The first match is as before. But the pos($s) += 13 forces the current match position forward, into the middle of the 45, so that the next match sees only the 5 part. After matching the 5, the next pos($s) += 13 jumps past the 6 entirely, into the middle of the 789.
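
As a side note, the start offset computed as pos minus length can be cross-checked against the built-in @- array, whose first element holds the start of the last successful match. This is a minimal sketch; the input string is hypothetical and chosen only to keep the offsets easy to count.

```perl
use strict;
use warnings;

# Hypothetical input, chosen only for this illustration.
my $s = "ab 12 cd 345";

my @report;
while ($s =~ /(\d+)/g) {
    my $from_pos   = pos($s) - length($1);   # end of match minus its length
    my $from_array = $-[0];                  # start offset of the whole match
    push @report, "'$1' at $from_pos/$from_array";
}

print "$_\n" for @report;
# '12' at 3/3
# '345' at 9/9
```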

    I ask you if it's proper behaviour / undocumented feature / bug? Or maybe I have missed something?
    A combination of all of these, I think. #1 is proper behavior. #2 seems to be a case of your having missed something. #3 is an undocumented feature, but it's undocumented because it doesn't exist. But also the behavior of \G and /g is very badly documented in general.

    I hope this helps, but I'm not sure what your objection is, so I can't address it directly.

    --
    Mark Dominus
    Perl Paraphernalia
