PerlMonks  

Matching with /g: Is this a bug?

by morgon (Chaplain)
on May 27, 2013 at 14:26 UTC ( #1035415=perlquestion )
morgon has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

can someone explain to me the following behaviour or confirm that it is a bug:

    use strict;

    my $s = <<end;
    hubba1
    bubba
    hubba2
    end

    print $s =~ /(hubba\d)/gm;

The above code prints "hubba1hubba2" as expected.

But when I do this

    use strict;

    my $s = <<end;
    hubba1
    bubba
    hubba2
    end

    $s =~ /bubba/gm;
    print $s =~ /(hubba\d)/gm;

it only prints "hubba2", because (I assume) the second match only starts where the first match matched (ie pos is not reset).

When I do this:

    use strict;

    my $s = <<end;
    hubba1
    bubba
    hubba2
    end

    $s =~ /bubba/gm;
    $s =~ /no match/gm;
    print $s =~ /(hubba\d)/gm;

I get the correct behaviour again - I assume that the unsuccessful match attempt resets pos here.

I am using 5.14.2 and my understanding is that the behaviour in the second case is a bug or is there something I don't understand?

Many thanks!

Update: Corrected the code examples.

Re: Matching with /g: Is this a bug?
by choroba (Abbot) on May 27, 2013 at 14:47 UTC
    Your code does not compile. You are missing closing /'s at your substitutions. You probably originally meant:
        #!/usr/bin/perl
        use strict;
        use warnings;

        my $s = << '__STRING__';
        hubba1
        bubba
        hubba2
        __STRING__

        $s =~ /bubba/gm;
        $s =~ /no match/gm;
        print $s =~ /(hubba\d)/gm;

    This is the documented behaviour. Add the /c flag to the second match to keep the position even after the failed match. See perlre for details.
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Matching with /g: Is this a bug?
by Athanasius (Prior) on May 27, 2013 at 15:03 UTC

    As choroba says, this is the documented behaviour. See also “Global Matching” in perlretut#Using-regular-expressions-in-Perl.

    Perhaps it’s the context that’s confusing you? In scalar context, pos is reset only when the match fails or the string is changed. But print supplies list context to its arguments, so all remaining matches are returned and pos is always reset.

    Update: Added the word “remaining” as a correction in response to choroba’s post, below.
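
A minimal sketch of the scalar/list distinction described above, assuming the same three-line string as in the question (written here with explicit "\n" instead of a heredoc). The variable names are made up for illustration only.

```perl
use strict;
use warnings;

my $s = "hubba1\nbubba\nhubba2\n";

# Scalar context: each m//g call finds ONE match and advances pos($s).
my @seen;
while ($s =~ /(hubba\d)/g) {
    push @seen, "$1 ends at " . pos($s);
}
# The loop ends on a failed match, which resets pos($s) to undef.
my $pos_after_loop = pos($s);

# Advance pos past "bubba" in void context, then match in list context.
$s =~ /bubba/g;
my $pos_after_bubba = pos($s);
my @remaining      = $s =~ /(hubba\d)/g;   # only matches after pos($s)
my $pos_after_list = pos($s);              # reset once the list is built

print "$_\n" for @seen;
print "pos after bubba: $pos_after_bubba\n";
print "remaining: @remaining\n";
print "pos after list match: ",
    (defined $pos_after_list ? $pos_after_list : 'undef'), "\n";
```

The list-context match returns only the remaining matches because it starts from wherever the earlier m//g left pos.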

    Hope that helps,

    Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,

      "all matches are returned and pos is always reset"
      How is it possible, then, that adding /c to the second match only outputs the second occurrence?
      لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
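
To make the /c case concrete, here is a small sketch (same assumed string as in the question, illustrative variable names) that runs the failed match once with /c and once without it:

```perl
use strict;
use warnings;

my $s = "hubba1\nbubba\nhubba2\n";

# With /c: the failed match leaves pos($s) where the bubba match put it.
$s =~ /bubba/g;
$s =~ /no match/gc;
my $pos_kept = pos($s);
my @with_c   = $s =~ /(hubba\d)/g;   # starts at the kept position

# Without /c: the failed match resets pos($s), so matching restarts at 0.
$s =~ /bubba/g;
$s =~ /no match/g;
my $pos_reset = pos($s);
my @without_c = $s =~ /(hubba\d)/g;

print "with /c:    pos=$pos_kept, matches: @with_c\n";
print "without /c: pos=",
    (defined $pos_reset ? $pos_reset : 'undef'),
    ", matches: @without_c\n";
```

So the list-context match does not rewind on its own; it picks up at the current pos, and only the earlier failed match without /c had moved that back to the start.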
Re: Matching with /g: Is this a bug?
by ww (Bishop) on May 27, 2013 at 15:07 UTC
    If you'll insert the following line in example 2, after Ln 9 (that is, right after the match on bubba), you may find the variance in output illuminating:
    print "After match on bubba,\n\t \$s: $s";

    If you didn't program your executable by toggling in binary, it wasn't really programming!

Re: Matching with /g: Is this a bug?
by CountZero (Chancellor) on May 27, 2013 at 15:10 UTC
    It is as expected and documented:

    In scalar context, each execution of m//g finds the next match, returning true if it matches, and false if there is no further match. The position after the last match can be read or set using the pos() function; see pos. A failed match normally resets the search position to the beginning of the string, but you can avoid that by adding the /c modifier (for example, m//gc). Modifying the target string also resets the search position.

    The position at which matching continues is tied to the variable being matched, not to the regex!
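
A short sketch of that point, assuming two separate copies of the string from the question ($x and $y are hypothetical names). A match with one regex moves pos for $x, and a later match with a different regex on $x picks up from there, while $y is unaffected:

```perl
use strict;
use warnings;

my $x = "hubba1\nbubba\nhubba2\n";
my $y = "hubba1\nbubba\nhubba2\n";

# Only $x is matched with /g, so only pos($x) is set.
$x =~ /bubba/g;
my $pos_x = pos($x);
my $pos_y = pos($y);

# A different regex on the same variable continues from pos($x).
my @from_x = $x =~ /(hubba\d)/g;
# The same regex on the untouched variable starts from the beginning.
my @from_y = $y =~ /(hubba\d)/g;

print "pos(\$x)=$pos_x, pos(\$y)=",
    (defined $pos_y ? $pos_y : 'undef'), "\n";
print "from \$x: @from_x\n";
print "from \$y: @from_y\n";
```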

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
      Ok, thanks for pointing that out.

      btw: What is the syntax for resetting the search position via pos (I've tried pos(undef) but that does not work)?

        pos($string) = 0;
        See pos.
        لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
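
For illustration, a sketch of resetting the position by assigning to pos on the variable itself, again assuming the string from the question. Assigning 0 and assigning undef both make the next m//g start from the beginning of the string:

```perl
use strict;
use warnings;

my $s = "hubba1\nbubba\nhubba2\n";

# Reset by assigning 0 to pos($s).
$s =~ /bubba/g;                       # pos($s) is now past "bubba"
pos($s) = 0;
my @after_zero = $s =~ /(hubba\d)/g;

# Reset by assigning undef to pos($s).
$s =~ /bubba/g;
pos($s) = undef;
my @after_undef = $s =~ /(hubba\d)/g;

print "after pos = 0:     @after_zero\n";
print "after pos = undef: @after_undef\n";
```

Note that pos is called on the string variable and used as the target of the assignment; the value goes on the right-hand side.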
Re: Matching with /g: Is this a bug?
by kcott (Abbot) on May 27, 2013 at 15:30 UTC

    G'day morgon,

    "it only prints "hubba2", because (I assume) the second match only starts where the first match matched (ie pos is not reset)."

    Yes, your assumption is correct:

        $ perl -Mstrict -Mwarnings -E '
            my $s = <<end;
        hubba1
        bubba
        hubba2
        end
            $s =~ /bubba/gm;
            say pos($s);
            print $s =~ /(hubba\d)/gm;
        '
        12
        hubba2

    [length("hubba1\nbubba") == 12]

    and agrees with the pos documentation:

    "Returns the offset of where the last m//g search left off ..."

    In your last statement, you appear to be contradicting your previous, correct assumption:

    "... my understanding is that the behaviour in the second case is a bug or is there something I don't understand?"

    The code you've posted in the third case does not give "the correct behaviour"; it gives a syntax error. You probably meant /.../ instead of s/.../ (2 instances). Assuming you did, that also agrees with the pos documentation:

    "... search position is reset (usually due to match failure ..."

    -- Ken

Re: Matching with /g: Is this a bug?
by ikegami (Pope) on May 29, 2013 at 20:33 UTC
    No one seems to have mentioned that the bug is your inappropriate use of g in scalar/void context. Remove the g from the first match, and it'll work fine.
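
A minimal sketch of that fix, assuming the string from the question. Without /g the first match only tests for "bubba" and does not set pos, so the later list-context match sees the whole string:

```perl
use strict;
use warnings;

my $s = "hubba1\nbubba\nhubba2\n";

# No /g here: a plain existence test that leaves pos($s) unset.
my $found = $s =~ /bubba/;
my $pos   = pos($s);

# /g in list context is where it belongs: collect every match.
my @all = $s =~ /(hubba\d)/g;

print "found bubba: ", ($found ? "yes" : "no"), "\n";
print "pos after plain match: ", (defined $pos ? $pos : 'undef'), "\n";
print "matches: @all\n";
```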
