
Regex Pop Quiz with .*, /g, and /s

by saintmike (Vicar)
on Oct 02, 2007 at 17:33 UTC ( [id://642180]=perlquestion )

saintmike has asked for the wisdom of the Perl Monks concerning the following question:

What's the result of the following code snippet?
my $string = "12\n34";
$string =~ s/.*/go/gs;
Pick one:
  • $string contains "go", since the /s modifier lets .* match the entire string
  • Matching .* repeatedly results in an endless loop, since it matches the empty string
  • None of the above.

Replies are listed 'Best First'.
Re: Regex Pop Quiz with .*, /g, and /s (bug)
by tye (Sage) on Oct 02, 2007 at 18:05 UTC

    Just a couple of days ago Larry himself admitted (in the CB) that this is a bug in Perl.

    - tye        

Re: Regex Pop Quiz with .*, /g, and /s
by Thelonius (Priest) on Oct 02, 2007 at 18:13 UTC
    I'd have to say that it was not what I expected. I'm also surprised that this hasn't bitten me before. While I know theoretically that * can match empty strings, Perl has always seemed to me to do the intuitive thing.

    I guess I generally don't replace nothing with something. Usually when I use * I'm either just skipping over white space (or the moral equivalent), or I replace any pattern that has * in it with itself. That is, something like s/A(.*)B/C${1}D/g;

    Generally, I also try to constrain my patterns more, so I usually avoid constructs like ".*" or "*?", and would write something like s/A([^B]*)B/C${1}D/g;
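
To make the difference between the two substitutions above concrete, here is a minimal sketch. The input string is a made-up sample, not something from the thread; it simply contains two A...B spans so that the greedy .* has something to overshoot.

```perl
use strict;
use warnings;

# Hypothetical sample input: two A...B spans on one line.
my $input = "AxB AyB";

# Greedy .* runs to the LAST B, so both spans are treated as one match.
(my $greedy = $input) =~ s/A(.*)B/C${1}D/g;

# [^B]* cannot cross a B, so each span is matched on its own.
(my $constrained = $input) =~ s/A([^B]*)B/C${1}D/g;

print "$greedy\n";       # CxB AyD
print "$constrained\n";  # CxD CyD
```

Neither pattern can match the empty string (both require a literal A and B), which is probably why this style rarely runs into the surprise from the quiz.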

    The /s doesn't seem to have anything to do with it, except that, of course, you included a \n in your string. For example,

    my $string = "aaab";
    $string =~ s/a*/go/g;
    Now $string is "gogobgo".
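
A quick way to check the claim that /s is not the cause is to run the quiz substitution with and without it, and once more on a string with no newline at all. This is only a sketch; the variable names are mine.

```perl
use strict;
use warnings;

# The quiz as posted: /s lets . match the newline.
(my $with_s = "12\n34") =~ s/.*/go/gs;

# Same string without /s: . stops at the newline.
(my $without_s = "12\n34") =~ s/.*/go/g;

# No newline in the string, no /s.
(my $no_newline = "1234") =~ s/.*/go/g;

print "with /s:    [$with_s]\n";      # [gogo]
print "without /s: [$without_s]\n";   # [gogo\ngogo], with a real newline
print "no newline: [$no_newline]\n";  # [gogo]
```

In every case each run of characters is followed by one extra "go" for the empty match at its end, so /s only changes where the runs are, not whether the doubling happens.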
Re: Regex Pop Quiz with .*, /g, and /s
by kyle (Abbot) on Oct 02, 2007 at 17:54 UTC

    I'm not really into the quiz thing (except maybe in the polling section), so I just ran it.

    It makes my brain hurt. Basically, when you replace something that can have zero width, you're headed for a land of confusion (stop by and say hi; I hang out there a lot).

      "Since you haven't used the /m modifier, Perl won't treat it as a 'single line'."

      No. That's backwards. Single-line is the default; /m switches to (m)ultiline mode.

      Furthermore, /m only affects the definition of ^ and $. Since neither is used here, whether /m is present or absent is irrelevant.
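
The division of labour between /m and /s can be sketched on the quiz string itself: /m changes where ^ and $ may match, /s changes whether . may match a newline. The variable names below are made up for illustration.

```perl
use strict;
use warnings;

my $text = "12\n34";

# Without /m, ^ matches only at the very start and $ only at the very end
# (or before a final newline), so no line-by-line matches are found.
my @no_m = $text =~ /^(\d+)$/g;

# With /m, ^ and $ also match around the embedded newline.
my @with_m = $text =~ /^(\d+)$/mg;

# /s is the modifier that affects the dot.
my ($dot)   = $text =~ /(.*)/;    # stops before the newline
my ($dot_s) = $text =~ /(.*)/s;   # crosses the newline

print scalar(@no_m), "\n";        # 0
print "@with_m\n";                # 12 34
print length($dot), "\n";         # 2
print length($dot_s), "\n";       # 5
```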

      "The replacement first takes off the first three characters (which includes the newline), and then it goes for another pass to get the last two characters."

      No. In fact, the \n is a red herring. The same problem occurs with my $string = "1234";.

      The first pass replaces the characters at positions 0 to 3 with "go", setting pos = 4.
      The second pass replaces the empty string at position 4 with "go", leaving pos = 4.
      The third pass ends the /g loop, since the only possible match would start and end at the same position as the second pass.

      my $string = "1234";
      $string =~ s/.*/print("($&)");'go'/egs;   # prints (1234)()
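
The passes described above can also be observed by recording where each match starts and ends. The following sketch uses Perl's built-in @- and @+ arrays inside the /e replacement; the @spans array is just a name I picked.

```perl
use strict;
use warnings;

my $string = "1234";
my @spans;   # one [start, end] pair per successful match

# $-[0] is the start offset of the current match, $+[0] the offset just past its end.
$string =~ s/.*/push @spans, [ $-[0], $+[0] ]; 'go'/egs;

printf "match from %d to %d\n", @$_ for @spans;
# match from 0 to 4
# match from 4 to 4
print "$string\n";   # gogo
```

The second entry starts and ends at offset 4, which is the zero-width match at the end of the string.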

      "What I find most interesting, however, is that the /g seems to have something to do with it too. That is, /ms does one replacement"

      With and without /g, both regexes perform the same first substitution. With /g, Perl goes on to perform any further possible substitutions. That's the very definition of /g.
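
A side-by-side sketch of the same substitution with and without /g. The third line uses .+ instead of .*, which is one possible way to rule out the empty match; that variant is my own assumption and was not proposed in the thread.

```perl
use strict;
use warnings;

# Without /g: only the first match is replaced.
(my $once = "1234") =~ s/.*/go/s;

# With /g: the empty match at the end of the string is replaced as well.
(my $global = "1234") =~ s/.*/go/gs;

# .+ needs at least one character, so there is no empty match to replace.
(my $nonempty = "1234") =~ s/.+/go/gs;

print "$once\n";      # go
print "$global\n";    # gogo
print "$nonempty\n";  # go
```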

Approved by Thelonius