PerlMonks  

Re^2: Strange regex to test for newlines: /.*\z/

by moritz (Cardinal)
on May 21, 2007 at 13:56 UTC ( [id://616562]=note )


in reply to Re: Strange regex to test for newlines: /.*\z/
in thread Strange regex to test for newlines: /.*\z/

Why should I need an /s for multiline strings?
$ perl -e 'print "match\n" if "foo\nbar" =~ m/bar/;'
match
So when an ordinary string after a \n is matched, why should an empty string, here represented by .*, fail to match?

After all, the regex is not anchored to the start of the string.

Replies are listed 'Best First'.
Re^3: Strange regex to test for newlines: /.*\z/
by shmem (Chancellor) on May 21, 2007 at 14:26 UTC
    Because in a //m, the end of string matching "f\n" is set before the '\n' if the '\n' is trailing. The '\n' is skipped in the match, but the position after "f" isn't the end of the string:
    perl -D512 -e '$_ = "f\n";/.*\z/'
    Compiling REx `.*\z'
    size 4 Got 36 bytes for offset annotations.
    first at 2
    rarest char at 0
       1: STAR(3)
       2:   REG_ANY(0)
       3: EOS(4)
       4: END(0)
    floating ""$ at 0..2147483647 (checking floating) anchored(MBOL) implicit minlen 0
    Offsets: [4]
        2[1] 1[1] 3[2] 5[0]
    Omitting $` $& $' support.
    EXECUTING...
    Guessing start of match, REx ".*\z" against "f "...
    Found floating substr ""$ at offset 1...
    Position at offset 0 does not contradict /^/m...
    Guessed: match at offset 0
    Matching REx ".*\z" against "f "
      Setting an EVAL scope, savestack=3
       0 <> <f >    |  1:  STAR
                           REG_ANY can match 1 times out of 2147483647...
      Setting an EVAL scope, savestack=3
       1 <f> < >    |  3:    EOS
                               failed...
                             failed...
    Guessing start of match, REx ".*\z" against " "...
    Found floating substr ""$ at offset 0...
    Position at offset 0 does not contradict /^/m...
    Guessed: match at offset 0
      Setting an EVAL scope, savestack=3
       1 <f> < >    |  1:  STAR
                           REG_ANY can match 0 times out of 2147483647...
      Setting an EVAL scope, savestack=3
       1 <f> < >    |  3:    EOS
                               failed...
                             failed...
    Match failed
    Freeing REx: `".*\\z"'

    The matching isn't extended after the "\n". Whereas here

    perl -D512 -e '$_ = "f\n";/.*\z/s'
    Compiling REx `.*\z'
    size 4 Got 36 bytes for offset annotations.
    first at 2
    rarest char at 0
       1: STAR(3)
       2:   SANY(0)
       3: EOS(4)
       4: END(0)
    floating ""$ at 0..2147483647 (checking floating) anchored(SBOL) implicit minlen 0
    Offsets: [4]
        2[1] 1[1] 3[2] 5[0]
    Omitting $` $& $' support.
    EXECUTING...
    Guessing start of match, REx ".*\z" against "f "...
    Found floating substr ""$ at offset 1...
    Guessed: match at offset 0
    Matching REx ".*\z" against "f "
      Setting an EVAL scope, savestack=6
       0 <> <f >    |  1:  STAR
                           SANY can match 2 times out of 2147483647...
      Setting an EVAL scope, savestack=6
       2 <f > <>    |  3:    EOS
       2 <f > <>    |  4:    END
    Match successful!
    Freeing REx: `".*\\z"'

    you can see that the '\z' (<> in the debug output) is found after the "\n":

      Setting an EVAL scope, savestack=6
       2 <f > <>    |  3:    EOS
       2 <f > <>    |  4:    END

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
      Sorry, I still don't get it.

      Obviously /\z/ matches the string "f\n", so why should it fail to match if I prepend it with something that matches the empty string? This should be independent of where the end of the string is considered to be.

      And why does /.?\z/ match and /.*\z/ not?

      If we expand that scheme, why does /.?.?\z/ match, and /.*.?\z/ not?

      In all cases I'd expect .? and .* to be reduced to the empty string - why doesn't it happen?

        Now I'm confused as well, my mental model doesn't seem to fit (at least not everywhere :-)

        Maybe demerphq could tell?

        --shmem

      I personally would be interested in why the following happens:
      "\n" =~ /\n.*\z/;   # matches
      "\n" =~ /.*\z/;     # doesn't match. i would expect it to match
      "\n" =~ /[^\n]*\z/; # matches. like expected. but [^\n]* is like .*
      Whether /s is used or not has nothing to do with this, or at least it shouldn't, I think.
        "\n" =~ /\n.*\z/; # matches

        Obvious, I think. You match a "\n", then EOS (end of string).

        "\n" =~ /.*\z/; # doesn't match. i would expect it to match

        perl -D512 reports anchored(MBOL) (i.e. multiline beginning of line, see perldebguts) for that one; this anchoring doesn't happen with

        "\n" =~ /[^\n]*\z/; # matches. like expected. but [^\n]* is like .*

        but why?

        "\n" =~ /.?\z/;

        matches, as does

        "\n" =~ /.{0,}\z/;

        I can't get a mental model of why the previous one should, but the next one should not match:

        "f\n" =~ /.?f\z/;

        Weird. Rather inconsistent, if not buggy.

        --shmem

