PerlMonks  

Re: Strange regex to test for newlines: /.*\z/

by shmem (Chancellor)
on May 21, 2007 at 13:37 UTC ( [id://616558]=note )


in reply to Strange regex to test for newlines: /.*\z/

If you have a newline in the string, it's multiline, so you need the 's' modifier:
perl -le '$_ = "foo\n";print "string with trailing newline" if !/.*\z/ and /.*\z/s'
string with trailing newline
perl -le '$_ = "foo\nbar";print "string with trailing newline" if !/.*\z/ and /.*\z/s'

Otherwise the matching stops at the newline, but that isn't the end of the string. It is a single line if you match the end with '$', but after the \n you are on the next line, and the end of the string happens to be there. How can I put it? It seems logical to me, but I'm still struggling with the wording. I'll update this post until I've got it, sorry for that.

update - seems like Ojosh!ro found the right words. Ojosh!ro++, thanks :-)

--shmem

_($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                              /\_¯/(q    /
----------------------------  \__(m.====·.(_("always off the crowd"))."·
");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}

Replies are listed 'Best First'.
Re^2: Strange regex to test for newlines: /.*\z/
by moritz (Cardinal) on May 21, 2007 at 13:56 UTC
    Why should I need an /s for multiline strings?
    $ perl -e 'print "match\n" if "foo\nbar" =~ m/bar/;'
    match
    So when an ordinary string after a \n is matched, why should an empty string, here represented by .*, fail to match?

    After all, the regex is not anchored to the start of the string.

      Because in a //m, the end of string matching "f\n" is set before the '\n' if the '\n' is trailing. The '\n' is skipped in the match, but the position after "f" isn't the end of the string:
      perl -D512 -e '$_ = "f\n";/.*\z/'
      Compiling REx `.*\z'
      size 4 Got 36 bytes for offset annotations.
      first at 2
      rarest char at 0
         1: STAR(3)
         2:   REG_ANY(0)
         3: EOS(4)
         4: END(0)
      floating ""$ at 0..2147483647 (checking floating) anchored(MBOL) implicit minlen 0
      Offsets: [4]
              2[1] 1[1] 3[2] 5[0]
      Omitting $` $& $' support.
      EXECUTING...
      Guessing start of match, REx ".*\z" against "f "...
      Found floating substr ""$ at offset 1...
      Position at offset 0 does not contradict /^/m...
      Guessed: match at offset 0
      Matching REx ".*\z" against "f "
        Setting an EVAL scope, savestack=3
         0 <> <f > | 1: STAR
                        REG_ANY can match 1 times out of 2147483647...
        Setting an EVAL scope, savestack=3
         1 <f> < > | 3: EOS
                        failed...
                        failed...
      Guessing start of match, REx ".*\z" against " "...
      Found floating substr ""$ at offset 0...
      Position at offset 0 does not contradict /^/m...
      Guessed: match at offset 0
        Setting an EVAL scope, savestack=3
         1 <f> < > | 1: STAR
                        REG_ANY can match 0 times out of 2147483647...
        Setting an EVAL scope, savestack=3
         1 <f> < > | 3: EOS
                        failed...
                        failed...
      Match failed
      Freeing REx: `".*\\z"'

      The matching isn't extended after the "\n". Whereas here

      perl -D512 -e '$_ = "f\n";/.*\z/s'
      Compiling REx `.*\z'
      size 4 Got 36 bytes for offset annotations.
      first at 2
      rarest char at 0
         1: STAR(3)
         2:   SANY(0)
         3: EOS(4)
         4: END(0)
      floating ""$ at 0..2147483647 (checking floating) anchored(SBOL) implicit minlen 0
      Offsets: [4]
              2[1] 1[1] 3[2] 5[0]
      Omitting $` $& $' support.
      EXECUTING...
      Guessing start of match, REx ".*\z" against "f "...
      Found floating substr ""$ at offset 1...
      Guessed: match at offset 0
      Matching REx ".*\z" against "f "
        Setting an EVAL scope, savestack=6
         0 <> <f > | 1: STAR
                        SANY can match 2 times out of 2147483647...
        Setting an EVAL scope, savestack=6
         2 <f > <> | 3: EOS
         2 <f > <> | 4: END
      Match successful!
      Freeing REx: `".*\\z"'

      you can see that the '\z' (<> in the debug output) is found after the "\n":

        Setting an EVAL scope, savestack=6
         2 <f > <> | 3: EOS
         2 <f > <> | 4: END

      --shmem

        Sorry, I still don't get it.

        Obviously /\z/ matches the string "f\n", so why should it fail to match if I prepend it with something that matches the empty string? This should be independent of where the end of the string is considered to be.

        And why does /.?\z/ match and /.*\z/ not?

        If we expand that scheme, why does /.?.?\z/ match, and /.*.?\z/ not?

        In all cases I'd expect .? and .* to be reduced to the empty string - why doesn't it happen?

        i personally would be interested in why the following happens:
        "\n" =~ /\n.*\z/;    # matches
        "\n" =~ /.*\z/;      # doesn't match. i would expect it to match
        "\n" =~ /[^\n]*\z/;  # matches. like expected. but [^\n]* is like .*
        /s or not /s doesn't have anything to do with this, or at least it shouldn't, i think.
