PerlMonks  

Re: Strange regex to test for newlines: /.*\z/

by Anonymous Monk
on May 21, 2007 at 12:45 UTC (#616545)


in reply to Strange regex to test for newlines: /.*\z/

I think that /.*\z/ should indeed match any string, and that the regex engine has a bug here.


Re^2: Strange regex to test for newlines: /.*\z/
by moritz (Cardinal) on May 21, 2007 at 13:21 UTC
      In r31303 of bleadperl this bug is fixed:
      $ perl5.9.5 -E 'say "match" if "f\n" ~~ /.*\z/'
      match
      

        Just FYI: you can do links to the Archive of Perl Changes using the apc:// linktype.

        ---
        $world=~s/war/peace/g

Re^2: Strange regex to test for newlines: /.*\z/
by Ojosh!ro (Beadle) on May 21, 2007 at 13:47 UTC
    I don't think it's a bug.

    Unless the match is in /s mode, .* will match anything BUT a newline. (In /s mode .* will match anything; /m only changes ^ and $.)
    I assume what it is trying to match is one line.

    So basically what this test asks is:
    "Between the run of characters (on this line) that are not newlines and the end of the string, are there any other characters?" If so, it won't match. And if it doesn't match, the only character that can be responsible is a newline.
    It does sound a bit like a roundabout way to get what you want though.
    How about if ( $foo !~ /\n\z/ )
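A minimal sketch of that simpler check, for anyone who wants to try it. The helper name ends_with_newline is made up for illustration and is not from the thread; it just wraps the /\n\z/ test suggested above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when the very last character of the
# string is a newline, 0 otherwise. \z anchors at the true end of string.
sub ends_with_newline {
    my ($str) = @_;
    return $str =~ /\n\z/ ? 1 : 0;
}

for my $s ("foo", "foo\n", "foo\nbar", "") {
    (my $shown = $s) =~ s/\n/\\n/g;    # make newlines visible when printing
    printf "%-12s %s\n", qq("$shown"),
        ends_with_newline($s) ? "ends with a newline" : "no trailing newline";
}
```

Unlike /.*\z/, this does not depend on how the engine treats a leading .*, so it should behave the same on any Perl version.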

    BTW, does setting $/ have any influence on /m or /s at all?
    Not that I could find by experimenting.

    if( exists $aeons{strange} ){ die $death unless ( $death%2 ) }
      .* will match anything but a newline, or the empty string.

      So I'd expect "foo\n" =~ /.*\z/; to match, but capture the empty string in $&, not "foo\n".

      Of course there are more elaborate ways to match for a newline character ;-)

        .* will match anything but a newline, or the empty string.

        It will match both of those, actually, as it should.

        $ perl -we'print "yes" if "" =~ /.*/'
        yes
        $ perl -we'print "yes" if "\n" =~ /.*/'
        yes
      One problem tho, the following all match the string "\n":
      /.*/ /\z/ /.{0}\z/
      It's possible that \z is meant to introduce some specialness when combined with .* (or possibly some other quantifiers), but I haven't seen it mentioned in any docs. This is either a bug, or a very poorly documented feature.
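To see where each of those three patterns matches against "\n", here is a small sketch. The helper where_matched is a hypothetical name, not something from the thread; it reports the start offset and the matched text using the built-in @- and @+ arrays.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [start_offset, matched_text] on success,
# or undef when the pattern does not match.
sub where_matched {
    my ($str, $re) = @_;
    return undef unless $str =~ $re;
    return [ $-[0], substr($str, $-[0], $+[0] - $-[0]) ];
}

for my $re (qr/.*/, qr/\z/, qr/.{0}\z/) {
    my $m = where_matched("\n", $re);
    print "$re: ",
        ($m ? "offset $m->[0], length " . length($m->[1]) : "no match"),
        "\n";
}
```

All three succeed with an empty match: /.*/ at offset 0 (the dot cannot consume the newline), and the two \z patterns at offset 1, the end of the string.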
      According to your reasoning, the first of the following one-liners shouldn't print anything either:
      $ perl -lwe 'print "match" if "foo\n" =~ /[^\n]*\z/'
      match
      $ perl -lwe 'print "match" if "foo\n" =~ /.*\z/'
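A sketch of why the [^\n]* version matches at all: the engine ends up with an empty match at the very end of the string, after the newline. The helper match_span is a made-up name for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (start, end) offsets of the match,
# or an empty list when the pattern does not match.
sub match_span {
    my ($str, $re) = @_;
    return unless $str =~ $re;
    return ($-[0], $+[0]);
}

my ($start, $end) = match_span("foo\n", qr/[^\n]*\z/);
print defined $start ? "matched from $start to $end\n" : "no match\n";
```

For "foo\n" this reports a match from 4 to 4, i.e. zero characters at the end of the string. Since a dot without /s is the same character class as [^\n], one would expect /.*\z/ to give the same answer, which is what the thread reports for the fixed bleadperl.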
Re^2: Strange regex to test for newlines: /.*\z/
by xicheng (Sexton) on May 21, 2007 at 15:45 UTC
    No, it's not a bug. Check carefully what the difference between \z and \Z is, and try the following samples:
    perl -e 'print "match\n" if "foo\n" =~ /.*\z/'
    perl -e 'print "match\n" if "foo\n" =~ /.*\Z/'
    perl -e 'print "match\n" if "foo\n\n\n" =~ /.*\Z/'
    Update: the third one matches only because .* is in use. \Z allows for just one trailing newline, not several.
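The difference is easier to see with a literal in front of the anchor, so that .* cannot shrink to nothing. This is only a sketch; anchor_report and its hash keys are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical helper: checks one string against \z, \Z and the
# .*\Z pattern from the third sample above.
sub anchor_report {
    my ($str) = @_;
    return {
        foo_z      => ($str =~ /foo\z/ ? 1 : 0),   # true end of string only
        foo_Z      => ($str =~ /foo\Z/ ? 1 : 0),   # end, or before ONE final newline
        dotstar_Z  => ($str =~ /.*\Z/  ? 1 : 0),   # .* may match the empty string
    };
}

for my $s ("foo", "foo\n", "foo\n\n\n") {
    my $r = anchor_report($s);
    (my $shown = $s) =~ s/\n/\\n/g;
    print qq("$shown": ), join(", ", map { "$_=$r->{$_}" } sort keys %$r), "\n";
}
```

With "foo\n\n\n", /foo\Z/ fails because two more newlines follow "foo", while /.*\Z/ still succeeds by matching nothing just before the last newline.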

    Regards,
    Xicheng
      Fair enough, but try:
      perl -e 'print "match\n" if "foo\n" =~ /.{0,}\z/'
      AFAIK, .* and .{0,} should be exactly equivalent, but when combined with \z they are not, if the string ends in a newline.

      There definitely appears to be a bug here, but it may be that the above snippet should not match, rather than the version with .* matching.
        Hmm, I just noticed that, thanks.

        I think .* and .{0,} at the beginning of a regex pattern should be treated as optional, so that /.*A/ and /.{0,}A/ are the same as /A/, which means .* and .{0,} are completely unnecessary in the above patterns.

        But \z does seem to behave very differently with .* and .{0,}, as you mentioned.

        This looks like a Perl-specific problem; PHP (which uses a similar regex engine) handles it fine:
        php -r '
        $str = "foo\n";
        if (preg_match("/.*\z/", $str)) {
            print "match\n";
        }
        '
        match
        Probably it's a bug, and I am waiting for someone to make it clear. :-)

        Regards,
        Xicheng
      Indeed. Quoting, and paraphrasing a bit, "Mastering Regular Expressions 2nd Edition":
      A match mode can change the meaning of "$" to match before any embedded newline (or Unicode line terminator as well). When supported, "\Z" usually matches what the "unmoded" "$" matches, which often means to match at the end of the string, or before a string-ending newline. To complement these, "\z" matches only at the end of the string, period, without regard to any newline.
      ..
      //s stands for Single Line Mode which makes the dot match any character.
      ..
      //m stands for Multi Line Mode which changes how ^ & $ are considered by the regex engine. ^ is then begin of 1 line out of the many lines in the string and not begin of string and $ is end of 1 line out of the many lines in the string and not end of string.
      ..
      Caret "^" matches at the beginning of the text being searched, and, if in an enhanced line-anchor match mode, after any newline.
      ..
      \A always matches only at the start of the text being searched, regardless of single or multi line match mode.
      ..
      "\Z" matches what the "unmoded" "$" matches, which means to match at the end of the string, or before a string-ending newline. To complement these, "\z" matches only at the end of the string, period, without regard to any newline.
      With thanks to Jeffrey Friedl's Regex Holy Book! ;-)
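The rules quoted above can be checked directly. Below is a rough sketch against one sample string; the sub name mode_checks and the labels are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: runs a fixed set of anchor/mode checks against
# the sample text and returns label => 0/1 pairs in a stable order.
sub mode_checks {
    my ($text) = @_;
    return [
        [ 'dot, no /s'   => ($text =~ /one.two/  ? 1 : 0) ],  # dot stops at newline
        [ 'dot, with /s' => ($text =~ /one.two/s ? 1 : 0) ],  # /s lets dot match newline
        [ '^two, no /m'  => ($text =~ /^two/     ? 1 : 0) ],  # ^ is start of string only
        [ '^two, /m'     => ($text =~ /^two/m    ? 1 : 0) ],  # /m: ^ after any newline
        [ 'one$, no /m'  => ($text =~ /one$/     ? 1 : 0) ],  # $ is end (or before final \n)
        [ 'one$, /m'     => ($text =~ /one$/m    ? 1 : 0) ],  # /m: $ before any newline
        [ '\Atwo, /m'    => ($text =~ /\Atwo/m   ? 1 : 0) ],  # \A ignores /m
        [ 'two$'         => ($text =~ /two$/     ? 1 : 0) ],  # before string-ending newline
        [ 'two\Z'        => ($text =~ /two\Z/    ? 1 : 0) ],  # same as unmoded $
        [ 'two\z'        => ($text =~ /two\z/    ? 1 : 0) ],  # true end only
    ];
}

printf "%-14s %s\n", @$_ for @{ mode_checks("one\ntwo\n") };
```

For "one\ntwo\n" only the /s, /m, two$ and two\Z rows come out true, which lines up with the quoted description: /s affects the dot, /m affects ^ and $, and \A and \z ignore both modes.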
