PerlMonks  

Re^2: Strange regex to test for newlines: /.*\z/

by xicheng (Sexton)
on May 21, 2007 at 15:45 UTC (#616593=note)


in reply to Re: Strange regex to test for newlines: /.*\z/
in thread Strange regex to test for newlines: /.*\z/

No, it's not a bug. Check carefully what the difference between \z and \Z is, and try the following samples:

perl -e 'print "match\n" if "foo\n" =~ /.*\z/'
perl -e 'print "match\n" if "foo\n" =~ /.*\Z/'
perl -e 'print "match\n" if "foo\n\n\n" =~ /.*\Z/'
Update: the third one matches only because of the .*, which can match the empty string just before the last newline. \Z by itself cannot absorb more than one trailing newline.
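To separate what the anchors do from what .* does, here is a minimal sketch that tests \z and \Z after a literal prefix instead of .*. The helper name matches() is made up for illustration; it is not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $str matches $re, else 0.
sub matches {
    my ($str, $re) = @_;
    return $str =~ $re ? 1 : 0;
}

my @cases = (
    [ "foo",     qr/foo\z/ ],  # \z holds only at the very end of the string
    [ "foo\n",   qr/foo\z/ ],  # fails: a newline follows "foo"
    [ "foo\n",   qr/foo\Z/ ],  # \Z holds at the end, or before one final newline
    [ "foo\n\n", qr/foo\Z/ ],  # fails: two newlines follow "foo"
);

for my $case (@cases) {
    my ($str, $re) = @$case;
    (my $shown = $str) =~ s/\n/\\n/g;   # make newlines visible in the output
    printf "%-10s %-14s %s\n", qq{"$shown"}, "$re",
        matches($str, $re) ? "match" : "no match";
}
```

With a literal prefix there is nothing that can match the empty string, so the result depends only on the anchor.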

Regards,
Xicheng


Re^3: Strange regex to test for newlines: /.*\z/
by Mutant (Priest) on May 21, 2007 at 15:55 UTC
    Fair enough, but try:
    perl -e 'print "match\n" if "foo\n" =~ /.{0,}\z/'
    AFAIK, .* and .{0,} should be exactly equivalent, but when combined with \z they are not, if the string ends in a newline.

    There definitely appears to be a bug here, but it may be that the above snippet should not match, rather than the version with .* matching.
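A small probe like the following can show how a given perl treats the two patterns. It is only a sketch: match_start() is a hypothetical helper name, and no expected output is shown for "foo\n" because the result is exactly what is in dispute here and may differ between Perl versions. Going by the documented semantics, both patterns should be able to succeed with an empty match at offset 4, since both quantifiers accept zero characters and \z holds at the end of the string.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the offset where the match starts,
# or undef if the pattern does not match.
sub match_start {
    my ($str, $re) = @_;
    return $str =~ $re ? $-[0] : undef;
}

my $str = "foo\n";
for my $re (qr/.*\z/, qr/.{0,}\z/) {
    my $at = match_start($str, $re);
    # $^V printed with %vd gives the running perl's version, e.g. 5.x.y
    printf "perl %vd  %-16s %s\n", $^V, "$re",
        defined $at ? "match at offset $at" : "no match";
}
```

Reporting the start offset rather than just true/false makes it visible whether a match was found at the start of the string or as an empty match after the newline.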
      Hmm, I just noticed that, thanks.

      I think .* and .{0,} at the beginning of a regex pattern should be treated as optional, so that /.*A/ and /.{0,}A/ succeed exactly when /A/ does, which means .* and .{0,} are completely unnecessary in the above patterns.

      But \z does seem to behave very differently with .* than with .{0,}, as you mentioned.

      This looks like a Perl-specific problem; PHP (which uses a similar regex engine, PCRE) handles it fine:
      php -r ' $str = "foo\n"; if (preg_match("/.*\z/", $str)) { print "match\n"; } '
      match
      It's probably a bug, and I am waiting for someone to clarify. :-)

      Regards,
      Xicheng
Re^3: Strange regex to test for newlines: /.*\z/
by ddn123456 (Pilgrim) on May 22, 2007 at 07:57 UTC
    Indeed. Quoting, and paraphrasing a bit, "Mastering Regular Expressions, 2nd Edition":
    A match mode can change the meaning of "$" to match before any embedded newline (or Unicode line terminator as well). When supported, "\Z" usually matches what the "unmoded" "$" matches, which often means to match at the end of the string, or before a string-ending newline. To complement these, "\z" matches only at the end of the string, period, without regard to any newline.
    ..
    //s stands for Single Line Mode which makes the dot match any character.
    ..
    //m stands for Multi Line Mode which changes how ^ and $ are considered by the regex engine. ^ is then begin of 1 line out of the many lines in the string and not begin of string and $ is end of 1 line out of the many lines in the string and not end of string.
    ..
    Caret "^" matches at the beginning of the text being searched, and, if in an enhanced line-anchor match mode, after any newline.
    ..
    \A always matches only at the start of the text being searched, regardless of single or multi line match mode.
    ..
    "\Z" matches what the "unmoded" "$" matches, which means to match at the end of the string, or before a string-ending newline. To complement these, "\z" matches only at the end of the string, period, without regard to any newline.
    With thanks to Jeffrey Friedl's Regex Holy Book! ;-)
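The rules quoted above can be checked against one two-line string. This is a minimal sketch; the helper matches() and the sample text are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if $str matches $re, else 0.
sub matches {
    my ($str, $re) = @_;
    return $str =~ $re ? 1 : 0;
}

my $text = "one\ntwo\n";

my @checks = (
    [ 'unmoded ^ and $',      qr/^two$/    ],  # no: ^ holds only at string start
    [ '/m: ^ and $ per line', qr/^two$/m   ],  # yes: ^ also holds after a newline
    [ '\A under /m',          qr/\Atwo/m   ],  # no: \A is start of string only
    [ 'dot without /s',       qr/one.two/  ],  # no: dot does not match "\n"
    [ 'dot with /s',          qr/one.two/s ],  # yes: dot matches any character
    [ 'unmoded $',            qr/two$/     ],  # yes: before the final newline
    [ '\Z',                   qr/two\Z/    ],  # yes: same as unmoded $
    [ '\z',                   qr/two\z/    ],  # no: a newline follows "two"
);

printf "%-22s %s\n", $_->[0], matches($text, $_->[1]) ? "match" : "no match"
    for @checks;
```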
