ELISHEVA has asked for the wisdom of the Perl Monks concerning the following question:
I'm confused about when a Perl regex needs a full match on a string.
Recently in a post somebody suggested that '$' matched both the end of a string and a newline. On the other hand the Perl docs (http://perldoc.perl.org/perlre.html#Modifiers) suggest that '$' matches the boundary created by the new line/end of file/string/stream rather than the actual thing that created the boundary, i.e. it does not consume the thing that created the boundary. This would mean that '^a$' should be a partial match on "a\n" and '^a$\n' should be a full match.
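A quick way to check the "does not consume" reading is to look at where the match ends. This is only a minimal sketch, separate from the test script in this post: `match_end` is a hypothetical helper name, and it relies on `$+[0]` (the first element of `@+`), which holds the offset just past the end of the last successful match.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): returns the offset
# just past the end of the match, or undef if the pattern does not match.
sub match_end {
    my ($sString, $sRegex) = @_;
    return ($sString =~ /$sRegex/) ? $+[0] : undef;
}

my $sString = "a\n";    # length 2: 'a' at offset 0, newline at offset 1

# Patterns are kept in single quotes so "$\n" is not read as "$\" plus "n".
foreach my $sRegex ('^a$', '^a$\n') {
    my $iEnd = match_end($sString, $sRegex);
    print "regex=/$sRegex/ match ends at offset $iEnd of ",
        length($sString), "\n";
}
```

If '$' consumed the newline, both patterns would end at offset 2. Under the boundary-only reading, /^a$/ should stop at offset 1 and only /^a$\n/ should reach offset 2.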
To test this hypothesis I wrote up a small script comparing the results of matching "a\n" and "a\n\n" with three different regexes: /^a$/, /^a$\n/, and /^a$\z/:
    #Note: To keep Perl from resolving "$\n" as the variable "$\"
    #followed by the letter "n", this code sample constructs regexen
    #using non-interpolating quotes.
    use strict;
    use warnings;

    my @aRegexTests = (
        ["a\n",   '^a$',   '$ matches boundary, maybe more?'],
        ["a\n",   '^a$\n', '$ matches only boundary, \n matches newline'],
        ["a\n",   '^a$\z', '$ matches only boundary, \z fails because of newline?'],
        ["a\n\n", '^a$',   '$ matches only boundary, \n matches first newline'],
        ["a\n\n", '^a$\n', '$ matches only boundary, \n matches first newline?'],
    );

    foreach (@aRegexTests) {
        my ($sString, $sRegex, $sComment) = @$_;

        # Make the newlines visible when printing the test string.
        my $sPrint = $sString;
        $sPrint =~ s/\n/\\n/g;
        print "string=<$sPrint>\n";

        my $sMatch = ($sString =~ /$sRegex/) ? "match" : "no match";
        print " no modifier: "
            . "regex=/$sRegex/\n $sMatch => $sComment\n";

        $sMatch = ($sString =~ /$sRegex/s) ? "match" : "no match";
        print " s modifier (single line mode): "
            . "regex=/$sRegex/s\n $sMatch => $sComment\n";

        $sMatch = ($sString =~ /$sRegex/m) ? "match" : "no match";
        print " m modifier (multi line mode): "
            . "regex=/$sRegex/m\n $sMatch => $sComment\n";
    }
which outputs
    string=<a\n>
     no modifier: regex=/^a$/
     match => $ matches boundary, maybe more?
     s modifier (single line mode): regex=/^a$/s
     match => $ matches boundary, maybe more?
     m modifier (multi line mode): regex=/^a$/m
     match => $ matches boundary, maybe more?
    string=<a\n>
     no modifier: regex=/^a$\n/
     match => $ matches only boundary, \n matches newline
     s modifier (single line mode): regex=/^a$\n/s
     match => $ matches only boundary, \n matches newline
     m modifier (multi line mode): regex=/^a$\n/m
     match => $ matches only boundary, \n matches newline
    string=<a\n>
     no modifier: regex=/^a$\z/
     no match => $ matches only boundary, \z fails because of newline?
     s modifier (single line mode): regex=/^a$\z/s
     no match => $ matches only boundary, \z fails because of newline?
     m modifier (multi line mode): regex=/^a$\z/m
     no match => $ matches only boundary, \z fails because of newline?
    string=<a\n\n>
     no modifier: regex=/^a$/
     no match => $ matches only boundary, \n matches first newline
     s modifier (single line mode): regex=/^a$/s
     no match => $ matches only boundary, \n matches first newline
     m modifier (multi line mode): regex=/^a$/m
     match => $ matches only boundary, \n matches first newline
    string=<a\n\n>
     no modifier: regex=/^a$\n/
     no match => $ matches only boundary, \n matches first newline
     s modifier (single line mode): regex=/^a$\n/s
     no match => $ matches only boundary, \n matches first newline
     m modifier (multi line mode): regex=/^a$\n/m
     match => $ matches only boundary, \n matches first newline
It would appear that my original question (is /^a$/ a partial match?) was answered in the affirmative, but it was quickly replaced by another: why do the regexes /^a$/ and /^a$\n/ match "a\n\n" in only the multi-line mode? They match "a\n" (only one \n) in all three modes. The regex doesn't end in "\z" so why does it care that the second "\n" is unmatched? Surely I am misunderstanding something?
Thanks in advance, beth
Update 1: Fixed various typos
Update 2: I'm wondering if maybe the absence of the m modifier means only 0 or 1 new lines allowed? - [addendum 2009.02.08 - The answer to this is a resounding no - see post below by jethro for citation from perl docs and here for test examples.]
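For what it's worth, here is a small sketch of why the "only 0 or 1 newlines" guess does not hold. It assumes the documented rule that, without the m modifier, '$' matches only at the very end of the string or just before a newline that is the last character of the string. The helper name `matches` and the test strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if the pattern matches the string, 0 otherwise.
sub matches {
    my ($sString, $sRegex) = @_;
    return ($sString =~ /$sRegex/) ? 1 : 0;
}

# No m modifier anywhere. Patterns stay in single quotes, as in the
# script above, so that '$' is never interpolated.
my @aCases = (
    ["a\nb\nc\n", 'c$'],      # three newlines; '$' sits before the final one
    ["a\nb\nc",   'c$'],      # two newlines; '$' sits at the very end
    ["a\n\n",     'a$'],      # newline after 'a' is not the final character
    ["a\n\n",     'a\n$'],    # first newline consumed, '$' before the final one
);

foreach my $raCase (@aCases) {
    my ($sString, $sRegex) = @$raCase;
    my $sPrint = $sString;
    $sPrint =~ s/\n/\\n/g;    # make newlines visible
    my $sMatch = matches($sString, $sRegex) ? "match" : "no match";
    print "string=<$sPrint> regex=/$sRegex/ => $sMatch\n";
}
```

Only the third case should fail. The number of newlines in the string does not matter; what matters is whether '$' lands at the end of the string or directly before its final newline.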
Update 3: Added comment to code explaining how above script keeps Perl from thinking "$\n" is the variable "$\" followed by the letter "n". My apologies for any confusion the absence of this comment caused.
Replies are listed 'Best First'.
Re: When exactly do Perl regex's require a full match on a string?
by jethro (Monsignor) on Feb 08, 2009 at 14:34 UTC
Re: When exactly do Perl regex's require a full match on a string?
by gone2015 (Deacon) on Feb 08, 2009 at 14:56 UTC
by AnomalousMonk (Archbishop) on Feb 09, 2009 at 05:53 UTC
Re: When exactly do Perl regex's require a full match on a string?
by jwkrahn (Abbot) on Feb 08, 2009 at 15:54 UTC
by ELISHEVA (Prior) on Feb 08, 2009 at 17:35 UTC
by jethro (Monsignor) on Feb 09, 2009 at 01:22 UTC
by AnomalousMonk (Archbishop) on Feb 09, 2009 at 04:02 UTC
by jwkrahn (Abbot) on Feb 09, 2009 at 11:00 UTC