newline behavior in Regular Expression

asinghvi has asked for the wisdom of the Perl Monks concerning the following question:

Perl gurus,
My Regular Expression String is

$PAT = '$X =~ s{^(.*?)Resolved (\d+) problems out of (\d+) picked(.*?)$}{ (($3-$2)/$3>.5) ? 1 : 0 }se';

where

$X = "Line1\nLine2Resolved 200 problems out of 5000 picked\nLine4\nLine5";

When I run eval($PAT), $X correctly gets modified to 1 or 0 depending on the numbers in the string. But a newline character is also appended to the '0' or '1'. What am I doing wrong here? I thought using 's' would let the (.*?) at the beginning and end glob all the newline characters.

Thanks for your help

Re: newline behavior in Regular Expression by fizbin (Chaplain) on Apr 12, 2004 at 19:06 UTC
Short answer: the `s` option does less than you think it does. Longer answer: in a Perl regular expression match without the `m` option, `$` matches at the end of the string and also, if the string ends in a newline, right before that final newline. The `s` option does not change the behavior of `$`. It does, however, change the behavior of `.` so that `.` matches any character, including a newline. But you wrote `(.*?)`. That is, you made the `.*` non-greedy, meaning that it takes as few characters as possible, so it took everything up to but not including the final newline, and then `$` matched right before that final newline. The solution is to not make the second match non-greedy (i.e. use `(.*)` and not `(.*?)`), or to use an anchor that really means just the end of the string and nothing else, that is, `\z` instead of `$`. All of this is explained fairly clearly in `perldoc perlre`, q.v.
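A minimal Perl sketch of fizbin's point, assuming the OP's real input ended with a newline (the string as posted does not). It runs the OP's substitution with three different tails and makes any leftover newline visible. The trailing-newline input and the labels are illustrative assumptions, not from the original post.

```perl
use strict;
use warnings;

# Assumed input: the OP's string with a trailing newline added.
my $input = "Line1\nLine2Resolved 200 problems out of 5000 picked\nLine4\nLine5\n";

my @cases = (
    [ '(.*?)$'  => qr{^(.*?)Resolved (\d+) problems out of (\d+) picked(.*?)$}s  ],  # OP's pattern
    [ '(.*)$'   => qr{^(.*?)Resolved (\d+) problems out of (\d+) picked(.*)$}s   ],  # greedy tail
    [ '(.*?)\z' => qr{^(.*?)Resolved (\d+) problems out of (\d+) picked(.*?)\z}s ],  # true end of string
);

my %result;
for my $case (@cases) {
    my ($label, $re) = @$case;
    # Same replacement expression as the question, evaluated via /e.
    (my $copy = $input) =~ s{$re}{ (($3 - $2) / $3 > .5) ? 1 : 0 }e;
    $result{$label} = $copy;
    (my $shown = $copy) =~ s/\n/\\n/g;    # make any leftover newline visible
    print "$label => '$shown'\n";
}
```

Only the non-greedy `(.*?)$` variant leaves the final `\n` behind; the greedy tail and the `\z` anchor both consume it.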
Re: newline behavior in Regular Expression by cyocum (Curate) on Apr 12, 2004 at 21:00 UTC
I may be out of line with this, but why are you using eval when it seems to me that you could do this without it? Something along the lines of: `if ($X =~ m{^(.*?)Resolved (\d+) problems out of (\d+) picked(.*)$}s) { return (($3-$2)/$3 > .5) ? 1 : 0; }` Of course, I could be completely off the mark.
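A sketch of cyocum's eval-free idea wrapped in a subroutine. The name `unresolved_flag` is hypothetical, and returning undef when the summary line is missing (or when zero problems were picked) is an assumption for illustration. Because only the two numbers are needed, a plain match is enough and the trailing-newline question never comes up.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if more than half of the picked problems were
# left unresolved, 0 otherwise; undef if the summary line is missing.
sub unresolved_flag {
    my ($text) = @_;
    # A plain match is enough; no eval and no substitution.
    return undef unless $text =~ m{Resolved (\d+) problems out of (\d+) picked};
    my ($resolved, $picked) = ($1, $2);
    return undef if $picked == 0;    # avoid dividing by zero
    return (($picked - $resolved) / $picked > .5) ? 1 : 0;
}

my $X = "Line1\nLine2Resolved 200 problems out of 5000 picked\nLine4\nLine5\n";
my $flag = unresolved_flag($X);
print defined $flag ? "$flag\n" : "no summary line\n";
```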
Re: Re: newline behavior in Regular Expression by qq (Hermit) on Apr 12, 2004 at 21:15 UTC
I agree, eval seems unneeded here. You can still use `s///`, although I would prefer it like yours:

#!/usr/bin/perl
my $x = "Line1\nLine2Resolved 200 problems out of 5000 picked\nLine4\nLine5";
$x =~ s/.*?Resolved (\d+) problems out of (\d+) picked.*/ (($2 - $1) \/ $2) > .5 ? 1 : 0 /es;
print "$x\n";

Remember to escape the division in the substitution.
qq
Re: newline behavior in Regular Expression by halley (Prior) on Apr 12, 2004 at 18:54 UTC
Congrats. I didn't even know that IE would honor `<font size="+10">`, but you proved it. I should print this question out as a poster for my bare wall here. The `$` means end-of-line, not end-of-string. The end of line is true just before any trailing newline character. So if $X ends with a newline before the substitution, that newline will still be there after it. One preventative measure would be to use `chomp($X)` beforehand. -- `[ e d @ h a l l e y . c c ]`
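A minimal sketch of halley's `chomp` suggestion, assuming the input ends in exactly one newline. The OP's substitution is left unchanged; the brackets in the output make any stray newline easy to spot. Note that `chomp` removes only one trailing newline (strictly, one trailing copy of `$/`), so input ending in several newlines would still need the pattern fix.

```perl
use strict;
use warnings;

# Assumed input: the OP's string with a single trailing newline.
my $X = "Line1\nLine2Resolved 200 problems out of 5000 picked\nLine4\nLine5\n";

# chomp removes one trailing newline (one copy of $/, "\n" by default),
# so the non-greedy (.*?)$ has no final newline to stop in front of.
chomp($X);

# The OP's substitution, unchanged.
$X =~ s{^(.*?)Resolved (\d+) problems out of (\d+) picked(.*?)$}{ (($3-$2)/$3>.5) ? 1 : 0 }se;

print "[$X]\n";    # brackets make a stray newline easy to spot
```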
Re: Re: newline behavior in Regular Expression by bart (Canon) on Apr 12, 2004 at 19:22 UTC
"The $ means end-of-line, not end-of-string." Nope. That's only the case with the `/m` option enabled. Unless I misunderstand what you mean, which is not very clear to me. I think the OP's problem is that he used `/PAT(.*?)$/` and expected it to match the final newline as well. As `/$/` matches just before the final newline as well as at the very end of the string, the non-greediness makes it pick the shortest match, thereby leaving the newline alone. Drop the '?' to make it work.
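To see bart's point about `/m`, this small sketch records every offset at which a bare `$` matches in an arbitrary example string, `"a\nb\nc\n"` (chosen for illustration). Without `/m`, `$` should match only just before the final newline and at the very end; with `/m` it also matches before every embedded newline.

```perl
use strict;
use warnings;

my $s = "a\nb\nc\n";    # arbitrary example: three lines, trailing newline

# Record every offset at which a bare $ anchor matches.
my (@plain, @multi);
push @plain, pos($s) while $s =~ /$/g;     # without /m
push @multi, pos($s) while $s =~ /$/gm;    # with /m

print "without /m: @plain\n";
print "with /m:    @multi\n";
```

The offset just before the final newline is exactly where the non-greedy `(.*?)$` in the question stops, one character short of the end.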
Re: newline behavior in Regular Expression by Anonymous Monk on Apr 12, 2004 at 18:55 UTC
Works fine for me. The part of your regex that actually excludes the newline(s) is the ^ and $ anchors, not the `.*?`. But what you have works: $X correctly winds up being 0 or 1.
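A reproduction sketch that runs the OP's `$PAT` through `eval` exactly as posted. The first input is the string as posted, which has no trailing newline, consistent with the "works fine for me" result. The second input adds a trailing newline; that is an assumption about what the OP's real data looked like, and it is the case that leaves a `\n` after the digit.

```perl
use strict;
use warnings;

# The OP's substitution, kept as a string and run through eval as in the question.
my $PAT = '$X =~ s{^(.*?)Resolved (\d+) problems out of (\d+) picked(.*?)$}{ (($3-$2)/$3>.5) ? 1 : 0 }se';

my @inputs = (
    "Line1\nLine2Resolved 200 problems out of 5000 picked\nLine4\nLine5",    # as posted
    "Line1\nLine2Resolved 200 problems out of 5000 picked\nLine4\nLine5\n",  # assumed: trailing newline
);

my @results;
for my $input (@inputs) {
    my $X = $input;                   # string eval can see this lexical $X
    eval $PAT;
    die $@ if $@;
    push @results, $X;
    (my $shown = $X) =~ s/\n/\\n/g;   # make any leftover newline visible
    print "'$shown'\n";
}
```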