http://www.perlmonks.org?node_id=1233651


in reply to How to match last character of string, even if it happens to be a newline?

My regex Best Practices (lifted whole from TheDamian's Perl Best Practices, which I highly recommend in general) include putting an /xms modifier tail on every qr//, m// and s/// I write. This reduces the degrees of freedom of the ^ $ . operators (with /ms always on, ^ and $ always mean line boundaries and . always matches any character, newline included) and clarifies their function, at least for me. Coupled with the use of \A \z \Z as string start/end anchors, I find I can think a bit more clearly about the highly counterintuitive operation of regular expressions.
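A minimal sketch of those degrees of freedom, using a made-up two-line string (the variable names are illustrative only): the same $ and . change meaning as /m and /s come and go, whereas \z always means the absolute end of the string, whatever the modifiers.

```perl
use strict;
use warnings;

my $str = "abc\nxyz";    # illustrative two-line string, no trailing newline

# $ without /m: end of string (or before a final "\n"), so not right after "c".
my $dollar_no_m = $str =~ m{ c $ }x    ? 1 : 0;
# $ with /m: also just before any embedded "\n", so right after "c" works.
my $dollar_m    = $str =~ m{ c $ }xm   ? 1 : 0;
# . without /s refuses to match "\n" ...
my $dot_no_s    = $str =~ m{ c . x }x  ? 1 : 0;
# ... but with /s it matches any character, newline included.
my $dot_s       = $str =~ m{ c . x }xs ? 1 : 0;
# \z is the absolute end of the string; /m does not change it.
my $z_at_end    = $str =~ m{ z \z }xms ? 1 : 0;
my $c_at_end    = $str =~ m{ c \z }xms ? 1 : 0;

print "c\$ without /m: $dollar_no_m\n";
print "c\$ with /m:    $dollar_m\n";
print "c.x without /s: $dot_no_s\n";
print "c.x with /s:    $dot_s\n";
print "z\\z with /xms: $z_at_end\n";
print "c\\z with /xms: $c_at_end\n";
```

With /xms on every regex, only the last two forms are ever in play, which is the reduction in things to keep track of described above.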


Give a man a fish:  <%-{-{-{-<

Re^2: How to match last character of string, even if it happens to be a newline?
by Allasso (Monk) on May 12, 2019 at 15:43 UTC
    This reduces the degrees of freedom of the ^ $ . operators
    Maybe that hints at the following inconsistency I see with the "end of the line (or before newline at the end)" rule?

    Example

    print('String: a\nb\nc\n' . "\n");
    $text_1 = "a\nb\nc\n";
    $text_1 =~ s@(\n)$@@s;
    print("----------\n>" . $1 . "<\n");
    print("----------\n>" . $text_1 . "<\n");
    Gives:
    String: a\nb\nc\n
    ----------
    >
    <
    ----------
    >a
    b
    c<
    In this case, $ behaved like \z. Put another way, in this case an explicit \n matches where a dot with the /s modifier doesn't.
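    For comparison, here is a sketch that runs the substitution above next to a dot-based counterpart. The dot version is an assumption about the earlier attempt being referred to (it isn't shown here), and show() is a hypothetical helper used only to make newlines visible:

```perl
use strict;
use warnings;

# Hypothetical helper: return a string with its newlines shown as \n.
sub show { my ($s) = @_; $s =~ s{\n}{\\n}g; return qq{"$s"}; }

my $nl_version  = "a\nb\nc\n";
my $dot_version = "a\nb\nc\n";

# (\n)$ : the captured character must itself be a newline, and $ (no /m)
# matches only at the end or before a final "\n", so only the last "\n" fits.
$nl_version =~ s{ (\n) $ }{}xs;
my $nl_captured = $1;

# (.)$ : the engine tries start positions left to right, and "c" followed
# by the final "\n" is the first spot where "any char, then $" succeeds.
$dot_version =~ s{ (.) $ }{}xs;
my $dot_captured = $1;

print "(\\n)\$ captured ", show($nl_captured), ", left ", show($nl_version), "\n";
print "(.)\$  captured ", show($dot_captured), ", left ", show($dot_version), "\n";
```

    This should print:

    (\n)$ captured "\n", left "a\nb\nc"
    (.)$  captured "c", left "a\nb\n\n"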
      ... in this case explicit \n matches where dot with s mod doesn't.

      That's because \n$ requires the character before $ to be a newline, whereas with .$ the leftmost match can put the dot (even under /s) on the character before the trailing newline, because $ is also allowed to match there. Dot will match a newline in front of $ only if that is the only match available:

      c:\@Work\Perl\monks>perl -wMstrict -le
      "use Data::Dump qw(pp);
       ;;
       for my $s (qq{yz}, qq{yz\n}, qq{\n}) {
         $s =~ m{ (.) $ }xms;
         printf qq{in %s matched %s \n}, pp($s), pp($1);
         }
      "
      in "yz" matched "z"
      in "yz\n" matched "z"
      in "\n" matched "\n"
      (The /m modifier makes no difference in these example strings.)
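      Since the original question asks for the last character even when it is a newline, one way to get it, sketched under the same /xms convention, is to anchor with \z instead of $. The same three strings are used; show() is again a hypothetical display helper, and \Z is included to show that it behaves like $ here:

```perl
use strict;
use warnings;

# Hypothetical helper: return a string with its newlines shown as \n.
sub show { my ($s) = @_; $s =~ s{\n}{\\n}g; return qq{"$s"}; }

my %last;    # input string => [ char captured before \z, char captured before \Z ]
for my $s ("yz", "yz\n", "\n") {
    # \z is the absolute end of the string, so under /s this captures the
    # true last character, newline or not.
    my ($by_z) = $s =~ m{ (.) \z }xms;
    # \Z, like $ without /m, may also match just before a trailing newline.
    my ($by_Z) = $s =~ m{ (.) \Z }xms;
    $last{$s} = [ $by_z, $by_Z ];
    printf "in %-6s \\z got %-5s \\Z got %s\n", show($s), show($by_z), show($by_Z);
}
```

      For "yz\n", \z yields the newline while \Z yields "z"; for the other two strings both anchors agree.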

      The thing to remember about regular expressions is that there are a lot of things to remember about regular expressions. If you have a chance to reduce the amount of stuff to remember, even if only by a little, take it. That's why I advise (per TheDamian's regex PBPs) using \A \z \Z for all your start- and end-of-string anchoring needs, and using ^ $ only for matching at embedded newlines.
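      A small sketch of that division of labor, on a made-up key=value string (the input and variable names are illustrative only): under /xms, ^ and $ pick out individual lines, while \A and \z refer only to the ends of the whole string.

```perl
use strict;
use warnings;

my $text = "name=alice\nrole=admin\nnot a pair\n";    # made-up input

# Under /xms, ^ and $ mark line boundaries, so //g picks out whole lines;
# each match returns two captures, which fill the hash as key/value pairs.
my %pairs = $text =~ m{ ^ (\w+) = (\w+) $ }xmsg;

# ^ finds "role" at the start of the second line; \A only ever means the
# start of the whole string.
my $role_at_line_start   = $text =~ m{ ^ role }xms  ? 1 : 0;
my $role_at_string_start = $text =~ m{ \A role }xms ? 1 : 0;

# \z is the very end of the string, after the final "\n".
my $ends_with_newline = $text =~ m{ \n \z }xms ? 1 : 0;

print "$_ => $pairs{$_}\n" for sort keys %pairs;
print "^role: $role_at_line_start, \\Arole: $role_at_string_start, ",
      "newline at \\z: $ends_with_newline\n";
```

      The "not a pair" line is skipped because it has no "=", and "role" is found at a line start but not at the string start.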

      ... inconsistency ...

      For me, it's not so much inconsistency as mind-boggling complexity. And again, I come back to the point that if you can reduce the complexity of what you're dealing with even a little, you're ahead of the game.

