PerlMonks  

How to match last character of string, even if it happens to be a newline?

by Allasso (Monk)
on May 12, 2019 at 14:36 UTC

Allasso has asked for the wisdom of the Perl Monks concerning the following question:

I'm having difficulty matching the last character in a string when it happens to be a \n. I would naively have thought m@(.)$@s would work, but when the last character is \n it matches the preceding character instead. If I explicitly use \n it matches, but I can't seem to do it conditionally. I tried using an OR, but it behaves the same as using the . (dot). I also tried the m flag (without g), thinking Perl might start evaluating at the end of the string and I might get it that way.

Here is what I've tried:

print('String: a\nb\nc\n' . "\n");
my $text_1 = "a\nb\nc\n";
$text_1 =~ m@(.)$@s;
print("----------\n>" . $1 . "<\n");
$text_1 =~ m@(\n)$@s;
print("----------\n>" . $1 . "<\n");
$text_1 =~ m@(\n|.)$@s;
print("----------\n>" . $1 . "<\n");

print("\n" . 'String: a\nb\nc\n - multiline match:' . "\n");
$text_1 = "a\nb\nc\n";
$text_1 =~ m@(.)$@sm;
print("----------\n>" . $1 . "<\n");
$text_1 =~ m@(\n)$@sm;
print("----------\n>" . $1 . "<\n");
$text_1 =~ m@(\n|.)$@sm;
print("----------\n>" . $1 . "<\n");
Here's what I get:
String: a\nb\nc\n
----------
>c<
----------
>
<
----------
>c<

String: a\nb\nc\n - multiline match:
----------
>a<
----------
>
<
----------
>a<
My searching on this issue has not come up with any answers.

Help?

Replies are listed 'Best First'.
Re: How to match last character of string, even if it happens to be a newline?
by AnomalousMonk (Chancellor) on May 12, 2019 at 14:56 UTC

    By default, the . (dot) regex metacharacter matches everything except a newline. Use the /s modifier to make dot match everything.

    c:\@Work\Perl\monks>perl -wMstrict -le
    "use Data::Dump qw(pp);
    ;;
    for my $s (qq{yz}, qq{yz\n}) {
      $s =~ m{ (.) \z }xms;
      printf qq{in %s matched %s \n}, pp($s), pp($1);
      }
    "
    in "yz" matched "z"
    in "yz\n" matched "\n"
    See also \z, the "absolute end of string" anchor.

    Update: You're also running into an interaction with $, which by default matches at "the end of the line (or before newline at the end)". So even with the /s modifier, the first position at which .$ can possibly match (scanning from left to right) is before the final newline, if one is present; remember that the leftmost possible match wins.


    Give a man a fish:  <%-{-{-{-<
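A minimal sketch that puts the two anchors side by side, in Perl like the rest of the thread. The helper names show, last_via_dollar and last_via_z are hypothetical; both patterns use /s, so the anchor is the only difference between them.

```perl
use strict;
use warnings;

# Make newlines visible when printing (hypothetical helper).
sub show {
    my ($s) = @_;
    $s =~ s/\n/\\n/g;
    return qq{"$s"};
}

# "Last" character via $, which may match just before a final newline.
sub last_via_dollar {
    my ($s) = @_;
    return $s =~ m{ (.) $ }xs ? $1 : undef;
}

# Last character via \z, the absolute end of the string.
sub last_via_z {
    my ($s) = @_;
    return $s =~ m{ (.) \z }xs ? $1 : undef;
}

for my $s ("yz", "yz\n", "a\nb\nc\n") {
    printf "%-14s \$ gives %-6s \\z gives %s\n",
        show($s), show(last_via_dollar($s)), show(last_via_z($s));
}
```

For "yz\n" and "a\nb\nc\n" the $ version captures the character before the trailing newline ("z" and "c"), while the \z version captures the newline itself.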

      Yes, the \z anchor does the trick, thanks!
      Using \z, I don't seem to need multiline; simply m@(.)\z@s
        Using \z, I don't seem to need multiline ...

        That's because \z is always the absolute end-of-string anchor; no modifiers apply. I always use \A \z \Z because they have invariant behavior. For the same reason, I nail down the ^ $ operators by always using the /m modifier. (I then use the ^ $ operators only with newlines embedded within a string.)


        Give a man a fish:  <%-{-{-{-<
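To see how much anchor behavior depends on modifiers, this sketch lists every offset at which each anchor matches in one string that contains two newlines. The helper anchor_offsets is hypothetical; it relies on Perl advancing one position after a zero-length /g match before trying again.

```perl
use strict;
use warnings;

# Collect every offset at which a (zero-width) anchor pattern matches.
sub anchor_offsets {
    my ($s, $re) = @_;
    my @offsets;
    while ($s =~ m{$re}g) {
        push @offsets, $-[0];
    }
    return @offsets;
}

my $s = "ab\ncd\n";    # offsets 0..6; the newlines sit at 2 and 5

my @anchors = (
    [ '\A'         => qr{ \A }xms ],
    [ '^ under /m' => qr{ ^  }xms ],
    [ '$ under /m' => qr{ $  }xms ],
    [ '\Z'         => qr{ \Z }xms ],
    [ '\z'         => qr{ \z }xms ],
);

for my $pair (@anchors) {
    my ($name, $re) = @$pair;
    printf "%-11s matches at offsets: %s\n",
        $name, join(', ', anchor_offsets($s, $re));
}
```

\z matches only at offset 6, the true end; \Z also matches at 5, just before the final newline; $ under /m matches before every newline as well as at the end.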

      er... I _did_ use the s modifier :-/
Re: How to match last character of string, even if it happens to be a newline?
by jwkrahn (Monsignor) on May 12, 2019 at 18:48 UTC

    You don't really need a regular expression just to get the last character in a string:

    $ perl -e'use Data::Dumper; $Data::Dumper::Useqq = 1; my $text_1 = "a\nb\nc\n"; print Dumper $text_1, "Last character: " . substr $text_1, -1'
    $VAR1 = "a\nb\nc\n";
    $VAR2 = "Last character: \n";
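jwkrahn's one-liner wrapped as a reusable sub, as a sketch. last_char is a hypothetical name, and returning undef for the empty string (instead of letting substr warn) is an assumption of this example.

```perl
use strict;
use warnings;

# Return the final character of a string, newline or not, without a regex.
# A negative offset counts back from the end; the length check avoids a
# "substr outside of string" warning for the empty string.
sub last_char {
    my ($s) = @_;
    return length($s) ? substr($s, -1) : undef;
}

for my $s ("abc", "a\nb\nc\n", "abc\nd") {
    my $last = last_char($s);
    printf "last char is ord %d\n", ord $last;
}
```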
      On a side note:

      chop does the same job, but it is destructive and requires an lvalue.

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery FootballPerl is like chess, only without the dice
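A small sketch of the chop approach that sidesteps the destructive part by chopping a copy; last_char_via_chop is a hypothetical name. chomp appears only for contrast: it is the usual tool for removing a trailing newline, but it returns a count rather than the character.

```perl
use strict;
use warnings;

# chop removes the last character of its (lvalue) argument and returns it.
sub last_char_via_chop {
    my ($s) = @_;     # $s is a copy, so chopping it leaves the caller's string alone
    return chop $s;
}

my $text = "a\nb\nc\n";
my $last = last_char_via_chop($text);
printf "chop returned ord %d; text is still %d chars long\n", ord $last, length $text;

# For contrast: chomp removes only a trailing $/ (by default "\n")
# and returns the number of characters removed.
my $copy    = $text;
my $removed = chomp $copy;
```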

      indeed!
Re: How to match last character of string, even if it happens to be a newline?
by AnomalousMonk (Chancellor) on May 12, 2019 at 15:27 UTC

    My regex Best Practices (lifted whole from TheDamian's Perl Best Practices, highly recommended in general) include using an /xms modifier tail on every qr// m// s/// I write. This reduces the degrees of freedom of the ^ $ . operators and clarifies their function, at least for me. Coupled with the use of \A \z \Z as string start/end anchors, I find I can think a bit more clearly about the highly counterintuitive operation of regular expressions.


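A minimal sketch of that /xms style, using an illustrative task that is not from the thread: pick out the lines of a string that are exactly one character long. Because /s makes dot match a newline, the pattern writes [^\n] where it means "not a newline".

```perl
use strict;
use warnings;

my $text = "a\nbb\nc\n";

# Under /xms: ^ and $ are line anchors (/m), dot matches anything (/s),
# and whitespace plus comments are allowed (/x).
my @single_char_lines = $text =~ m{
    ^           # start of a line
    ( [^\n] )   # exactly one non-newline character
    $           # end of that same line
}xmsg;

print join(', ', @single_char_lines), "\n";
```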

      This reduces the degrees of freedom of the ^ $ . operators
      Maybe that hints at the following inconsistency I see with the "end of the line (or before newline at the end)" rule?

      Example

      print('String: a\nb\nc\n' . "\n");
      $text_1 = "a\nb\nc\n";
      $text_1 =~ s@(\n)$@@s;
      print("----------\n>" . $1 . "<\n");
      print("----------\n>" . $text_1 . "<\n");
      Gives:
      String: a\nb\nc\n
      ----------
      >
      <
      ----------
      >a
      b
      c<
      In this case, $ behaved like \z. Put another way, in this case an explicit \n matches where dot with the /s modifier doesn't.
        ... in this case explicit \n matches where dot with s mod doesn't.

        That's because \n$ requires a match with a newline, but .$ allows the leftmost position of a match with dot (with /s) to be before the newline. Dot will match a newline in the presence of $ if it is the only match available:

        c:\@Work\Perl\monks>perl -wMstrict -le
        "use Data::Dump qw(pp);
        ;;
        for my $s (qq{yz}, qq{yz\n}, qq{\n}) {
          $s =~ m{ (.) $ }xms;
          printf qq{in %s matched %s \n}, pp($s), pp($1);
          }
        "
        in "yz" matched "z"
        in "yz\n" matched "z"
        in "\n" matched "\n"
        (The /m modifier makes no difference in these example strings.)

        The thing to remember about regular expressions is that there are a lot of things to remember about regular expressions. If you have a chance to reduce the amount of stuff to remember, even if only by a little, take it. That's why I advise (per TheDamian's regex PBPs) using \A \z \Z for all your start- and end-of-string anchoring needs, and using ^ $ only for embedded newline matching.

        ... inconsistency ...

        For me, it's not so much inconsistency as mind-boggling complexity. And again, I come back to the point that if you can reduce the complexity of what you're dealing with even a little, you're ahead of the game.


        Give a man a fish:  <%-{-{-{-<
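The "leftmost position" point can be made visible with the @- array, whose element $-[1] holds the offset where capture group 1 started. A sketch, with capture_at as a hypothetical helper: (.)$ starts its capture before a trailing newline whenever it can, while (\n)$ has no choice but to start at the newline.

```perl
use strict;
use warnings;

# Return (capture 1, offset where capture 1 starts), or an empty list.
sub capture_at {
    my ($s, $re) = @_;
    return $s =~ $re ? ($1, $-[1]) : ();
}

my @patterns = (
    [ '(.)$'  => qr{ (.)  $ }xms ],
    [ '(\n)$' => qr{ (\n) $ }xms ],
);

for my $s ("yz", "yz\n", "\n") {
    (my $shown = $s) =~ s/\n/\\n/g;
    for my $pair (@patterns) {
        my ($label, $re) = @$pair;
        my ($got, $at) = capture_at($s, $re);
        if (defined $got) {
            (my $g = $got) =~ s/\n/\\n/g;
            printf "%-6s %-6s captured %-3s at offset %d\n", qq{"$shown"}, $label, $g, $at;
        }
        else {
            printf "%-6s %-6s no match\n", qq{"$shown"}, $label;
        }
    }
}
```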

Re: How to match last character of string, even if it happens to be a newline?
by LanX (Archbishop) on May 12, 2019 at 15:27 UTC
    That's what you want? :)
    $ perl -e' m/.*(.|\n)/,print "<$1>" for "123","ab\ c\n"'
    <3><
    >$

    Please note that \n is often not one but two characters, like CR LF on Windows.

    Cheers Rolf

      Mac or Windows newlines seldom cause a problem. I think of \n as a Perl newline; Perl strings always use it. Translation between it and your OS's representation is done by an I/O "layer". (On Unix, the "translation" does not actually change anything.) The only exception is when we change I/O behavior by specifying non-standard layers or binmode on input.
      Bill
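Bill's description can be checked without touching the file system. This sketch assumes a perl with PerlIO (standard since 5.8) and pushes the :crlf layer onto in-memory filehandles, so the translation shows up in the buffer rather than in a file.

```perl
use strict;
use warnings;

# Inside a Perl string, "\n" is one character, whatever the platform.
printf "length of \"\\n\" is %d\n", length "\n";

# Write through an in-memory filehandle with :crlf on top:
# each "\n" is translated to CR LF on the way out.
open my $out, '>:crlf', \my $buffer or die "open: $!";
print {$out} "a\nb\n";
close $out or die "close: $!";
printf "buffer holds %d bytes\n", length $buffer;

# Reading back through :crlf turns each CR LF into a single "\n" again.
open my $in, '<:crlf', \$buffer or die "open: $!";
my @lines = <$in>;
close $in;
printf "read back %d lines\n", scalar @lines;
```

The strings in the program never contain CR; the extra bytes exist only in the encoded buffer.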
        I didn't say it's a problem in general,

        I said it's not always just one character like the OP suggested.

        Cheers Rolf

      Not quite:
      $text_1 = "abc\nd";
      $text_1 =~ m/.*(.|\n)/;
      print("----------\n>" . $1 . "<\n");
      Prints:
      ----------
      >
      <
      Should print d
        $text_1 = "abc\nd";
        $text_1 =~ m/.*(.|\n)/;
        ...
        Should print d

        A narration of  m/.*(.|\n)/ might be:

        1. .*     From the start of the string, grab as much as possible of anything that's not a newline (no /s modifier for dot);
        2. (.|\n) Then match and capture the first thing that's either not-a-newline or a newline.
        Looked at this way, the only thing that could possibly be captured in the given string would be a newline.

        Indeed, if your regex has no operator introduced after Perl version 5.6, this kind of narration is what YAPE::Regex::Explain will give you:

        c:\@Work\Perl\monks>perl -wMstrict -le
        "use YAPE::Regex::Explain;
        print YAPE::Regex::Explain->new(qr/.*(.|\n)/)->explain();
        "
        The regular expression:

        (?-imsx:.*(.|\n))

        matches as follows:

        NODE                     EXPLANATION
        ----------------------------------------------------------------------
        (?-imsx:                 group, but do not capture (case-sensitive)
                                 (with ^ and $ matching normally) (with . not
                                 matching \n) (matching whitespace and #
                                 normally):
        ----------------------------------------------------------------------
          .*                       any character except \n (0 or more times
                                   (matching the most amount possible))
        ----------------------------------------------------------------------
          (                        group and capture to \1:
        ----------------------------------------------------------------------
            .                        any character except \n
        ----------------------------------------------------------------------
           |                        OR
        ----------------------------------------------------------------------
            \n                       '\n' (newline)
        ----------------------------------------------------------------------
          )                        end of \1
        ----------------------------------------------------------------------
        )                        end of grouping
        ----------------------------------------------------------------------
        (There are newer and better regex parser/explainers around, but I like this one, limited as it is, for its explanatory style.)


        Give a man a fish:  <%-{-{-{-<
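The narration above can be checked by running the original pattern next to the /s variant and the \z anchor discussed elsewhere in the thread, all on the same string. A sketch; first_capture is a hypothetical helper.

```perl
use strict;
use warnings;

# Return capture group 1 for a pattern, or undef when the match fails.
sub first_capture {
    my ($s, $re) = @_;
    return $s =~ $re ? $1 : undef;
}

my $s = "abc\nd";
my %pattern = (
    'm/.*(.|\n)/ (no /s)' => qr/.*(.|\n)/,   # .* stops at the newline, so group 1 gets it
    'm/.*(.)/s'           => qr/.*(.)/s,     # .* crosses the newline, then backtracks one char
    'm/(.)\z/s'           => qr/(.)\z/s,     # capture anchored at the absolute end
);

for my $label (sort keys %pattern) {
    my $got = first_capture($s, $pattern{$label});
    $got =~ s/\n/\\n/g;
    printf "%-21s captures %s\n", $label, $got;
}
```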

        > Not quite: ... "abc\nd"

        In this case, there are two options with the /s modifier:

        DB<11> m/.*(.|\n)/s,print "<$1>" for "123","abc\n","abc\nd"
        <3><
        ><d>
        DB<12> m/.*(.)/s,print "<$1>" for "123","abc\n","abc\nd"
        <3><
        ><d>
        DB<13>

        HTH!

        Cheers Rolf
