http://www.perlmonks.org?node_id=890769

dannyd has asked for the wisdom of the Perl Monks concerning the following question:

Greetings, O wise ones,

I've tried to put together some questions about newlines. I don't understand any of this yet, so any information will be chomped down voraciously.

1. How can the difference between these characters be seen when issuing commands or taking input from <STDIN>? (Can you give me some exercises to do, so I can see them in action?)

2. Is CR the newline character for Windows, and LF the newline character for Linux?

3. Is CR == "\r\n" and LF == "\n" in Perl?

4. What about Control-M, which appears at the end of lines in Windows-generated files before I run dos2unix on them? Is that character just "\r"?

5. In the vim editor, when I open a file and type ':set list', I see a '$' at the end of each line. What character is that?

6. I assumed that "\n" is an LF ASCII mapping; is "\r" a CR ASCII mapping?

7. When I used Expect.pm to get some information from an SSH server hosted on an appliance, a lot of the returned lines were terminated by "\r\n". What character is that?

8. Is there a difference between a 'line terminator' in the terminal and in a file?

Can someone please help.

Re: Carriage Return and Line Feed in Perl.
by Eliya (Vicar) on Mar 01, 2011 at 17:10 UTC
    2. Is CR the newline character for Windows, and LF the newline character for Linux?
    3. Is CR == "\r\n" and LF == "\n" in Perl?

    \r = CR = carriage return = ASCII code 13 (decimal), 015 (octal), 0d (hex)
    \n = LF = line feed = ASCII code 10 (decimal), 012 (octal), 0a (hex)

    On Windows, the combination of those two control characters, i.e. \r\n, is used to indicate a newline, while on Linux/Unix, a single \n is used as newline.

    (To simplify things, this ignores old-style Mac semantics; see the already mentioned link for details.)
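To see the two characters from Perl itself, you can ask for their code points with ord. A small sketch (the labels and the output format are just illustrative) that reproduces the decimal/octal/hex values listed above:

```perl
use strict;
use warnings;

# On Unix and Windows alike, "\r" and "\n" are single characters inside
# Perl; any CR LF translation happens later, in the I/O layer.
my @rows;
for my $pair (['CR', "\r"], ['LF', "\n"]) {
    my ($name, $char) = @$pair;
    my $code = ord $char;
    push @rows, sprintf "%s: dec=%d oct=%03o hex=%02x", $name, $code, $code, $code;
}
print "$_\n" for @rows;
```

It should print `CR: dec=13 oct=015 hex=0d` followed by `LF: dec=10 oct=012 hex=0a`.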

    4. What about Control-M, which appears at the end of lines in Windows-generated files before I run dos2unix on them? Is that character just "\r"?

    See Caret Notation.  More specifically, Control-M, or ^M, is the same as \r, because \r is ASCII code 13, and M is the 13th character in the alphabet.
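Perl's "\cX" string escape produces the control character Ctrl-X, which makes the caret-notation rule easy to check. A short sketch (the %caret hash is only for illustration):

```perl
use strict;
use warnings;

# "\cM" is Ctrl-M, the character editors display as ^M.
my $ctrl_m = "\cM";
print "\\cM is CR\n" if $ctrl_m eq "\r";

# Caret notation in general: Ctrl-<letter> is the letter's code minus 64,
# so 'M' (77) gives 13 (CR) and 'J' (74) gives 10 (LF, i.e. ^J).
my %caret;
$caret{$_} = ord($_) - 64 for qw(M J);
printf "^%s = %d\n", $_, $caret{$_} for sort keys %caret;
```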

    6. I assumed that "\n" is an LF ASCII mapping; is "\r" a CR ASCII mapping?

    Not sure what you mean by that.

    7. When I used Expect.pm to get some information from an SSH server hosted on an appliance, a lot of the returned lines were terminated by "\r\n". What character is that?

    Generally, \r\n is a Windows newline (see above), but it's also used in some network protocols to indicate newline (aiming to be portable).
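When output captured from a remote device arrives with CR LF endings, one option is to split on an optional CR followed by LF. A minimal sketch, assuming the captured text is already in a string (the variable $output and its contents are hypothetical):

```perl
use strict;
use warnings;

# Hypothetical output captured from a remote device; terminals and many
# network protocols end each line with CR LF.
my $output = "uptime 42 days\r\nload 0.10\r\n";

# Split on CR LF, or a bare LF in case the device is inconsistent.
# Trailing empty fields are dropped by split.
my @lines = split /\r?\n/, $output;
print "[$_]\n" for @lines;
```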

    8. Is there a difference between a 'line terminator' in the terminal and in a file?

    No.

    That said, as long as you do not operate across platforms, you usually need not worry about line terminator differences, because where a conversion is required (such as on Windows) the PerlIO layer ":crlf" automatically converts between the platform's newline and Perl's internally used \n when reading from and writing to file handles. That is, on Windows (by default), \r\n is translated to \n on input, and \n is translated to \r\n on output. If you need to, you can change that behavior with binmode or open. For example, to transparently read/write Windows files on Unix, you can push the :crlf layer onto the respective file handle's layer stack.
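A sketch of reading a Windows file on Unix through the :crlf layer. It writes its own small CR LF test file first (the contents are made up for the demo) using the core File::Temp module:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small Windows-style file: every line ends in CR LF.
# :raw makes sure no translation happens while writing the test data.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out, ':raw';
print {$out} "first\r\nsecond\r\n";
close $out or die "close: $!";

# Read it back through the :crlf layer: CR LF arrives as "\n",
# so chomp removes the whole line ending even on Unix.
open my $in, '<:crlf', $path or die "open $path: $!";
my @lines = <$in>;
close $in;
chomp @lines;
print "[$_]\n" for @lines;
```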

      All questions answered in 3 posts ^^ wow... didn't expect that!

      I guess a lot of them were repetitions, but to my mind before the posts they were not... so it's still a wow...

      Thanks toolic, moritz and Eliya for all the information, some of this stuff has been plaguing me for a while now.

      Note to self : Start chomping away, and search perldocs first before posting :)

        Also a bit of history: Wiki CRLF history. This is extra info - some folks find it interesting.

        Many moons ago I worked with ASR33 teletypes. In order to see one now, you'd have to go to a museum or watch an old black and white film. These things used paper tape which had lubricating oil embedded in the tape. On the keyboard, on the right, there were 3 keys arranged together: Carriage Return (CR), Line Feed (LF) and Rubout (DEL), not just a single "return" or "enter" button. The paper tape was 8 bits wide, and a Rubout meant punching holes in all 8 positions. The normal sequence to end a line was: CR, LF, Rubout.

        I'm not sure that I buy all of the explanation in the wiki article. The ASR33 was a very stupid thing, and I think part of the reason was simply to have a separate key for each mechanical action (e.g. separate CR from LF). This also allowed low-resolution pictures of a sort to be printed by repeatedly printing over the same line (no LF). I saw many pictures printed like this, so the Wiki theory that LF was needed as a time delay for the CR mechanics is doubtful; there was hardware flow control. The lubrication for the mechanical fingers came from the tape itself, so the periodic Rubout after every line served to keep all of the fingers well lubricated. A series of Rubouts served to indicate the preamble and postamble of a tape. Nasty stuff, as all the folks dealing with these oil-soaked tapes wound up with grease stains in their shirt pockets! We viewed optical tape readers as a major technological advance! No more oil in the tape, and therefore no more grease stains in the shirt pockets!

        On the more practical side of things, I exchange programs from my Windows box to Unix and vice versa. Last month I wrote 2 Perl files on Unix with vi and one Perl file on Windows. Perl on either machine didn't care, and the program ran on both machines (Perl doesn't care about the line endings in its own source code). I haven't had trouble exchanging data files when Perl is doing the reading.

        I use TextPad (a shareware program) as my Windows program editor, and it doesn't care either. The trouble happens when you are exchanging data with something like Windows Notepad, which does care (it cannot deal with just LF as a "newline" character). However, a simple Perl program that chomps and then rewrites each line with a "\n" seems to set things right (resulting in \r\n when run on Windows and just \n when run on Unix).
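One caveat worth spelling out: chomp only removes "\n" (strictly, the value of $/), so a CR LF line that reaches Perl untranslated, e.g. a Windows file read on Unix, keeps its trailing "\r" and is written back unchanged. A minimal sketch that strips either ending explicitly (the helper name normalize is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: strip a trailing LF or CR LF, whichever is present.
sub normalize {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    return $line;
}

# chomp alone leaves the CR behind on an untranslated CR LF line.
my $chomped = "from windows\r\n";
chomp $chomped;

my $unix_line = normalize("from unix\n");
my $dos_line  = normalize("from windows\r\n");

printf "chomp left a CR: %s\n", ($chomped =~ /\r\z/ ? 'yes' : 'no');
print "[$unix_line] [$dos_line]\n";
```

Used as a filter, something like `while (my $line = <STDIN>) { print normalize($line), "\n" }` then writes each line with the output handle's default ending: LF on Unix, CR LF on Windows via the :crlf layer.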

        So by and large, I figure that Perl did about as well as can be done with this mess.

        If you are, say, hashing to disk with a fixed record size in bytes, this is certainly a consideration, but I've written multi-platform programs that work fine. It is a detail that matters, and there are solutions. For the most part, though, it is not important at all, except when the output file is going to be used by a non-Perl application that can't deal with different line endings.

Re: Carriage Return and Line Feed in Perl.
by toolic (Bishop) on Mar 01, 2011 at 15:19 UTC
    will be chomped
    Funny you should mention chomp :)

    By no means will this answer all your questions, but here is a reference from the official docs: Newlines

Re: Carriage Return and Line Feed in Perl.
by moritz (Cardinal) on Mar 01, 2011 at 15:56 UTC
    5. In the vim editor, when I open a file and type ':set list', I see a '$' at the end of each line. What character is that?
    From :help list:
    Same as :print, but display unprintable characters with '^' and put $ after the line. See |ex-flags| for flags.

    So the $ is just an indicator for whatever vim currently considers a newline (which is determined by :set fileformat).

Re: Carriage Return and Line Feed in Perl.
by sundialsvc4 (Abbot) on Mar 01, 2011 at 21:32 UTC

    That is why perlmonks.org exists, and that is why so many of us spend a portion of our day trying to contribute usefully to it. Never feel embarrassed for having asked a well-phrased question. Every single one of your questions makes perfect sense, every one of them is common, and every one of them is an absolute show-stopper until you know the answer.

    “Now you know. And furthermore, for having read this exchange, now an unknown number of other people (across time) will also know.” That’s the power of the Internet, and of well-known sites like these.

      You wrote: “Now you know. And furthermore, for having read this exchange, now an unknown number of other people (across time) will also know.” Well, it's now 2016, and plain old \n wasn't working for me, so I searched and found this thread. Thank you from the future!! Steve

Re: Carriage Return and Line Feed in Perl.
by Argel (Prior) on Mar 01, 2011 at 23:31 UTC
    Slightly off-topic, but I got fed up with chomp because I work on files that were generated on, e.g., UNIX and then transferred to Windows or vice versa (thanks, vendors). These days, instead of chomp, I use something like $line =~ s/(?:\012)|(?:\015)//g; to get rid of CRs and LFs when reading files in.
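The substitution above deletes every CR and LF in the line, wherever it appears. The same effect can be had with a single character class, or with tr///, which is a little more direct for deleting single characters. A sketch comparing the three on a made-up CSV line:

```perl
use strict;
use warnings;

my $raw = "field1,field2\r\n";

(my $regex = $raw) =~ s/(?:\012)|(?:\015)//g;   # the original form
(my $class = $raw) =~ s/[\012\015]//g;          # same thing, one character class
(my $tr    = $raw) =~ tr/\015\012//d;           # transliteration: delete CR and LF

print "[$_]\n" for $regex, $class, $tr;
```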

    Update: BTW, nice series of questions and welcome to PM!!

    Elda Taluta; Sarks Sark; Ark Arks

      s/\s+$//

      Much better than chomp. Trailing whitespace should never be significant, so don't let it be.
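Because \s matches spaces, tabs, CR and LF, this one substitution removes any trailing line ending together with stray padding, whichever platform produced the line. A small demonstration on made-up sample lines:

```perl
use strict;
use warnings;

my @samples = ("unix line\n", "windows line\r\n", "padded line  \t\r\n");

# Copy each sample, then strip all trailing whitespace from the copy.
my @clean = map { my $s = $_; $s =~ s/\s+$//; $s } @samples;
print "[$_]\n" for @clean;
```

The trade-off is the one stated above: any trailing spaces that were meant to be data are removed too.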

      - tye        

        Will be trying/reading all the suggestions/links.

        Things are a lot clearer now, thank you all for the insight.

        That's a very good point. If it did matter it would be a rare/edge case, so just getting rid of the extra whitespace by default makes sense!!


Re: Carriage Return and Line Feed in Perl.
by chrisjej (Initiate) on Mar 19, 2014 at 12:57 UTC

    If you want to write Unix-style (LF-only) output from Windows (rather than from Unix), it seems to be very hard to stop perl from converting anything that looks like ASCII 10 into ASCII 13 + ASCII 10.

    The one way I've managed to do it is to put the file handle into binmode, i.e.:

    binmode STDOUT; while (<>) { s/\n//; print "$_" . chr(10); }

    Possibly there is a variable that controls this, but I haven't found it, and things like $OFS in the program and -l012 on the command line don't seem to help (in Perl 5.16). Perhaps someone might want to look into this in more detail.
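The translation comes from the :crlf PerlIO layer that Windows perl applies by default, which is why binmode works where $OFS and -l012 do not: those only choose what gets printed, and the layer still turns each "\n" into CR LF on the way out. A sketch that opens an output file with the :raw layer and reads the bytes back to confirm no CR was written (the temporary file and its contents are just for the demo):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# With :raw, "\n" is written as a single LF byte on every platform.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh, ':raw';
print {$fh} "$_\n" for 'alpha', 'beta';
close $fh or die "close: $!";

# Read the bytes back, also in :raw mode, to see what actually hit disk.
open my $check, '<:raw', $path or die "open $path: $!";
my $bytes = do { local $/; <$check> };
close $check;
printf "%d bytes, %d CR\n", length $bytes, ($bytes =~ tr/\r//);
```

For STDOUT, as in the one-liner above, `binmode STDOUT, ':raw';` does the same job.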