http://www.perlmonks.org?node_id=420989

Joe_Cullity has asked for the wisdom of the Perl Monks concerning the following question:

Actually I have 3 questions, 2 technically related, and one on how to be NICE on this message system

First a quick history lesson:
Once a long long time ago (You guys guess my age) a newline “\n” was considered a LineFeed (0x0A). Then Selectric typewriters, and soon after dot matrix printers, appeared, which also needed a command to send the print ball/head to location zero, so some systems redefined the newline to be a LineFeed followed by a CarriageReturn (0x0A + 0x0D). Someone quickly figured out that it took a lot longer for the ball/print head to move to location zero than it did to move the paper up one line, so they reversed the chars, issuing the CarriageReturn first and the LineFeed second (0x0D + 0x0A)… and finally someone decided to save space and just send a CarriageReturn char (0x0D) and assume the LineFeed. So now there's a real mishmash of ideas describing what a newline should be.

The 1st Question:
What does Perl think a newline should be?

Does Perl decide? Does it depend on the ‘C’ compiler used to create Perl? Maybe it depends on what the operating system thinks a newline should be? Maybe I can define it, so I can pick what a newline should look like.

The 2nd Question:
How does chomp() handle newlines?
Will chomp() remove all forms of newline, or only the exact char or char sequence specified by “\n” ?

The 3rd Question:
A while ago I posted a couple of questions and quickly received good, accurate answers that I put to use … and forgot about the questions. I came back to the board weeks later and found that people had continued to respond to my questions long after I solved the problem. Is there a way to flag the message as already answered/solved? I hate seeing people waste their time solving my already answered questions.

Thanks Joe_Cullity

Edited 2005-01-05 by Ovid

20050111 Edit by castaway: Changed title from '3 questions... 2 technically related, and one on how to be NICE'

Re: 3 questions... 2 about newlines, and one on how to be NICE
by EdwardG (Vicar) on Jan 10, 2005 at 17:00 UTC

    1. From perldoc perlvar -

    $/ The input record separator, newline by default. This influences Perl's idea of what a "line" is. Works like awk's RS variable, including treating empty lines as a terminator if set to the null string. (An empty line cannot contain any spaces or tabs.) You may set it to a multi-character string to match a multi-character terminator, or to "undef" to read through the end of file. Setting it to "\n\n" means something slightly different than setting to "", if the file contains consecutive empty lines. Setting to "" will treat two or more consecutive empty lines as a single empty line. Setting to "\n\n" will blindly assume that the next input character belongs to the next paragraph, even if it's a newline. (Mnemonic: / delimits line boundaries when quoting poetry.)

    2. chomp removes whatever is in $/

    3. Don't feel bad about people continuing to answer your question - others may benefit, and there is benefit for the answerer as well.
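To make points 1 and 2 concrete, here is a minimal sketch of how chomp follows $/. The variable names are made up for illustration, and the byte values assume an ASCII-based system where "\n" is "\012" in memory:

```perl
use strict;
use warnings;

# chomp removes one trailing copy of whatever $/ currently holds.
my $unix_line = "alpha\n";
my $removed_default = chomp($unix_line);   # $/ is "\n" by default

# With the default $/, a CRLF-terminated string loses only the LF.
my $dos_line = "beta\015\012";
chomp($dos_line);                          # the "\015" is still there

# Localize $/ to chomp a different terminator.
my $crlf_line = "gamma\015\012";
my $removed_crlf;
{
    local $/ = "\015\012";
    $removed_crlf = chomp($crlf_line);     # removes both characters
}

# To strip either ending regardless of $/, a substitution is one option.
my $mixed = "delta\015\012";
$mixed =~ s/\015?\012\z//;

print "default chomp removed $removed_default char(s)\n";
print "CRLF chomp removed $removed_crlf char(s)\n";
print "leftover CR after default chomp\n" if $dos_line =~ /\015\z/;
```

So chomp does not strip "all forms of newline"; it strips exactly what $/ says, which is why files that come from another platform often need the substitution instead.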

     

      While we're quoting from perldoc, here's part of perldoc perlop:
      All systems use the virtual "\n" to represent a line terminator, called a "newline". There is no such thing as an unvarying, physical newline character. It is only an illusion that the operating system, device drivers, C libraries, and Perl all conspire to preserve. Not all systems read "\r" as ASCII CR and "\n" as ASCII LF. For example, on a Mac, these are reversed, and on systems without line terminator, printing "\n" may emit no actual data. In general, use "\n" when you mean a "newline" for your system, but use the literal ASCII when you need an exact character. For example, most networking protocols expect and prefer a CR+LF ("\015\012" or "\cM\cJ") for line terminators, and although they often accept just "\012", they seldom tolerate just "\015". If you get in the habit of using "\n" for networking, you may be burned some day.
      So in other words, use "\n" for your local system's view of a newline, or use actual characters if you know what you want, regardless of what system you're running on.
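As a rough illustration of the "use actual characters" half of that advice, the sketch below builds a protocol-style message with an explicit CR+LF. The request text and the $CRLF name are only examples, and the byte comparison assumes an ASCII-based system:

```perl
use strict;
use warnings;

# Spell out the bytes the protocol requires instead of relying on "\n".
my $CRLF = "\015\012";

# Hypothetical request line and header, for illustration only.
# The two empty strings produce the blank line that ends the headers.
my $request = join $CRLF,
    'GET / HTTP/1.0',
    'Host: example.com',
    '', '';

# Dump the bytes in hex so the terminators are visible.
print join(' ', map { sprintf '%02X', ord } split //, $request), "\n";
```

When such a string is sent over a socket or written to a file, the handle should also be in binary mode on platforms that translate line endings, otherwise the LF may be expanded a second time.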
        Just to complicate things further, please note that this is out of date and somewhat inaccurate. When perldoc says "on a Mac," it means "on a Mac running the Macintosh operating system up through version 9." Mac OS X (version 10) and newer use Unix-style newlines.
      3. Don't feel bad about people continuing to answer your question - others may benefit, and there is benefit for the answerer as well.
      Yup. Just ask paco
Re: 3 questions... 2 about newlines, and one on how to be NICE
by BrowserUk (Patriarch) on Jan 10, 2005 at 17:05 UTC

    I think that EdwardG has pointed you at the answers 1) & 2).

    And 3) as well, but I'd like to add that some of the most interesting discussion here arises as an aside to the original question, not as a direct response to it.


    Examine what is said, not who speaks.
    Silence betokens consent.
    Love the truth but pardon error.
Re: 3 questions... 2 about newlines, and one on how to be NICE
by elwarren (Priest) on Jan 10, 2005 at 19:18 UTC
    If your problem has been solved, come back and reply to your post that it was solved and perhaps the solution you used. Without this we may not know if our advice actually worked for you.

    HTH
Re: 3 questions... 2 about newlines, and one on how to be NICE
by brian_d_foy (Abbot) on Jan 10, 2005 at 21:09 UTC

    There's a long section on this in the perlport manpage (perl portability).

    The "\n" is just a representation of an actual bit pattern, and that bit pattern may be different from system to system. It's a "logical newline", so it ends up being what that system needs as a newline.

    Some things that really care about the bit patterns (such as protocol modules) sometimes specify the exact bit sequence they want instead of relying on a logical representation.

    # from CGI.pm
    $EBCDIC = "\t" ne "\011";
    if ($OS eq 'VMS') {
        $CRLF = "\n";
    } elsif ($EBCDIC) {
        $CRLF = "\r\n";
    } else {
        $CRLF = "\015\012";
    }

    As for annotating the end of your problem, you could post a thank you with a summary of which answers helped you figure out the solution (or even which didn't). :)

    --
    brian d foy <bdfoy@cpan.org>

      Calling "\n" a "logical newline" like you have, and like some versions of perlport do, does more to confuse than to enlighten.

      That same logic would call "a" a "logical lower-case letter A" that actually represents a bit pattern that varies from place to place. So, sometimes, if you want a specific bit pattern, then you should hard-code the bit pattern and not use "a" directly (insert here code that sometimes sets $a to be a specific bit pattern, etc. and then uses "$h$e$l$l$o" instead of "hello" to be "portable").

      It sounds good and is even correct to some extent. But it leads to people writing less portable code and a lot of confusion.

      See Re^4: Line Feeds (rumor control) for much about this.

      The code you quote is 50% due to a design mistake in the C compiler on old Macs that got propagated into Perl on that platform. The other 50% is due to a quirk of the most common VMS-based web server.

      None of it is based on general principles of portability, which call for just using "\n" which is simply "the newline character in that system's character set". Needing to use something other than that as newline simply means that your system is either missing a translation layer it should have (old Macs) or has extra translation it shouldn't have (the most common VMS web server).

      The comments in that code are a bit misleading as well. A better version would be:

      if ($OS eq 'VMS') {
          # VMS web servers translate this to CR+LF
          $CRLF = "\n";
      } elsif ("\r\n" eq "\012\015") {
          # Old Macs get CR and LF backward
          $CRLF = "\n\r";
      } else {
          # This works on any sane system
          # whether EBCDIC or ASCII or other
          $CRLF = "\r\n";
      }

      The "logical newline" idea confuses translations of newlines in I/O that are common to C-based systems (including Unix) with the above Mac mistake. It confuses the newline character with the end-of-line sequence. It encourages bad practices that happen to work on Macs and ASCII systems.

      - tye        

        I don't see how "logical" is confusing if people understand that it means the object stands-in for something else. A lowercase "a" does not stand in for an uppercase "A", so I wouldn't call it a logical "A".

        The CGI.pm example just demonstrates the hoops that Lincoln jumped through. Whether Mac Classic was right or wrong, it still was what it was. I didn't pick how these C libs were written or how these operating systems were designed. I'm not defending them. I just deal with it and get on with life.

        --
        brian d foy <bdfoy@cpan.org>
Re: 3 questions... 2 about newlines, and one on how to be NICE
by CountZero (Bishop) on Jan 10, 2005 at 20:03 UTC
    Rather than replying to your own post you could add an Update to your post (at the bottom of your post), so other Monks immediately see that it has been answered and do not have to read all the replies.

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: 3 questions... 2 about newlines, and one on how to be NICE
by Mago (Parson) on Jan 10, 2005 at 20:43 UTC

    2. chomp

    # chomp VARIABLE
    # chomp( LIST )
    # chomp

    This safer version of "chop" removes any trailing string that corresponds to the current value of $/ (also known as $INPUT_RECORD_SEPARATOR in the English module). It returns the total number of characters removed from all its arguments. It's often used to remove the newline from the end of an input record when you're worried that the final record may be missing its newline. When in paragraph mode ($/ = ""), it removes all trailing newlines from the string. When in slurp mode ($/ = undef) or fixed-length record mode ($/ is a reference to an integer or the like, see perlvar) chomp() won't remove anything. If VARIABLE is omitted, it chomps $_. Example:

    while (<>) {
        chomp;    # avoid \n on last field
        @array = split(/:/);
        # ...
    }

    If VARIABLE is a hash, it chomps the hash's values, but not its keys.

    You can actually chomp anything that's an lvalue, including an assignment:
    chomp($cwd = `pwd`);
    chomp($answer = <STDIN>);

    If you chomp a list, each element is chomped, and the total number of characters removed is returned.

    If the encoding pragma is in scope then the lengths returned are calculated from the length of $/ in Unicode characters, which is not always the same as the length of $/ in the native encoding.

    Note that parentheses are necessary when you're chomping anything that is not a simple variable. This is because chomp $cwd = `pwd`; is interpreted as (chomp $cwd) = `pwd`;, rather than as chomp( $cwd = `pwd` ) which you might expect. Similarly, chomp $a, $b is interpreted as chomp($a), $b rather than as chomp($a, $b).
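A small sketch of the behaviours described in that perldoc excerpt, using made-up data so it runs without any input files. It is only meant to show the list form, the special $/ modes, and the parenthesized assignment form:

```perl
use strict;
use warnings;

# Chomping a list: every element is chomped and the total count is returned.
my @fields = ("one\n", "two\n", "three");
my $total = chomp(@fields);          # "three" had no newline to remove

# Paragraph mode ($/ = ""): all trailing newlines are removed at once.
my $para = "text\n\n\n";
{
    local $/ = "";
    chomp($para);
}

# Slurp mode ($/ = undef): chomp removes nothing.
my $slurped = "text\n";
my $slurp_removed;
{
    local $/ = undef;
    $slurp_removed = chomp($slurped);
}

# Parentheses make chomp apply to the variable after the assignment.
chomp(my $answer = "yes\n");

print "list chomp removed $total char(s)\n";
print "slurp-mode chomp removed $slurp_removed char(s)\n";
print "answer is '$answer'\n";
```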


    Mago
    mago@rio.pm.org

Re: 3 questions... 2 about newlines, and one on how to be NICE
by dimar (Curate) on Jan 10, 2005 at 23:33 UTC
    ... a long long time ago (You guys guess my age) a newline “\n” was considered a LineFeed (0x0A). Then Selectric typewriters ...

    Hey, you just contributed some interesting stuff to add to my "neato facts" files! I never really stopped to think where those terms actually came from. ... Nice post. Just one small question ... just what exactly is this "typewriter" thing you speak of?

      just what exactly is this "typewriter" thing you speak of?

      A complete misnomer used to name an obsolete piece of mechanical and latterly electro-mechanical equipment specifically designed to make writing slower and harder than it need be.

      The misnomer? The damn things never "righted" anything I ever typed on one!


      Examine what is said, not who speaks.
      Silence betokens consent.
      Love the truth but pardon error.
Re: 3 questions... 2 about newlines, and one on how to be NICE
by Anonymous Monk on Jan 10, 2005 at 23:32 UTC
    Errr... 39? Now you guess mine :)
Re: 3 questions... 2 about newlines, and one on how to be NICE
by nothingmuch (Priest) on Jan 11, 2005 at 23:09 UTC
    On Windows (and other CRLF platforms), not only can $/ be changed, but what "\n" turns into on output can change too.

    There are two types of I/O streams on Windows: text and binary (FTP suffers from this atrocity too...).

    If you open a file, it's normally considered a text file, and printing "\n" into it will result in a CRLF. However, if you binmode FILEHANDLE, then printing "\n" will result in LF, and "\r" will result in CR.

    Scary, huh?
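The text/binary difference can be reproduced on any platform by naming the PerlIO layer explicitly rather than relying on the Windows default. This is a sketch under that assumption; bytes_written is a hypothetical helper, not something from the core, and the byte values assume an ASCII-based system:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write $text through the given PerlIO layer, then read the file back
# in raw mode so the bytes that actually hit the disk are visible.
sub bytes_written {
    my ($layer, $text) = @_;
    my ($tmp_fh, $path) = tempfile(UNLINK => 1);
    close $tmp_fh;

    open my $out, ">$layer", $path or die "open $path: $!";
    print {$out} $text;
    close $out or die "close $path: $!";

    open my $in, '<:raw', $path or die "open $path: $!";
    local $/;                          # slurp the whole file
    my $bytes = <$in>;
    close $in;
    return $bytes;
}

my $raw  = bytes_written(':raw',  "line\n");   # like binmode: LF stays LF
my $crlf = bytes_written(':crlf', "line\n");   # text mode: LF becomes CR+LF

printf "raw:  %d bytes\n", length $raw;
printf "crlf: %d bytes\n", length $crlf;
```

In both cases the string in memory is the same; only the layer on the filehandle decides what gets written.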

    -nuffin
    zz zZ Z Z #!perl