PerlMonks  

Re: Line Feeds

by halley (Prior)
on Jun 09, 2003 at 19:06 UTC ([id://264431])


in reply to Line Feeds

If you must specify file contents with an exact byte sequence, use literal octal or hexadecimal notations, and not the semantic names like \n or \r. The semantic names are subject to translations according to platform-specific encoding features. Binmode your filehandle to ensure no translations occur outside of your control.

For example, binmode(OUTPUT); print OUTPUT "Hello\x0D\x0A" ensures a carriage return and newline on all platforms. Conversely, dropping the \x0D part will ensure there's no carriage return (which appears as ^M in some editors).
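A minimal self-contained sketch of the example above, assuming Perl 5.8 or later. It writes to an in-memory filehandle (my choice, so the bytes can be inspected without touching disk) instead of the OUTPUT handle from the post. The byte dump assumes an ASCII platform, which is exactly the qualification raised in the reply below.

```perl
use strict;
use warnings;

# Write to an in-memory "file" so the exact bytes can be inspected.
open(my $out, '>', \my $buf) or die "Cannot open in-memory handle: $!";
binmode($out);                    # no layer translation on this handle
print {$out} "Hello\x0D\x0A";     # explicit CR LF, spelled in hex
close($out) or die "close: $!";

# The octal spelling names the same two characters.
print "octal and hex agree\n" if "\015\012" eq "\x0D\x0A";

# Dump each byte as two hex digits.
print join(' ', map { sprintf '%02X', ord($_) } split //, $buf), "\n";
# prints: 48 65 6C 6C 6F 0D 0A   (on an ASCII platform)
```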

--
[ e d @ h a l l e y . c c ]

Re^2: Line Feeds (rumor control)
by tye (Sage) on Jun 10, 2003 at 16:07 UTC

    That is pretty misleading (or just incorrect). "\n" and "\x0A" are exactly the same thing unless you are on a non-ASCII system or an old Mac. Using "\x0A" is only an improvement when on an old Mac. On a non-ASCII system, using "\x0A" is likely to simply break things. On all other systems, using "\x0A" is identical to using "\n".
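A quick way to check the claim on a given perl: compare the two escapes directly. On an ASCII (or Latin-1/UTF-8) platform both are character 10 and the strings are identical; on an EBCDIC system or under old MacPerl (where "\n" is "\015") the comparison comes out differently. This is only a sketch of the check, not a portability test suite.

```perl
use strict;
use warnings;

# Compare the native newline escape with the ASCII line-feed code.
my $same = ("\n" eq "\x0A") ? 'yes' : 'no';
printf "ord(\"\\n\") = %d, ord(\"\\x0A\") = %d, identical: %s\n",
    ord("\n"), ord("\x0A"), $same;

# Either way it is a single character inside the string.
printf "length(\"\\n\") = %d\n", length("\n");
```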

    So, there are no systems where "\x0A" is likely to be subject to fewer translations (since old Macs don't translate either character and non-ASCII systems will likely be translating the whole character set or none of it, depending on destination).

    Your second paragraph is correct if you add "on an ASCII system".

                    - tye

      Please read the writeup at binmode, as well as the bit about newlines in perlport. And yes, I believe Perl is implemented on many non-ASCII systems.

      While "\n" and "\x0A" are exactly the same thing on ASCII systems in the storage of perl scalars, that's a mouthful to say. By omission, that means that they may NOT be the same thing on disk, or via socket, or on non-ASCII systems.

      This is akin to the HTML argument between semantic <strong> and literal <b>. Semantics enforce user/platform preferences, and literals enforce author preferences.

      My advice was to use semantic names when you want semantic meanings, and use literal numerical values when being literal is important. Binmode tells Perl you care. The syntax you use tells the developer you care. Remember, source code is for the human to read, too, and using \x0A clues the maintenance programmer in that the byte values matter. I don't see how that's misleading or incorrect.

      --
      [ e d @ h a l l e y . c c ]

        I didn't say "in the storage of perl scalars", and my not saying it was intentional. On a Windows system, print "\x0A" results in "\r\n" in a file (by default). On a Unix system, print "\x0A" results in "\r\n" being sent to the terminal device (by default).
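A sketch of the Windows file case that can be run on any platform, assuming Perl 5.8+ with PerlIO. The `:crlf` layer is pushed explicitly to stand in for the Windows text-mode default, and the temporary file (via the core File::Temp module, my choice) is read back through `:raw` so the real bytes are visible. Both spellings come out as CR LF, because by the time the layer sees the string there is only one character there.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Temporary file, deleted automatically at program exit.
my ($tmp, $path) = tempfile(UNLINK => 1);
close($tmp);

# ':crlf' asks PerlIO for the text-mode translation Windows applies by
# default; on Unix it has to be requested explicitly like this.
open(my $out, '>:crlf', $path) or die "Cannot write $path: $!";
print {$out} "semantic\n";      # spelled as \n
print {$out} "literal\x0A";     # spelled as \x0A
close($out) or die "close: $!";

# Read the bytes back with no translation at all.
open(my $in, '<:raw', $path) or die "Cannot read $path: $!";
my $bytes = do { local $/; <$in> };
close($in);

# Both line endings were written as CR LF (0d0a).
print unpack('H*', $bytes), "\n";
```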

        You said that "semantic names" are subject to platform-specific translations. I find that misleading. It implies that "\n" will be translated while "\x0A" will not. The only context in which this makes some sense is on an old Mac. But even in that case, "\n" is not translated, it simply means something different than "\x0A".

        If you want to use "\n" to mean the two characters ("\r\n") that represent a newline in a Windows file, then I don't ever want to talk to you about such things as the conversation would be hopelessly confusing. (I'm not sure that you do, but some versions of perlport seem to want to.)

        Writing "\x0A" mostly tells me that the author doesn't care about non-ASCII systems. (Not a horrible stance.)

        Sorry, I find both binmode and perlport to be misleading (some versions worse than others).

        And I think the choice on old Macs to have "\r" and "\n" reversed in C was a mistake. To my mind, this makes old Macs nearly ASCII systems and you should have to translate from this near-ASCII character set when interfacing to other true ASCII systems. Unfortunately, Macs refuse to make these translations and expect programmers to instead do stupid things like hard code ASCII values into languages that are designed to not be dependent on character encoding (and thus produce code that is dependent on character encoding and thus less portable).

        Letting this design mistake in old Macs permeate all code such that it is no longer independent of character encoding is very distasteful to me.

        If you want to write portable code (that takes both old Macs and (other) non-ASCII systems into account), then you should use things like "\x0A" only after you have detected whether you are on a system where that works. For example, see "sub crlf" in CGI::Simple.pm or $CRLF in CGI.pm.
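The detect-first approach might look roughly like the following. `network_crlf` is a hypothetical helper written in the spirit of the modules mentioned above, not copied from CGI.pm or CGI::Simple. The `ord('A') == 65` test for "ASCII-based" and the EBCDIC fallback are both assumptions of this sketch, not verified behaviour on non-ASCII hardware.

```perl
use strict;
use warnings;

# Hypothetical helper: return the CR LF pair that network protocols such
# as HTTP expect on the wire.
sub network_crlf {
    # On an ASCII-based platform 'A' is code 65, so the octal escapes below
    # really are ASCII CR (13) and LF (10). This also holds on old Macs,
    # which swap the meaning of "\r" and "\n" but keep ASCII codes.
    return "\015\012" if ord('A') == 65;

    # Assumption: elsewhere (e.g. EBCDIC), use the native "\r" and "\n"
    # and rely on the platform translating them for ASCII peers.
    return "\r\n";
}

my $crlf   = network_crlf();
my $header = "HTTP/1.0 200 OK${crlf}Content-Type: text/plain${crlf}${crlf}";
printf "CRLF bytes: %s, header length: %d\n", unpack('H*', $crlf), length($header);
```

The point of wrapping the literal in a sub is that the hard-coded ASCII values appear only behind an explicit platform check, which is the kind of documentation tye is asking for.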

        Using "\x0A" without first checking whether you are on an ASCII system is not a good form of documentation (of anything other than your lack of interest in non-ASCII systems). And saying that "\x0A" is like "\n" but without translation is doubly wrong. It makes some sense if you are on an old Mac (but is still pretty confusing since it implies that "\n" is not "\n" on a Mac -- again a 'true' statement in a perverse sense but also a confusing one). Everywhere else, either the first part of the sentence is wrong or the last part is.

        "\x0A" is the ASCII newline character. "\n" is the (single) newline character in the native character set. Old Macs claim to be ASCII but use ASCII newline as carriage return and use ASCII carriage return as newline (and map "\r" and "\n" backwardly in accordance with this non-conformance in both C and Perl). Yet old Macs are close enough to ASCII that they don't bother to translate this non-ASCII nature when talking over sockets to ASCII systems.

        This design mistake in old Macs and the relative popularity of Macs over non-ASCII systems has made many discussions about "\r" and "\n" nearly impossible to decode. This is because some people think "\r" is carriage return or newline, depending on whether you are on an old Mac or not. But this thought is inaccurate. You can say '"\r" is ASCII newline on a Mac and ASCII carriage return most other places'. And you can say that "\r" is carriage return on the platform in question. But making a big deal out of the fact that carriage return on an old Mac is also ASCII newline to the point of thinking '"\r" might be newline' (note the lack of "ASCII" in that sentence), leads to no end of confusion.

        Trying to resolve this by redefining "\n" as "logical newline" (that might mean a two-character sequence) just leads to the confusion that I often see, such as people being surprised that length($/) and length("\n") are 1 on Windows or surprised that chomp on Windows doesn't remove "\r\n" from a string or surprised that $/ ne "\r\n" on Windows.
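The chomp surprise can be reproduced on any platform. The sketch below assumes a line that arrived through a handle with no `:crlf` translation (a binmode'd file or a socket), so the CR is still in the string. On Windows text-mode handles the layer removes the CR before chomp ever sees the line, which is why the behaviour looks inconsistent there.

```perl
use strict;
use warnings;

# A line as it would arrive through an untranslated handle.
my $line = "data\r\n";

printf "length(\$/) = %d, \$/ eq \"\\n\": %s\n",
    length($/), ($/ eq "\n" ? 'yes' : 'no');

# chomp removes a trailing $/, which is the single character "\n".
chomp(my $chomped = $line);
printf "after chomp: %d chars, ends in CR: %s\n",
    length($chomped), ($chomped =~ /\r\z/ ? 'yes' : 'no');

# Stripping an optional CR as well has to be asked for explicitly.
(my $stripped = $line) =~ s/\r?\n\z//;
print "after s///: '$stripped'\n";
```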

        The translation to/from "\n" and "\r\n" happens in (the C run-time library layer of) the Windows file system (unless you use binmode) and happens in the device driver layer on Unix (unless the device is configured not to; see, for example, "stty onlcr" or "raw mode"). On old Macs, "\n" and "\r" are reversed from their ASCII representations similar to how "\n" and "\r" aren't ASCII values on (other) non-ASCII systems. Old Macs don't translate this anywhere (so binmode doesn't matter on them).
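To see that the translation lives in the I/O layer and not in the string, here is a sketch using Perl's own `:crlf` PerlIO layer, pushed explicitly so it behaves the same on Unix as a Windows text-mode handle. It is not the Windows C run-time itself, and the Unix tty (`onlcr`) case is not covered. `raw_bytes` is a hypothetical helper for this example; File::Temp is a core module.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp, $path) = tempfile(UNLINK => 1);
close($tmp);

# Hypothetical helper: slurp a file with no translation at all.
sub raw_bytes {
    my ($file) = @_;
    open(my $in, '<:raw', $file) or die "Cannot read $file: $!";
    local $/;
    my $data = <$in>;
    close($in);
    return $data;
}

# 1. Text-mode handle: "\n" is written as CR LF.
open(my $text, '>:crlf', $path) or die "Cannot write $path: $!";
print {$text} "one\n";
close($text);
my $translated = raw_bytes($path);

# 2. Reading through :crlf turns CR LF back into the single character "\n".
open(my $back, '<:crlf', $path) or die "Cannot read $path: $!";
my $line = <$back>;
close($back);

# 3. Same layer, but binmode before writing: translation is switched off.
open(my $bin, '>:crlf', $path) or die "Cannot write $path: $!";
binmode($bin);
print {$bin} "one\n";
close($bin);
my $untranslated = raw_bytes($path);

printf "text mode: %s, read back: %d chars, binmode: %s\n",
    unpack('H*', $translated), length($line), unpack('H*', $untranslated);
```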

                        - tye
