Re^4: Line Feeds (rumor control)

by tye (Sage)
on Jun 10, 2003 at 17:47 UTC (#264785)

in reply to Re: Re^2: Line Feeds (rumor control)
in thread Line Feeds

I didn't say "in the storage of perl scalars", and my not saying that was intentional. On a Windows system, print "\x0A" results in "\r\n" in a file (by default). On a Unix system, print "\x0A" results in "\r\n" being sent to the terminal device (by default).
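The Windows half of that can be sketched on any ASCII system. This is only an illustration under assumptions: it uses Perl's ":crlf" PerlIO layer on an in-memory handle to stand in for a Windows text-mode file, and the helper name written_bytes is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: print $text through the given PerlIO layer(s)
# into an in-memory buffer and return the bytes that ended up stored.
sub written_bytes {
    my ($layers, $text) = @_;
    my $buf = '';
    open my $fh, ">$layers", \$buf or die "open failed: $!";
    print {$fh} $text;
    close $fh or die "close failed: $!";
    return $buf;
}

# ":crlf" stands in for the default text-mode handle on Windows.
my $text_mode = written_bytes(':crlf', "\x0A");
# ":raw" stands in for a handle after binmode.
my $bin_mode  = written_bytes(':raw',  "\x0A");

# %vd prints the ordinal of each character, joined by ".".
printf "text mode: %vd\n", $text_mode;
printf "binmode:   %vd\n", $bin_mode;
```

The scalar handed to print holds one character either way; the second character only appears in what the I/O layer stores.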

You said that "semantic names" are subject to platform-specific translations. I find that misleading. It implies that "\n" will be translated while "\x0A" will not. The only context in which this makes some sense is on an old Mac. But even in that case, "\n" is not translated, it simply means something different than "\x0A".

If you want to use "\n" to mean the two characters ("\r\n") that represent a newline in a Windows file, then I don't ever want to talk to you about such things as the conversation would be hopelessly confusing. (I'm not sure that you do, but some versions of perlport seem to want to.)

Writing "\x0A" mostly tells me that the author doesn't care about non-ASCII systems. (Not a horrible stance.)

Sorry, I find both binmode and perlport to be misleading (some versions worse than others).

And I think the choice on old Macs to have "\r" and "\n" reversed in C was a mistake. To my mind, this makes old Macs nearly ASCII systems and you should have to translate from this near-ASCII character set when interfacing to other true ASCII systems. Unfortunately, Macs refuse to make these translations and expect programmers to instead do stupid things like hard code ASCII values into languages that are designed to not be dependent on character encoding (and thus produce code that is dependent on character encoding and thus less portable).

Letting this design mistake in old Macs permeate all code such that it is no longer independent of character encoding is very distasteful to me.

If you want to write portable code (that takes both old Macs and (other) non-ASCII systems into account), then you should use things like "\x0A" only after you have detected whether you are on a system where that works. For example, see "sub crlf" in or $CRLF in
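A rough sketch of that kind of check, not taken from any particular module: the helper name newline_kind and its three labels are invented here, and the test for ASCII (looking at ord('A') and at what "\n" really is) is just one plausible way to do the detection.

```perl
use strict;
use warnings;

# Hypothetical helper: classify the native character set by what "A"
# and "\n" actually are, instead of assuming ASCII.
sub newline_kind {
    my $ascii_letters = ord('A') == 0x41;
    return 'ascii'   if $ascii_letters && "\n" eq "\x0A";
    return 'swapped' if $ascii_letters && "\n" eq "\x0D";   # old Mac
    return 'non-ascii';                                      # e.g. EBCDIC
}

my $kind = newline_kind();
if ($kind eq 'ascii') {
    # Only in this branch is "\x0D\x0A" the same thing as "\r\n".
    my $crlf = "\x0D\x0A";
    print "ASCII system, CRLF is: ",
        join(' ', map { sprintf '0x%02X', ord } split //, $crlf), "\n";
}
else {
    print "Not plain ASCII ($kind): translate before using hard-coded values\n";
}
```

What to do in the other two branches depends on what you are talking to, which is why the sketch stops at reporting the situation.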

Using "\x0A" without first checking whether you are on an ASCII system or not is not a good form of documentation (of anything other than your lack of interest in non-ASCII systems). And saying that "\x0A" is like "\n" but without translation is doubly wrong. It makes some sense if you are on an old Mac (but is still pretty confusing since it implies that "\n" is not "\n" on a Mac -- again a 'true' statement in a perverse sense but also a confusing one). Everywhere else, either the first part of the sentence is wrong or the last part is.

"\x0A" is the ASCII newline character. "\n" is the (single) newline character in the native character set. Old Macs claim to be ASCII but use ASCII newline as carriage return and use ASCII carriage return as newline (and map "\r" and "\n" backwardly in accordance with this non-conformance in both C and Perl). Yet old Macs are close enough to ASCII that they don't bother to translate this non-ASCII nature when talking over sockets to ASCII systems.

This design mistake in old Macs and the relative popularity of Macs over non-ASCII systems has made many discussions about "\r" and "\n" nearly impossible to decode. This is because some people think "\r" is carriage return or newline, depending on whether you are on an old Mac or not. But this thought is inaccurate. You can say '"\r" is ASCII newline on a Mac and ASCII carriage return most other places'. And you can say that "\r" is carriage return on the platform in question. But making a big deal out of the fact that carriage return on an old Mac is also ASCII newline to the point of thinking '"\r" might be newline' (note the lack of "ASCII" in that sentence), leads to no end of confusion.

Trying to resolve this by redefining "\n" as "logical newline" (that might mean a two-character sequence) just leads to the confusion that I often see, such as people being surprised that length($/) and length("\n") are 1 on Windows or surprised that chomp on Windows doesn't remove "\r\n" from a string or surprised that $/ ne "\r\n" on Windows.
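Those surprises are easy to reproduce. A minimal sketch, assuming the default $/ and a string that still carries both characters of a Windows line ending (as it would after reading the file with binmode):

```perl
use strict;
use warnings;

# A line as stored in a Windows file, with both characters intact.
my $line = "text\r\n";

# chomp removes only what matches $/, which is the single character "\n".
my $removed = chomp(my $copy = $line);

printf "length(\$/)    = %d\n", length($/);
printf "length(\"\\n\") = %d\n", length("\n");
printf "chomp removed %d character(s); carriage return left behind: %s\n",
    $removed, ($copy =~ /\r\z/ ? 'yes' : 'no');
print "\$/ is ", ($/ eq "\r\n" ? '' : 'not '), "equal to \"\\r\\n\"\n";
```

In text mode on Windows the "\r" never reaches the string in the first place, which is why chomp usually appears to "just work" there.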

The translation to/from "\n" and "\r\n" happens in (the C run-time library layer of) the Windows file system (unless you use binmode) and happens in the device driver layer on Unix (unless the device is configured not to; see, for example, "stty onlcr" or "raw mode"). On old Macs, "\n" and "\r" are reversed from their ASCII representations similar to how "\n" and "\r" aren't ASCII values on (other) non-ASCII systems. Old Macs don't translate this anywhere (so binmode doesn't matter on them).
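The input direction can be sketched the same way on an ASCII system. As an assumption for illustration, Perl's ":crlf" layer on an in-memory handle plays the part of the Windows text-mode translation, ":raw" plays the part of binmode, and read_lines is a made-up helper name.

```perl
use strict;
use warnings;

# Bytes as they sit in a Windows text file.
my $stored = "one\r\ntwo\r\n";

# Hypothetical helper: read every line of $data through the given layer(s).
sub read_lines {
    my ($layers, $data) = @_;
    open my $fh, "<$layers", \$data or die "open failed: $!";
    my @lines = <$fh>;
    close $fh;
    return @lines;
}

my @text = read_lines(':crlf', $stored);   # text mode: "\r\n" arrives as "\n"
my @raw  = read_lines(':raw',  $stored);   # binmode: the "\r" is left in place

print "text mode line lengths: @{[ map { length } @text ]}\n";
print "binmode line lengths:   @{[ map { length } @raw ]}\n";
```

The program sees a single "\n" per line in the first case because the layer below it already did the translation.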

                - tye
