PerlMonks  

A Little History on 0D0A

by jeffa (Bishop)
on Mar 31, 2001 at 22:46 UTC ( #68687 )


in reply to Cross platform file I/O code

As Trimbach said, you should not have a problem. But why is there such concern about the compatibility of record separators across platforms in the first place? Why so much trouble? Time for a little history:

Blame it on the typewriters - better yet, that lever or button on the right side of them. The one used to start typing on the next line. When you activate it, it causes

  1. the carriage to 'return' to the right so the hammers are lined up on the left side of the paper (carriage return)
  2. the carriage to roll up so the hammers are lined up on the next 'line' down (line feed)

In the early 1900s, teletypewriters were used to relay messages. Teletypewriters were descendants of the telegraph and used a code similar to ASCII, called Baudot (or Murray) code. An operator could type into a teletypewriter (tty for short) and print a message on another tty far away.

In this Baudot code, two special characters were designated for a Carriage Return (0x02) and a Line Feed (0x08). Baudot code went the way of the dinosaur for reasons outside the scope of this discussion, but the CR and LF characters were adopted by ASCII with different values:

CR = 0x0D = \015 = \r
LF = 0x0A = \012 = \n

Why were they adopted? For printers of course. The concept of the tty was split into two - a monitor and a printer. At this point, it was up to the various operating systems to implement their actual use. Oops.

Macintosh: \r
Windows  : \r\n
Unix     : \n

UPDATE: Disregard that last table, kept only for historical purposes. Here is a better table, one that shows how the three different operating systems interpret a logical newline (ps, thanks Mr. Stein =] )

Unix     : \n = \012
Macintosh: \n = \015
Windows  : \n = \012     if handled as ASCII
Windows  : \n = \015\012 if handled as binary

This only causes problems in two situations. The first is when you transfer ASCII files between different operating systems as binary data, in which case you should use binmode. The second is when you are programming with sockets, in which case you will need to set $/ to "\015\012" (double-quoted, so the octal escapes are interpreted) or just use the exported globals $CRLF and CRLF() from the Socket or IO::Socket modules (request them with the :crlf import tag).
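
A minimal sketch of the socket case, assuming the data arrives CRLF-terminated. The in-memory filehandle and the $payload text are stand-ins I made up for illustration; a real program would read from an actual socket. It only relies on the :crlf import tag of Socket and on chomp removing whatever $/ currently holds.

```perl
use strict;
use warnings;
use Socket qw(:crlf);   # imports $CR, $LF and $CRLF

# Made-up payload standing in for data read from a socket.
my $payload = "HELO example${CRLF}QUIT${CRLF}";

# Read it back through an in-memory filehandle, byte for byte.
open my $in, '<', \$payload or die "cannot open in-memory handle: $!";
binmode $in;

my @commands;
{
    local $/ = $CRLF;            # a "line" now ends at \015\012
    while (my $line = <$in>) {
        chomp $line;             # chomp strips the current value of $/
        push @commands, $line;
    }
}
close $in;

print join('|', @commands), "\n";   # prints "HELO example|QUIT"
```

Localizing $/ inside a bare block keeps the change from leaking into the rest of the program, which matters if other code still expects to read native text files.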

So, the next time you find yourself cursing this confusion, just take a look at this typewriter history tree and remember that we are only human, except arhuman. :)

Jeff

R-R-R--R-R-R--R-R-R--R-R-R--R-R-R--
L-L--L-L--L-L--L-L--L-L--L-L--L-L--
UPDATE: Thanks to the anonymous monk for clearing things up.

Replies are listed 'Best First'.
Re: A Little History on 0D0A
by indigo (Scribe) on Apr 01, 2001 at 00:00 UTC
    Good post.

    It is probably important to note that printers were the dominant output device through much of the '50s and '60s, and well into the '70s. CR/LF was necessary for use with early computers and was well entrenched before monitors became mainstream.

    Because printers didn't allow you to back up and erase, you wrote code with a line editor. When you wanted a change, you had to figure out what line numbers to edit, print them out, and type in your revisions. This could all get pretty tedious, so various shortcuts were devised to search for patterns in a file, to replace one string with another, etc. In time, these shortcuts grew into regular expressions, which, as we all know, have proven useful long after the line editor became a thing of the past.

    So, the same mechanism that gives us end of line headaches, gave rise to a cornerstone of the greatest programming language ever. I think we came out ahead on this one. :)
Re: A Little History on 0D0A
by Anonymous Monk on Apr 01, 2001 at 13:38 UTC

    While your history seems somewhat accurate, you seem to be mistaken about what is meant by "\n" in Perl. First of all, it is not necessarily a LINE FEED character. If the folks who wrote C and Perl had wanted an escape for LINE FEED, why did they not use "\l" (an escaped ell, the lowercase version of L)? The reason is simple: "\n" is a logical newline; yes, the n is for NEWLINE. NEWLINE is not a character in the ASCII character set. It is a contrivance to get around the platform incompatibilities that operating systems impose on what are colloquially known as "TEXT" files. It is true that Macs end their text file lines with a CARRIAGE RETURN character, but within C, C++, Perl and several other Mac programming languages you need only specify that you want a NEWLINE in your text file, and:

    print FOO "A\n";
    will put the code 65 then the code 13 into your text file on a Mac, it would put the code 65 then 10 into the file on Unix and it would put the codes 65 13 10 into the file on a Microsoft operating system assuming that a binmode(FOO); statement had not appeared prior to the print statement.
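
To see this translation without switching machines, here is a rough sketch that writes "A\n" through two PerlIO layers and dumps the byte codes that reached the file. The helper name bytes_written is made up for this example, and the comments assume a Unix-like or Windows build of Perl, where "\n" is code 10 inside the program (not a classic Mac OS build).

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write "A\n" through the given PerlIO layer and
# return the decimal codes of the bytes that ended up in the file.
sub bytes_written {
    my ($layer) = @_;
    my ($fh, $path) = tempfile(UNLINK => 1);
    binmode($fh, $layer) or die "binmode failed: $!";
    print {$fh} "A\n";
    close $fh or die "close failed: $!";

    # Re-read the file with no translation at all.
    open my $in, '<:raw', $path or die "cannot reopen $path: $!";
    local $/;                       # slurp mode
    my $data = <$in>;
    close $in;
    return join ' ', unpack 'C*', $data;
}

print bytes_written(':raw'),  "\n";   # 65 10    (like binmode(FOO) before the print)
print bytes_written(':crlf'), "\n";   # 65 13 10 (like a text-mode handle on a Microsoft OS)
```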

    By the way, Unix was invented around 1969-1970 and predates the only OSes that use the CARRIAGE RETURN + LINE FEED combination for end of text line characters by about 10-12 years. Unix uses a LINE FEED for the end of lines in files called text files. The Unix OS was capable of sending text files to printers for years before DOS came along.

Re: A Little History on 0D0A
by Beatnik (Parson) on Mar 31, 2001 at 23:17 UTC
    Hey,
    Well, the problem is not having a uniform record separator, but having one that is applicable on the platform. If I used \n in my files on every platform, and only my code accessed them, it wouldn't be a problem of course... but assume someone actually has to edit something manually and finds his editor acting weird because of the \n's...

    Greetz
    Beatnik
    ... Quidquid perl dictum sit, altum viditur.
Re: A Little History on 0D0A
by Anonymous Monk on Jun 21, 2008 at 15:31 UTC
    0x0D (hex) is decimal 013 and 0x0A (hex) is decimal 010 .. or am I wrong?

      0x0D is hexadecimal 0D which is decimal 13 which is octal 15 (or 015).

      Similarly, 0x0A is hexadecimal 0A which is decimal 10 which is octal 12 (or 012).

      I think you have been confused by the fact that in the post you were commenting on, the hex and octal forms were used, not the decimal.
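
If in doubt, printf can do the conversions for you. A small sketch using only the two values discussed in this thread:

```perl
use strict;
use warnings;

# Print each code in hexadecimal, decimal and octal.
for my $code (0x0D, 0x0A) {
    printf "hex 0x%02X = decimal %d = octal %03o\n", $code, $code, $code;
}
# hex 0x0D = decimal 13 = octal 015
# hex 0x0A = decimal 10 = octal 012

# The octal and hex string escapes name the same characters.
print "same CR\n" if "\015" eq "\x0D";
print "same LF\n" if "\012" eq "\x0A";
```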

      BTW, did you know that this thread is 7 years old? :)

      Careful with that hash Eugene.

        They probably didn't know it was that old...
