Re: Native newline encoding

Re^2: Native newline encoding by salva (Canon) on May 23, 2012 at 08:28 UTC
I am extending Net::SFTP::Server to implement version 4 of the SFTP protocol. That version supports opening files in TEXT mode (similar to FTP), and there are two ways to do it: the first is to convert the native newlines to CRLF before sending them over the network, and the second is to tell the client what the native newline sequence is and let it carry the burden of the conversion. At this point, it seems to me that the simple solution is the first one: let Perl read the file in text mode and then apply `s/\n/\r\n/`. This may be slightly incorrect in some edge cases (for instance, files on Windows with `\n` line endings) that nobody would care about, so I don't either!
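A minimal sketch of that first option, assuming the file is read through a text-mode handle so that Perl already presents every line ending as `"\n"`. The sub name `to_wire_crlf` is hypothetical and is not part of Net::SFTP::Server; an in-memory handle stands in for the real file. Note the `/g` flag, which is needed whenever a buffer holds more than one line.

```perl
use strict;
use warnings;

# Hypothetical helper: convert Perl's logical newlines to CRLF for the wire.
sub to_wire_crlf {
    my ($text) = @_;
    $text =~ s/\n/\r\n/g;    # /g so every newline in the buffer is converted
    return $text;
}

# An in-memory file stands in for a real file opened in text mode.
my $content = "first line\nsecond line\n";
open my $in, '<', \$content or die "open failed: $!";
my $wire = to_wire_crlf( do { local $/; <$in> } );    # slurp, then convert
close $in;

# Hex dump of what would go over the network.
print unpack( 'H*', $wire ), "\n";
```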
Re^3: Native newline encoding by BrowserUk (Patriarch) on May 23, 2012 at 09:16 UTC
"That version supports opening files in TEXT mode (similar to FTP) and there are two ways to do it, first one is to convert the native new-lines to CRLF before sending through the network and the second one is to tell the client what the native newline sequence is and let it handle the burden of the conversion."

Hm. My reading of the appropriate RFC is slightly different, in that the server can choose whether to send CRLF or a single-character line ending of its choice, and it is down to the clients to convert whatever the server sends to their required local form.

"At this point, it seems to me that the simple solution is the first one letting Perl read the file in text mode and then applying s/\n/\r\n/. This may be slightly incorrect in some edge cases (for instance, files on Windows with \n line endings) that nobody would care about so I don't either!"

I whole-heartedly agree, though I would approach that solution in a slightly different manner. When TEXT mode is requested:

1. Open the file in text mode;
2. Read the file line-by-line using the system default INPUT_SEPARATOR (`$/`);
3. chomp each line read;
4. Write to the socket line-by-line, having set the OUTPUT_SEPARATOR (`$\`) to CRLF.

This way, whatever the local line separator is, it gets taken care of by Perl (or the CRT if you're using XS), and the data is transmitted with the required 'canonical newlines'.

Clients then do the same in reverse: read from the socket line-by-line having set their INPUT_SEPARATOR to CRLF; chomp; and write line-by-line using the default OUTPUT_SEPARATOR for their local platform. This way, the conversions are taken care of at both ends by perl or the CRT.

At least, for ASCII/ANSI/ISO-whatever-that-number-is files that have the 'correct' newlines on the originating platforms. Things (will) get far more messy once the RFCs start dealing with Unicrap.
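The server-side steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: in-memory handles stand in for the local file and the network socket, and the variable names are made up for the example. `$\` is Perl's output record separator and `$/` its input record separator.

```perl
use strict;
use warnings;

# In-memory handles stand in for the local file and the network socket.
my $local = "alpha\nbeta\ngamma\n";
my $wire  = '';

open my $file, '<', \$local or die "open failed: $!";
open my $sock, '>', \$wire  or die "open failed: $!";
binmode $sock;    # the socket side should carry the bytes untranslated

{
    local $\ = "\r\n";    # output record separator: appended by every print
    while ( my $line = <$file> ) {
        chomp $line;      # strips the current $/ (default "\n")
        print {$sock} $line;
    }
}
close $sock;
close $file;

# Hex dump of what was "sent".
print unpack( 'H*', $wire ), "\n";
```

One design consequence worth noting: a final line that lacks a trailing newline in the source file still gets a CRLF appended on the wire.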
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. The start of some sanity?
Re^4: Native newline encoding by sauoq (Abbot) on May 23, 2012 at 11:55 UTC
"Things (will) get far more messy once the RFCs start dealing with Unicrap."

The RFCs have handled binary data for years, and no one batted an eye.

`-sauoq "My two cents aren't worth a dime.";`
Re^5: Native newline encoding by BrowserUk (Patriarch) on May 23, 2012 at 15:04 UTC
Re^6: Native newline encoding by sauoq (Abbot) on May 23, 2012 at 21:38 UTC
Re^2: Native newline encoding by AnomalousMonk (Archbishop) on May 23, 2012 at 00:04 UTC
Under Windoze 7: `>perl -wMstrict -le "my $fred; open O, '>', \$fred;; { local( $/, $\ ); print O \"\n\";; };; close O;; print unpack 'H*', $fred;; "` prints `0a`. I assume the result is the same on a *nix system. Anyone care to try the Mac?
Re^3: Native newline encoding by Your Mother (Archbishop) on May 23, 2012 at 01:12 UTC
"...result is the same on a *nix system..." You mean like a Mac? :P `perl -le 'open F,">",\$f; {local($/,$\); print F "\n"}; print unpack "H*", $f'` prints `0a`.
Re^4: Native newline encoding by AnomalousMonk (Archbishop) on May 23, 2012 at 03:41 UTC
I'm not at all familiar with the Mac, but I had always heard/understood that it used 0x0d as the underlying OS-mumble-nix file system newline delimiter. No? (But that is separate and apart from the code experiments that have been done in this thread, since I suspect the delimiter is always 0x0a for Perl internally.)
Re^5: Native newline encoding by Your Mother (Archbishop) on May 23, 2012 at 13:47 UTC
Re^3: Native newline encoding by BrowserUk (Patriarch) on May 23, 2012 at 04:40 UTC
Sorry, I missed your reply due to all the noise created by my erstwhile friend.

"Under Windoze 7:"

My demo was also run under Windows (Vista), so no surprise there :)

"I assume the result is the same on a *nix system."

Indeed. And that was exactly the point of the demonstration. salva's a *nix man and knows I'm a Windows user, so the significance would not be lost on him.

"Anyone care to try the Mac?"

Since modern Macs are essentially *nix, it'll be the same there also. You'd have to go back to classic (pre-OS X) MacOS to see a difference, I think.
Re^2: Native newline encoding by sauoq (Abbot) on May 22, 2012 at 17:59 UTC
I don't think that really addresses the issue. One way to get what he wants would be to write "\n" to a file, then open that file, use binmode(), slurp it in, and see what you get. It should be, for example, `0d0a` on Windows and `0a` on *nix. I have an inkling that it won't work that way with an in-memory file... I don't really do Windows, but I can check it on a machine with Strawberry Perl later.
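A rough sketch of that experiment, assuming a scratch directory from File::Temp is acceptable; the file name `newline.txt` is made up for the example. The file is written through a plain text-mode handle and read back with `binmode` so that no layer translates the bytes on the way in.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Scratch location; the file name is arbitrary.
my $dir  = tempdir( CLEANUP => 1 );
my $path = File::Spec->catfile( $dir, 'newline.txt' );

# Write a single logical newline through an ordinary text-mode handle...
open my $out, '>', $path or die "open for write failed: $!";
print {$out} "\n";
close $out or die "close failed: $!";

# ...then read the raw bytes back with translation switched off.
open my $in, '<', $path or die "open for read failed: $!";
binmode $in;
my $bytes = do { local $/; <$in> };
close $in;

# Hex dump of whatever the platform actually stored on disk.
print unpack( 'H*', $bytes ), "\n";
```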
Re^3: Native newline encoding by BrowserUk (Patriarch) on May 22, 2012 at 18:10 UTC
"It should be, for example, 0d0a on Windows ... I don't really do Windows"

That's a bit obvious :)

It isn't perl(*) that writes the extra character; it is the C runtime (when writing to a data file opened as text). Those extra characters are also stripped by the CRT when reading, assuming text mode. If Perl added them itself, then the CRT would also do it and you'd end up with a real mess.

perl, and Perl programmers, shouldn't need to concern themselves with the details because, unless they are reading text files in bin mode (which they shouldn't be), the addition and removal of the 'extra characters' should be entirely transparent.

(*) Ignoring PerlIO, which does; but only because it bypasses the CRT and then emulates it. The point of which mystifies me, but there it is.
Re^4: Native newline encoding by Anonymous Monk on May 22, 2012 at 19:09 UTC
"It isn't perl(*) that writes the extra character; (*) ignoring PerlIO which does;"

So, as nearly every modern perl out there has been built with PerlIO, your first statement is either talking about those ~0.01 percent which haven't, or you're saying that PerlIO doesn't belong to perl ... both of which seem pretty nonsensical to me.
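To illustrate the PerlIO side of this point, here is a small sketch assuming a perl built with PerlIO: when the `:crlf` layer is pushed explicitly, the translation is done by perl's own I/O layer rather than by the C runtime, even on an in-memory handle. The variable names are made up for the example.

```perl
use strict;
use warnings;

# In-memory handle with the :crlf layer pushed explicitly.
my $buf = '';
open my $fh, '>:crlf', \$buf or die "open failed: $!";
print {$fh} "\n";
close $fh or die "close failed: $!";

# Hex dump of what the layer actually wrote into the buffer.
print unpack( 'H*', $buf ), "\n";
```

Without the explicit layer, an in-memory handle stores the `"\n"` untranslated, which matches the `0a` results reported elsewhere in this thread.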
Re^4: Native newline encoding by sauoq (Abbot) on May 22, 2012 at 18:34 UTC
"That's a bit obvious :)"

Erm... what's the obvious part? That I don't do Windows or that it should be `0d0a`? Or both? He asked for the "native newline encoding of the OS"... It doesn't matter what writes it. Maybe it's because I don't do Windows but do have to deal with the bastardized text files that come from Windows that the distinction is meaningless. I know I don't care whether it is Perl or the run time writing the characters. That they are there is what matters.
Re^5: Native newline encoding by BrowserUk (Patriarch) on May 22, 2012 at 18:49 UTC
Re^6: Native newline encoding by sauoq (Abbot) on May 22, 2012 at 19:03 UTC
Re^4: Native newline encoding by sauoq (Abbot) on May 22, 2012 at 18:45 UTC
Also, there are more OS's than *nix and Windows to worry about.


	PerlMonks