Re: Native newline encoding

by BrowserUk (Pope)
on May 22, 2012 at 16:49 UTC (#971831)


in reply to Native newline encoding

You can detect what Perl thinks it should output for "\n" on the current platform:

open O, '>', \$fred;
{ local( $/, $\ ); print O "\n"; }
close O;
print unpack 'H*', $fred;

Output: 0a

Maybe not so useful depending upon your purpose?

Isn't the whole point of "\n", that by using it, you allow the runtime and OS to figure it out?


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?


Re^2: Native newline encoding
by sauoq (Abbot) on May 22, 2012 at 17:59 UTC

    I don't think that really addresses the issue.

    One way to get what he wants would be to write "\n" to a file and then open that file, use binmode(), and then slurp it in and see what you get. It should be, for example, 0d0a on Windows and 0a on *nix.
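A minimal sketch of that file round trip, assuming a temporary file is acceptable. The helper name native_newline_hex is made up for illustration; File::Temp is a core module.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write "\n" through a default (text-mode) handle, then read the raw
# bytes back with binmode and report them as hex.
sub native_newline_hex {
    my ($tmp, $path) = tempfile(UNLINK => 1);
    close $tmp;    # we only wanted the path; reopen with default layers

    open my $out, '>', $path or die "open for write: $!";
    {
        local $\;                  # no output record separator appended
        print {$out} "\n";
    }
    close $out or die "close: $!";

    open my $in, '<', $path or die "open for read: $!";
    binmode $in;                   # switch off any newline translation
    local $/;                      # slurp mode
    my $bytes = <$in>;
    close $in;

    return unpack 'H*', $bytes;
}

print native_newline_hex(), "\n";
```

This only reports what the default layers of the running perl write to a file, so a PERLIO setting or an open pragma in effect would change the answer.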

    I have an inkling that it won't work that way with an in memory file...

    I don't really do Windows but I can check it on a machine with Strawberry Perl later.

    -sauoq
    "My two cents aren't worth a dime.";
      It should be, for example, 0d0a on Windows ... I don't really do Windows

      That's a bit obvious :)

      It isn't perl(*) that writes the extra character; it is the C runtime (when writing to a data file opened as text). Those extra characters are also stripped by the CRT when reading -- assuming text mode.

      If Perl added them itself, then the CRT would also do it and you'd end up with a real mess.

      perl, and Perl programmers, shouldn't need to concern themselves with the details because, unless they are reading text files in binary mode (which they shouldn't be), the addition and removal of the 'extra characters' should be entirely transparent.

      (*) ignoring PerlIO which does; but only because it bypasses the CRT and then emulates it -- the point of which mystifies me, but there it is.
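To see the PerlIO side of this in isolation, here is a small sketch that pushes an explicit layer onto an in-memory handle. The helper name newline_hex_via is hypothetical; :raw and :crlf are standard PerlIO layers. It shows what a given layer does to "\n", not what the platform default is.

```perl
use strict;
use warnings;

# Return the hex of what "\n" becomes when printed through the given layer.
sub newline_hex_via {
    my ($layer) = @_;
    my $buf = '';
    open my $fh, ">$layer", \$buf or die "open: $!";
    {
        local $\;            # make sure no output record separator is appended
        print {$fh} "\n";
    }
    close $fh or die "close: $!";
    return unpack 'H*', $buf;
}

print ":raw  -> ", newline_hex_via(':raw'),  "\n";
print ":crlf -> ", newline_hex_via(':crlf'), "\n";
```

With :raw the byte should come through untouched as 0a, while :crlf should emit 0d0a on any platform, since the translation is done by the layer rather than by the OS.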



        That's a bit obvious :)

        Erm... what's the obvious part? That I don't do Windows or that it should be 0d0a? Or both?

        He asked for the "native newline encoding of the OS"... It doesn't matter what writes it. Maybe it's because I don't do Windows, but do have to deal with the bastardized text files that come from Windows, that the distinction is meaningless to me. I know I don't care whether it is Perl or the runtime writing the characters. That they are there is what matters.

        -sauoq

        Also, there are more OSes than *nix and Windows to worry about.

        -sauoq
        It isn't perl(*) that writes the extra character;

        (*) ignoring PerlIO which does;

        So, as nearly every modern perl out there has been built with PerlIO, your first statement is either talking about those ~0.01 percent which haven't, or you're saying that PerlIO doesn't belong to perl ... both of which seem pretty nonsensical to me.

Re^2: Native newline encoding
by AnomalousMonk (Monsignor) on May 23, 2012 at 00:04 UTC

    Under Windoze 7:

    >perl -wMstrict -le "my $fred; open O, '>', \$fred;; { local( $/, $\ ); print O \"\n\";; };; close O;; print unpack 'H*', $fred;; "
    0a

    I assume the result is the same on a *nix system. Anyone care to try the Mac?

      …result is the same on a *nix system…

      You mean like a Mac? :P

      perl -le 'open F,">",\$f; {local($/,$\); print F "\n"}; print unpack "H*", $f'
      0a

        I'm not at all familiar with the Mac, but I had always heard/understood that it used 0x0d as the underlying OS-mumble-nix file system newline delimiter. No? (But that is separate and apart from the code experiments that have been done in this thread, since I suspect the delimiter is always 0x0a for Perl internally.)

      Sorry, I missed your reply due to all the noise created by my erstwhile friend.

      Under Windoze 7:

      My demo was also run under Windows (Vista), so no surprise there :)

      I assume the result is the same on a *nix system.

      Indeed. And that was exactly the point of the demonstration. salva's a *nix man and knows I'm a windows user; so the significance would not be lost on him.

      Anyone care to try the Mac?

      Since modern Macs are essentially *nix, it'll be the same there also. You'd have to go back to classic Mac OS (pre-OS X) to see a difference, I think.



Re^2: Native newline encoding
by salva (Monsignor) on May 23, 2012 at 08:28 UTC
    I am extending Net::SFTP::Server to implement version 4 of the SFTP protocol. That version supports opening files in TEXT mode (similar to FTP), and there are two ways to do it: the first is to convert the native newlines to CRLF before sending them through the network, and the second is to tell the client what the native newline sequence is and let it handle the burden of the conversion.

    At this point, it seems to me that the simple solution is the first one: let Perl read the file in text mode and then apply s/\n/\r\n/. This may be slightly incorrect in some edge cases (for instance, files on Windows with \n line endings), but nobody would care about those, so I don't either!
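A rough sketch of that first option, assuming the handle passed in was opened in text mode so the native newline has already been mapped to "\n". The helper name text_to_crlf is invented for the example, and it writes the CRLF as "\015\012", which perlport recommends over "\r\n" for network data.

```perl
use strict;
use warnings;

# Read lines from a text-mode handle and return the content with
# CRLF line endings, ready to be sent over the wire.
sub text_to_crlf {
    my ($in) = @_;
    my $out = '';
    while (my $line = <$in>) {
        $line =~ s/\n\z/\015\012/;   # one trailing "\n" per line at most
        $out .= $line;
    }
    return $out;
}

# Demo on an in-memory file so it runs anywhere.
my $sample = "one\ntwo\n";
open my $in, '<', \$sample or die "open: $!";
print unpack('H*', text_to_crlf($in)), "\n";
close $in;
```

A final line without a trailing newline is passed through unchanged here; whether it should gain a CRLF is a protocol decision the sketch does not make.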

      That version supports opening files in TEXT mode (similar to FTP), and there are two ways to do it: the first is to convert the native newlines to CRLF before sending them through the network, and the second is to tell the client what the native newline sequence is and let it handle the burden of the conversion.

      Hm. My reading of the appropriate RFC is slightly different, in that the server can choose whether to send CRLF or a single char line ending of their choice:

      And it is down to the clients to convert whatever the server sends to their required local form.

      At this point, it seems to me that the simple solution is the first one: let Perl read the file in text mode and then apply s/\n/\r\n/. This may be slightly incorrect in some edge cases (for instance, files on Windows with \n line endings), but nobody would care about those, so I don't either!

      I whole-heartedly agree, though I would approach that solution in a slightly different manner.

      When TEXT mode is requested:

      1. Open the file in text mode;
      2. Read the file line-by-line using the system default INPUT_RECORD_SEPARATOR ($/);
      3. chomp each line read;
      4. Write to the socket line-by-line, having set the OUTPUT_RECORD_SEPARATOR ($\) to CRLF.

      This way, whatever the local line separator is, it gets taken care of by Perl (or the CRT if you're using XS). And the data is transmitted with the required 'canonical newlines'.
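The four steps above could look roughly like this. The name send_text is hypothetical, and the binmode on the outgoing handle is my own assumption: without it, a Windows text-mode layer could expand the CRLF a second time.

```perl
use strict;
use warnings;

# Copy a text-mode input handle to a socket-like handle, one line at a
# time, replacing whatever the local newline was with CRLF.
sub send_text {
    my ($in, $sock) = @_;
    binmode $sock;                 # assumption: keep the wire bytes untranslated
    local $\ = "\015\012";         # appended to every print below
    while (my $line = <$in>) {     # default $/ is "\n"
        chomp $line;               # drop the local newline
        print {$sock} $line;       # $\ supplies the CRLF
    }
    return;
}

# Demo with in-memory handles standing in for the file and the socket.
my $file = "alpha\nbeta\n";
my $wire = '';
open my $in,   '<', \$file or die "open in: $!";
open my $sock, '>', \$wire or die "open out: $!";
send_text($in, $sock);
close $sock;
close $in;
print unpack('H*', $wire), "\n";
```

Note that a last line lacking a newline in the source still gets a CRLF on the wire with this approach.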

      Clients then do the same in reverse: read from the socket line-by-line, having set their INPUT_RECORD_SEPARATOR ($/) to CRLF; chomp; and write line-by-line with the OUTPUT_RECORD_SEPARATOR ($\) set to "\n", so that text mode produces the newline for their local platform.
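The client side, sketched under the same assumptions. The name receive_text is made up, and $\ is set to "\n" explicitly because Perl's actual default for $\ is undef; a text-mode output handle then turns that "\n" into the platform's own sequence.

```perl
use strict;
use warnings;

# Read CRLF-terminated lines from a socket-like handle and write them
# out with "\n", leaving the native translation to the output handle.
sub receive_text {
    my ($sock, $out) = @_;
    binmode $sock;                 # assumption: read the wire bytes untranslated
    local $/ = "\015\012";         # lines on the wire end in CRLF
    local $\ = "\n";               # local newline, mapped by text mode if needed
    while (my $line = <$sock>) {
        chomp $line;               # chomp removes the current $/, i.e. the CRLF
        print {$out} $line;
    }
    return;
}

# Demo with in-memory handles. In real use $out would be an ordinary
# text-mode file handle; :raw here just keeps the demo bytes identical
# on every platform.
my $wire  = "alpha\015\012beta\015\012";
my $local = '';
open my $sock, '<', \$wire or die "open in: $!";
open my $out,  '>:raw', \$local or die "open out: $!";
receive_text($sock, $out);
close $out;
close $sock;
print unpack('H*', $local), "\n";
```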

      This way, the conversions are taken care of at both ends by perl or the CRT. At least, for ASCII/ANSI/ISO-whatever-that-number-is files that have the 'correct' newlines on the originating platforms.

      Things (will) get far more messy once the RFCs start dealing with Unicrap.



        Things (will) get far more messy once the RFCs start dealing with Unicrap.

        The RFCs have handled binary data for years, and no one batted an eye.

        -sauoq
