PerlMonks  

Unix \n vs. DOS \n

by greenhorn (Sexton)
on Jul 15, 2000 at 13:22 UTC (#22689)

greenhorn has asked for the wisdom of the Perl Monks concerning the following question:

Someone at work wrote to an in-house Perl discussion alias:
Does anyone know a clean way to test for
DOS v. UNIX EOL in a text file (using Perl) ? It seems
that
 (chop($UnixLine) == chop($DOSLine)) returns true =-(

No one replied. I thought: hey, I might be a mere newbie, but I'll bet I can figure this out.

Wrong. :) I made some small test files with both <CR><LF> line endings and <LF>-only line endings. Then I watched the results of extracting only those characters from each line in the Perl Builder watch-window. It appeared as if the same characters were returned each time (never mind that one line-ending was \x0D\x0A and the other was \x0A alone).

It struck me that using == there wasn't right; should he not be using "eq"? Had he in fact
actually compared 0 with 0? (0 == 0 does have a certain ring of truth to it. :)
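The suspicion about == can be checked directly. The following is a minimal sketch, not taken from the thread; the helper names numeric_equal and string_equal are made up for illustration. It shows that == puts both strings in numeric context, where a lone LF and a lone CR both count as 0, while eq compares the actual characters.

```perl
use strict;
use warnings;

# Hypothetical helpers, named here only for illustration.
sub numeric_equal {
    my ($x, $y) = @_;
    no warnings 'numeric';   # silence the "isn't numeric" warning for this demo
    return $x == $y ? 1 : 0;
}

sub string_equal {
    my ($x, $y) = @_;
    return $x eq $y ? 1 : 0;
}

my ($lf, $cr) = ("\x0A", "\x0D");
print "==: ", numeric_equal($lf, $cr), "\n";   # prints "==: 1"
print "eq: ", string_equal($lf, $cr), "\n";    # prints "eq: 0"
```

So a == comparison of two chopped characters says nothing about whether they are the same character.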

Does perl for Win32 "internally" convert Unix newlines to <CR><LF>?

Replies are listed 'Best First'.
RE: Unix \n vs. DOS \n
by Abigail (Deacon) on Jul 15, 2000 at 15:12 UTC
    Somewhere, Tom has a long write-up about this. But the basics are that DOS stores CR LF only for text files, and only when written on a physical device. As soon as you read it in, the C library turns the physical line ending of CR LF into the logical newline \n. And when you write it to a file, the reverse happens. That is, if you run the program under DOS.

    If you take your DOS file to a Unix platform, only the LF gets mapped to the logical newline \n (which happens to be represented with a LF character as well). The preceding CR byte is considered by Unix to be just another byte. Also note that chop chops off the last character of a string. One character, nothing more. So, if you are on Unix, reading a line from either a Unix file or a DOS file, the last character will be LF, aka \x0A.

    So, yes, the comparison should have been done with eq instead of ==, but that still doesn't make a difference: "\x0A" eq "\x0A".

    There is no flawless way to determine whether something is a "Unix line" or a "DOS line". "Unix line"s end with a LF character, and "DOS line"s with CR LF. However, there is nothing that forbids a "Unix line" from having a CR character just before the LF character.
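As a rough illustration of the point above, here is a minimal sketch of a classifier for a single line that has been read without any CR LF translation (for example on Unix, or through a binmode'd handle). The helper name line_ending is hypothetical. As noted, it is only a heuristic: a "Unix line" whose data happens to end in CR will be reported as CRLF.

```perl
use strict;
use warnings;

# Hypothetical helper: classify the ending of one untranslated line.
sub line_ending {
    my ($line) = @_;
    return 'CRLF' if $line =~ /\x0D\x0A\z/;   # CR immediately before the final LF
    return 'LF'   if $line =~ /\x0A\z/;       # bare LF
    return 'none';                            # e.g. last line without a terminator
}

print line_ending("abcF\x0D\x0A"), "\n";   # prints "CRLF"
print line_ending("abcF\x0A"), "\n";       # prints "LF"
print line_ending("abcF"), "\n";           # prints "none"
```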

    -- Abigail

(Ovid) RE: Unix \n vs. DOS \n
by Ovid (Cardinal) on Jul 15, 2000 at 21:00 UTC
    As a side note, don't use chop to get rid of newlines. I see this all the time in programs and it makes me cringe. You want to use chomp.

    chomp will only remove the last character if it's a newline. Consider the following "harmless" code:

    #!/usr/bin/perl -w
    use strict;

    while (<DATA>) {
        chop;
        print "$_\n";
    }
    __DATA__
    this is a test
    this is another
    You can't see it in the above code, but I deliberately did not hit "Enter" after the last line. I even hit backspace a few times to ensure that there was nothing after the word "another". The result?
    this is a test
    this is anothe
    chop happily removed the "r" in another. chomp was designed for situations like that and should be used where appropriate.
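The difference can be seen side by side in a small sketch. The helper names with_chop and with_chomp are invented for this example; each works on a copy so the original string is left alone.

```perl
use strict;
use warnings;

# Hypothetical helpers: apply chop or chomp to a copy and return the result.
sub with_chop {
    my ($s) = @_;
    chop $s;      # always removes exactly one character
    return $s;
}

sub with_chomp {
    my ($s) = @_;
    chomp $s;     # removes a trailing $/ (by default "\n") only if present
    return $s;
}

for my $line ("this is a test\n", "this is another") {   # second line has no newline
    print "chop:  [", with_chop($line),  "]\n";
    print "chomp: [", with_chomp($line), "]\n";
}
```

On the last line, chop yields "this is anothe" while chomp leaves "this is another" untouched.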

    Cheers,
    Ovid

Re: Unix \n vs. DOS \n
by vkonovalov (Monk) on Jul 15, 2000 at 14:34 UTC
    You probably forgot to use the "binmode" built-in function, which controls whether a filehandle is treated as text-mode or binary-mode.

    Otherwise, if you're inside a perl script, then perl uses UNIX-like line endings, for example in here-doc strings and inside any other strings:

    $a=<<"EOS";
    abcd
    efgh
    EOS
    and
    $a="abcd
    efgh
    ";
    and
    $a="abcd\nefgh\n";
    are the same.
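To make the effect of binary versus text mode concrete, here is a minimal sketch under a few assumptions: it writes its own temporary file with explicit CR LF bytes, and it uses the PerlIO layers ':raw' and ':crlf' to stand in for binary mode and DOS-style text mode respectively, so that it behaves the same on Unix and Windows. The helper name first_line is hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write two lines with explicit CR LF bytes; ':raw' prevents any translation.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh, ':raw';
print {$fh} "abcF\x0D\x0A", "efgF\x0D\x0A";
close $fh or die "close: $!";

# Hypothetical helper: read the first line through the given PerlIO layer.
sub first_line {
    my ($file, $layer) = @_;
    open my $in, "<$layer", $file or die "open $file: $!";
    my $line = <$in>;
    close $in;
    return $line;
}

my $raw  = first_line($path, ':raw');    # CR is preserved
my $crlf = first_line($path, ':crlf');   # CR LF is collapsed to "\n"

printf "raw  has CR: %s\n", ($raw  =~ /\x0D/ ? 'yes' : 'no');
printf "crlf has CR: %s\n", ($crlf =~ /\x0D/ ? 'yes' : 'no');
```

A regex such as /\x0D/ can only match when the line was read without translation, which is why tests for CR never fire on a text-mode handle under DOS or Windows.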
RE: Unix \n vs. DOS \n
by BigJoe (Curate) on Jul 15, 2000 at 16:34 UTC
    To answer your question

    Does perl for Win32 "internally" convert Unix newlines to CR-LF?

    NO. I usually use a regular expression to convert all \n characters to \r\n, like this:
    $mystring =~ s/\n/\r\n/g;
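One caveat with a blanket substitution is that a string which already contains \r\n ends up with \r\r\n. A minimal sketch of a variant that avoids this follows; the helper name to_crlf is hypothetical. It assumes the result will be written through a binmode'd handle, since a text-mode handle on Windows would add another CR on output.

```perl
use strict;
use warnings;

# Hypothetical helper: normalise LF and CR LF endings to CR LF without doubling CRs.
sub to_crlf {
    my ($text) = @_;
    $text =~ s/\x0D?\x0A/\x0D\x0A/g;   # an existing CR, if any, is consumed and re-emitted once
    return $text;
}

my $mixed = "one\x0Atwo\x0D\x0Athree\x0A";
my $dos   = to_crlf($mixed);

# Count CR characters: one per line, none doubled.
my $cr_count = () = $dos =~ /\x0D/g;
print "CR count: $cr_count\n";   # prints "CR count: 3"
```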


    Hope this helps.

    --BigJoe
Re: Unix \n vs. DOS \n
by greenhorn (Sexton) on Jul 16, 2000 at 15:20 UTC
    I believe the fellow at work used chop because he wanted to have Perl return the line-ending character; chomp seems to return only a result code and not the "chomped" character.

    I created a four-line text file in which two lines had CR/LF line endings and the other two had LF-only line endings. Then I wrote a small script that reads each line of the file. Following is the business end of it. (All lines in the file have "F" immediately before the line boundary.)

    # TWO LINES IN THE FILE MEET THE FOLLOWING CRITERIA:
    print "ends CRLF\n" if /F\x0D\x0A$/;
    print "ends CRLF\n" if /F\r\n/;
    print "contains CR\n" if /\x0D/;
    print "contains CR\n" if /\r/;

    # AND THE OTHER TWO LINES IN THE FILE MATCH THIS:
    print "ends LF only\n" if /F\x0A$/;

    But the script printed only this: ends LF only. It never did print ends CRLF or contains CR.

    If perl doesn't make some internal translation of the carriage-return characters when it's reading a file, then why that result? Are the tests above not sufficient?

      chomp returns the number of characters removed. It removes whatever's in $/, so he can just check that.

      Update: Yes, $/, as jlp pointed out.
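The suggestion to check chomp's return value against $/ might look like the following minimal sketch. The helper name chomp_count is hypothetical, and the lines are assumed to have been read without CR LF translation (for example via binmode), otherwise the CR is already gone before chomp sees it.

```perl
use strict;
use warnings;

# Hypothetical helper: set $/ locally and report how many characters chomp removed.
sub chomp_count {
    my ($line, $separator) = @_;
    local $/ = $separator;   # chomp removes a trailing copy of $/ if present
    return chomp($line);
}

print chomp_count("F\x0D\x0A", "\x0D\x0A"), "\n";   # prints 2: line ended in CR LF
print chomp_count("F\x0A",     "\x0D\x0A"), "\n";   # prints 0: no CR LF at the end
print chomp_count("F\x0A",     "\x0A"),     "\n";   # prints 1: bare LF removed
```

A return value of 2 with $/ set to CR LF therefore indicates a DOS-style ending, subject to the same caveat as any other CR-before-LF test.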
