PerlMonks  

No Control M

by ambs (Pilgrim)
on Apr 08, 2005 at 18:58 UTC (#446113, snippet)
Description: There are several ways to encode line endings in a file. DOS uses "\r\n", Unix uses just "\n", and some Mac files use just "\r".

I normally want them Unix style, so I wrote a simple script to convert them. It is really simple but really useful, and it works for both DOS and Mac files.

Update: Sorry, but the missing parentheses are important.

#!/usr/bin/perl -pi
# Match the two-character forms first so that "\r\n" becomes one newline, not two.
s/(\r\012?|\012\r?)/\012/g;
Replies are listed 'Best First'.
Re: No Control M
by cazz (Pilgrim) on Apr 08, 2005 at 19:45 UTC
    1. I prefer using dos2unix & mac2unix, as they are quite a bit faster than firing up perl, but I can see the usefulness of doing it all at once. (Though a simple shell script wrapper that calls dos2unix & mac2unix would probably still be faster.)
    2. Also, you mix octal with escape sequences. Why not pick one method of representing characters and stick with it? \n is easier to read for most of us than \012.
    3. Your code could be faster if it only touched files that contain \r. If a file already has Unix line endings, you are still rewriting the data in place. Try this instead:
      s/\r\n?/\n/g;
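The "only touch files that contain \r" idea could be sketched roughly as below. This is an illustration rather than anyone's posted code: the helper name fix_if_needed is made up, and it assumes each file is small enough to read into memory.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: rewrite $path in place only if it contains a "\r".
# Returns 1 if the file was rewritten, 0 if it was left untouched.
sub fix_if_needed {
    my ($path) = @_;

    # Read raw bytes so no I/O layer translates line endings on the way in.
    open my $in, '<:raw', $path or die "Cannot read $path: $!";
    my $data = do { local $/; <$in> };
    close $in;
    $data = '' unless defined $data;    # guard against an undef read

    # No carriage return anywhere: skip the write entirely.
    return 0 unless $data =~ /\r/;

    # The substitution suggested above: "\r\n" or a lone "\r" becomes "\n".
    $data =~ s/\r\n?/\n/g;

    open my $out, '>:raw', $path or die "Cannot write $path: $!";
    print {$out} $data;
    close $out or die "Cannot close $path: $!";
    return 1;
}

# Process any files named on the command line.
fix_if_needed($_) for @ARGV;
```

Files that are already Unix style are read once and never written, so their timestamps stay as they were.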
      I don't use "\n" because on some encodings this is not the real "\012".

      Also, my regular expression solves some weird non-unix and non-mac files I've found, which have first the newline, then the carriage return.

      Alberto Simões
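      One caveat for the newline-then-carriage-return files: with plain -p the input is read one "\n"-terminated record at a time, so the "\n" and the "\r" of such a pair arrive in different records and no single substitution sees both. A rough sketch that works on the whole text at once follows; the function name to_unix_newlines is only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: normalize every line-ending style in a whole string.
sub to_unix_newlines {
    my ($text) = @_;
    # The two-character forms are listed first so that a pair collapses
    # into a single newline; a lone "\012" is left as it is.
    $text =~ s/\015\012|\012\015|\015/\012/g;
    return $text;
}
```

      The same substitution should work as a one-liner in slurp mode, assuming the file fits in memory: perl -0777 -pi -e 's/\015\012|\012\015|\015/\012/g' file.txt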

            I don't use "\n" because on some encodings this is not the real "\012".

        What encodings? The Unicode code point for \n is 0x000a [0], according to Unicode Standard Annex #13: Unicode Newline Guidelines. Sure, there is EBCDIC, but translating the \r doesn't help fix newlines on EBCDIC.

           Also, my regular expression solves some weird non-unix and non-mac files I've found, which have first the newline, then the carriage return.

        I've never heard of a system that uses \n\r. Do you know what generates those files?

        [0] 0x0a is the same as \n in standard Unix land; the Unicode equivalent just has a null byte prepended.
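        For anyone who wants to check the byte values themselves, here is a small sketch using the core Encode module. The helper name newline_bytes is invented for this example, and the comments assume an ASCII-based platform.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: hex dump of "\n" as encoded in the given encoding.
sub newline_bytes {
    my ($encoding) = @_;
    my $newline = "\n";
    my $bytes   = encode($encoding, $newline);
    return join ' ', map { sprintf('%02x', ord($_)) } split //, $bytes;
}

# On ASCII-based platforms "\n" is code point 10, which is octal 012.
printf "ord: %d\n", ord("\n");

# UTF-16BE stores U+000A as a zero byte followed by 0x0a.
printf "UTF-8:    %s\n", newline_bytes('UTF-8');
printf "UTF-16BE: %s\n", newline_bytes('UTF-16BE');
```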
