http://www.perlmonks.org?node_id=280279

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

On Unix, end of line is "\n".
On Windows, it's "\r\n".
On Mac, it's "\r".

I have a mix of files that are created and updated on all 3 platforms. I need to chomp() the end of line on a Unix system, but need to make sure that it works regardless of the platform the file was created on. I was thinking a simple regex that removes "\n" at the end of the line, and then "\r" at the end of the line, would do the trick, but that seems like a bit of a hack for something that should have a much nicer solution.

What's the best Perl solution to do this?

Replies are listed 'Best First'.
Re: Cross Platform end of Line characters
by Cine (Friar) on Aug 02, 2003 at 14:12 UTC
    Unfortunately, it is the regexp: s/[\r\n]+$//;
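
A minimal sketch of that regexp in use. The helper name `chomp_any` is made up for illustration (it is not a Perl built-in), and the sample strings are invented:

```perl
use strict;
use warnings;

# Hypothetical helper (not a built-in): strip any trailing CR/LF
# characters from a string, whichever platform wrote them.
sub chomp_any {
    my ($line) = @_;
    $line =~ s/[\r\n]+$//;
    return $line;
}

# One sample line per line-ending style.
for my $raw ( "unix\n", "windows\r\n", "mac\r" ) {
    my $clean = chomp_any($raw);
    print "[$clean]\n";
}
```

Note that, unlike chomp(), the `+` removes every trailing "\r" and "\n", so a string ending in several blank lines loses all of them.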

    T I M T O W T D I
      Hmmm...

      From perlport:

             A common misconception in socket programming is that "\n"
             eq "\012" everywhere.  When using protocols such as common
             Internet protocols, "\012" and "\015" are called for
             specifically, and the values of the logical "\n" and "\r"
             (carriage return) are not reliable.
      
                 print SOCKET "Hi there, client!\r\n";      # WRONG
                 print SOCKET "Hi there, client!\015\012";  # RIGHT
      
      Doesn't that apply here as well? Or only if you want to make sure your script is super portable?

      Liz

        Well, yes and no. Mac, Win and Linux all use ASCII, which defines \n to be \012. If the document in question is in a non-ASCII format, then a larger conversion than just line ends is probably needed anyway...

        T I M T O W T D I
        From my experience writing a client/server application on Win2000 and AIX, you're much better off using syswrite and sysread to communicate over the socket. That way you don't need to worry about what happens to \n or \r, as you would when transferring your data over the socket using print.
Re: Cross Platform end of Line characters
by adamk (Chaplain) on Aug 03, 2003 at 04:37 UTC
    My normal method for fixing this is to slurp the entire file in using something like:

    sub slurp {
        my $file = shift or return undef;
        local $/ = undef;                            # slurp mode: read everything at once
        open( my $fh, '<', $file ) or return undef;  # lexical handle, 3-arg open
        my $buffer = <$fh>;
        close $fh;
        return \$buffer;
    }
    (The return by reference is just there to avoid copying large files more times than is needed.)

    Once it's in, split it by hand using my handy dandy "works for any platform" regex:
    my $content = slurp( $file ) or die "Failed to load file";
    my @lines = split /(?:\015\012|\015|\012)/, $$content;
    And we are done. The important bit here is (?:\015\012|\015|\012), which works everywhere. In fact, it will even work for "broken" files that somehow got multiple types of newlines in a single file. And note the order of the three newlines IS important: \015\012 has to come first, otherwise a Windows line ending would be matched as two separate newlines.
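
To see why the order matters, here is a small sketch that runs the split on a made-up string containing all three line-ending styles, once with \015\012 first and once with it last. The sample data and variable names are assumptions for illustration only:

```perl
use strict;
use warnings;

# Sample data with all three line-ending styles mixed together
# (invented for this example).
my $mixed = "one\015\012two\015three\012four";

# CRLF listed first: each line ending is matched exactly once.
my @good = split /(?:\015\012|\015|\012)/, $mixed;

# CRLF listed last: the lone \015 alternative wins first, so the \012
# of a CRLF pair is matched as a second newline and produces an
# empty "line" after "one".
my @bad = split /(?:\015|\012|\015\012)/, $mixed;

print scalar(@good), " lines: ", join( '|', @good ), "\n";
print scalar(@bad),  " lines: ", join( '|', @bad ),  "\n";
```

With the correct order the string splits into four lines; with \015\012 last it splits into five, the second of which is empty. Only the position of \015\012 relative to \015 matters here.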

    Another thing you can do with it is "fix" newlines for the "current" platform using:
    $$content =~ s/(?:\015\012|\015|\012)/\n/g;
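
The same substitution can also target a specific platform rather than the current one. A rough sketch, where `normalize_newlines` and its optional second argument are hypothetical names introduced for this example:

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite every newline style in $text as $eol.
# If $eol is omitted, the local "\n" is used, as in the line above.
sub normalize_newlines {
    my ( $text, $eol ) = @_;
    $eol = "\n" unless defined $eol;
    $text =~ s/(?:\015\012|\015|\012)/$eol/g;
    return $text;
}

# Made-up input mixing Windows, old Mac and Unix line endings.
my $mixed = "one\015\012two\015three\012";

my $local = normalize_newlines($mixed);               # local convention
my $dos   = normalize_newlines( $mixed, "\015\012" ); # force CRLF

printf "local: %d bytes, dos: %d bytes\n", length $local, length $dos;
```

If you write the forced-CRLF result out on Windows, binmode the output handle first, or the text-mode layer will translate the "\012" bytes a second time.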