rovf has asked for the wisdom of the Perl Monks concerning the following question:

I have a text file on Windows (using \n as line delimiter), which I would like to copy to a Samba share on Unix, and the copy is supposed to also have \n as line delimiter. I thought this should be easy. Here is an excerpt from my code:

use IO::File;
...
my $inp=new IO::File $inputfile,'r' or die("$!");
my $content=join "\n",<$csfile>;
undef $inp; # close file
my $outp=IO::File->new($outputfile,'w') or die("$!");
$outp->binmode;
print $outp $content;
$outp->close;

I thought binmode would do the trick, but the resulting file still has \r\n as line delimiter. What am I doing wrong?

Additional Info: (Note: File::Copy is not an option here, because I want to keep the possibility of modifying something in the data later, before writing)

-- 
Ronald Fischer <ynnor@mm.st>

Re: Can't get rid of \r
by tilly (Archbishop) on Sep 11, 2008 at 16:01 UTC
    Why use IO::File? Also when you read the input file, the line terminator is still there, and then you're adding another one with your join. Plus is your script running on Windows or Unix? If Unix, then the \r will be read, never gets removed, and then gets into the output.

    I would handle this as follows:

    open(my $in, "<", $inputfile)
      or die "Can't read '$inputfile': $!";
    open(my $out, ">", $outputfile)
      or die "Can't write '$outputfile': $!";
    binmode($out);   # suppress \n -> \r\n translation on output

    while (my $line = <$in>) {
      # turn CRLF (or a lone CR) into LF
      $line =~ s/\r\n?/\n/g;
      print $out $line;
    }

    close($in) or die "Can't close '$inputfile': $!";
    close($out) or die "Can't close '$outputfile': $!";

    This code should work on both Unix and Windows.

    I actually leave out the closes usually. Putting them in will tell you if, for instance, you had a full disk. But I usually don't worry about that.
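
The substitution in the loop above can be tried on its own. The following is only a minimal sketch: the helper names `normalize_newlines` and `show` are made up for illustration and do not appear anywhere in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): convert CRLF and lone CR
# line endings to LF, leaving existing LF endings untouched.
sub normalize_newlines {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;
    return $text;
}

# Hypothetical helper: make control characters visible when printed.
sub show {
    my ($text) = @_;
    $text =~ s/\r/\\r/g;
    $text =~ s/\n/\\n/g;
    return $text;
}

# Mixed input: one DOS, one old-Mac and one Unix line ending.
print show(normalize_newlines("dos\r\nmac\runix\n")), "\n";
```

Because the pattern only touches \r, data that already uses plain \n passes through unchanged, which is presumably why the same loop can be run on either platform.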

      Why use IO::File
      Eh, why not? It's part of Perl CORE, isn't it? I don't deny that there are many other ways to do I/O (with plain open or using IO::Handle for instance), but is there a drawback with IO::File?
      Also when you read the input file, the line terminator is still there, and then you're adding another one with your join.
      Indeed! I forgot to chomp! Thank you for pointing this out.
      is your script running on Windows or Unix?
      The program is supposed to be run on either Windows or Unix. I try to make it as independent as possible (I try to avoid coding two variations of the program for each platform). The input file is guaranteed to have 0x0a as line terminator, even when run under Windows.
      I actually leave out the closes usually.
      Then I would at least have to flush the buffers, otherwise other applications won't be able to access the file until my program terminates. Also, I think they won't be able to open the file for writing while I still have it open, though this might be platform dependent. In both cases, a close makes sense IMO.
      -- 
      Ronald Fischer <ynnor@mm.st>
Re: Can't get rid of \r
by bobf (Monsignor) on Sep 12, 2008 at 03:01 UTC
Re: Can't get rid of \r
by moritz (Cardinal) on Sep 11, 2008 at 15:46 UTC
    My limited understanding is this: When you're on Windows, \n actually means CRLF.

    So the safest approach is to explicitly split on CRLF aka \x0D\x0A (or set $/ to that) and to explicitly join on LF aka \x0A.

    I'd do something like this:

    open my $in, '<:raw', $in_filename or die ...;
    open my $out, '>:raw', $out_filename or die ...;
    while (<$in>){
        s/\x0D\x0A\z/\x0A/;
        print $out $_;
    }
    close $in or die ...;
    close $out or die ...;

      When you're on Windows, \n actually means CRLF. So the safest approach is to explicitly split on CRLF
      Not exactly. The translation happens during reading/writing. Once you have read your file into memory, \n is \x0A on every platform (otherwise, length("\n") would be 2 on Windoze). So my intention was to suppress the translation on writing, by setting the file handle to binmode.
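
The point that the translation lives in the I/O layer, not in the string, can be checked with a small experiment. This is only a sketch under some assumptions: the helper name `bytes_written_with` is invented here, and the `:crlf` layer is requested explicitly so that the Windows-style translation can be observed on any platform.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $text through the given PerlIO layer
# (':raw' or ':crlf') and return the bytes that ended up on disk.
sub bytes_written_with {
    my ($layer, $text) = @_;
    my ($tmp_fh, $path) = tempfile(UNLINK => 1);
    close $tmp_fh or die "Can't close '$path': $!";

    open(my $out, ">$layer", $path) or die "Can't write '$path': $!";
    print {$out} $text;
    close $out or die "Can't close '$path': $!";

    # Read back without any translation to see the real bytes.
    open(my $in, '<:raw', $path) or die "Can't read '$path': $!";
    local $/;                       # slurp mode
    my $bytes = <$in>;
    close $in or die "Can't close '$path': $!";
    return $bytes;
}

printf "in memory: %d, via :raw: %d, via :crlf: %d\n",
    length("a\n"),
    length(bytes_written_with(':raw',  "a\n")),
    length(bytes_written_with(':crlf', "a\n"));
```

The in-memory string has the same length in every case; only the bytes written through the `:crlf` layer gain the extra \x0D.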
      open my $out, '>:raw', $out_filename

      I think I will try this; after all, this is a good occasion to get familiar with IO layers in Perl. If it suppresses \n conversion, the s/// won't be necessary. I'll post my findings.

      Still, it would be interesting to know why my binmode() did not work. Is binmode not supposed to be used in that way, or is there a bug in the IO::File::binmode implementation?

      -- 
      Ronald Fischer <ynnor@mm.st>
Re: Can't get rid of \r
by ikegami (Pope) on Sep 11, 2008 at 19:47 UTC

    The above works fine on Windows (once you change $csfile to $inp and change join "\n" to join "").
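
The effect of the join separator can be shown without touching the disk. This is a sketch only: `slurp_joined` is a made-up name, and an in-memory filehandle stands in for the real input file.

```perl
use strict;
use warnings;

# Hypothetical helper: read all lines from an in-memory file and glue
# them back together with the given separator.
sub slurp_joined {
    my ($separator, $contents) = @_;
    open(my $fh, '<', \$contents) or die "Can't open in-memory file: $!";
    my @lines = <$fh>;    # each element still ends in "\n"
    close $fh or die "Can't close in-memory file: $!";
    return join $separator, @lines;
}

my $input = "first\nsecond\n";

# join "\n" inserts an extra newline between lines that already have one.
print "doubled newline\n" if slurp_joined("\n", $input) eq "first\n\nsecond\n";

# join "" reproduces the input exactly.
print "round trip ok\n"   if slurp_joined("", $input) eq $input;
```

Chomping each line before joining on "\n" would also avoid the doubled newline, but the output would then lose its final line terminator.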

    Are you using a Cygwin build? I'm not sure how that would work.

      once you change "$csfile" to "$inp"

      Thanks for pointing this out. In my attempt to provide more meaningful variable names for the code posted, I overlooked that one.

      -- 
      Ronald Fischer <ynnor@mm.st>
      The above works fine on Windows - Are you using a Cygwin build?

      I'm running the program under plain Windows at the moment...

      But programming is a black art, really. When I came back to work today, with some ideas how to test it, I found that the program runs fine, without any modifications - no extra \r anymore on output! Of course I still had the bug of extra \n, because of the way I read in my data, but this was trivial to correct. In short: I have no idea why it didn't work yesterday, but works well today. Creepy.

      For completeness, here is the full code (unmodified) which I'm using to test. In particular, I did not yet try the (neat) idea of using raw mode. It turned out to be unnecessary in this case:

      use strict;
      use warnings;
      use IO::File;
      # copy \n delimited file from Windows to
      # Samba share on Unix. Output is also
      # supposed to be delimited by \n
      my $par='c:/tmp/dev.cs';
      my $file=new IO::File $par,'r' or die "$!";
      my $data=join '',<$file>; # !!!!
      undef $file;
      my $_path_local='u:/transfer/devc.cs';
      my $handle=IO::File->new($_path_local,'w') or die "$!";
      $handle->binmode;
      print $handle $data;
      $handle->close;

      My best explanation for the fact that it suddenly turned out to work overnight is that I mistakenly grabbed an old version of my application from our source control system (a version which did not have the binmode set), and used *that* one instead of the version I had in my editor. Not likely, but the best explanation aside from a miracle having happened.

      Thank you to everyone for contributing ideas.

      -- 
      Ronald Fischer <ynnor@mm.st>