Dealing with files with differing line endings

dd-b has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Dealing with files with differing line endings by stevieb (Canon) on Nov 05, 2021 at 21:17 UTC
My File::Edit::Portable was written to deal with this exact situation. Get a file handle of the file with the record separators changed to those of the local platform, make changes, and write it back to the same file with the original record separators: `use File::Edit::Portable; my $rw = File::Edit::Portable->new; my $fh = $rw->read('file.txt'); ... $rw->write(contents => $fh);` Get an array of a file's contents with the line endings stripped off (one line per element), make changes, and write the data back to the original file (the original line endings will be preserved and put back into place automagically): `my @contents = $rw->read('file.txt'); for (@contents) { ... } $rw->write(contents => \@contents);` There's a myriad of other magic you can do as well, like automatically making a backup copy of each file, changing line endings, using custom line endings, checking what endings a file is using, splicing stuff into the files, etc.
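For readers who want to see the underlying idea without installing anything, here is a minimal core-Perl sketch of "edit the lines, keep the original endings". It is not how File::Edit::Portable is implemented; `detect_eol` and `edit_preserving_eol` are hypothetical helper names, and the sketch assumes the text was read with a `:raw` handle and uses a single line-ending convention throughout.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of File::Edit::Portable): return the first
# line ending found in $text, or "\n" when the text contains none.
sub detect_eol {
    my ($text) = @_;
    return $text =~ /(\r\n|\n|\r)/ ? $1 : "\n";
}

# Hypothetical helper: call $edit on each line with its ending stripped,
# then re-join the results using the ending the text originally used.
sub edit_preserving_eol {
    my ($text, $edit) = @_;
    my $eol   = detect_eol($text);
    my @lines = split /\r\n|\n|\r/, $text, -1;
    pop @lines if @lines && $lines[-1] eq '';   # empty field after the final ending
    return join '', map { $edit->($_) . $eol } @lines;
}

my $dos = "alpha\r\nbeta\r\n";
my $out = edit_preserving_eol($dos, sub { uc $_[0] });
# $out now holds the upper-cased lines, still terminated by CR LF
```

Note that this sketch always terminates the last line, so a file without a final line ending gains one.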
Re^2: Dealing with files with differing line endings by dd-b (Monk) on Nov 05, 2021 at 21:21 UTC
Was guessing I wasn't the first person to have something like this problem! Thanks for pointing out your module.
Re: Dealing with files with differing line endings by ikegami (Patriarch) on Nov 05, 2021 at 20:07 UTC
All the systems you mentioned use CR LF or LF (unless you meant the ancient MacOS which used CR). So just use LF as the line terminator as usual, but use something like `s/\s+\z//` instead of `chomp`. `while (<>) { s/\s+\z//; ... }` Alternatively, you could add a `:crlf` layer to the handle. `open(my $fh, '<:crlf', $qfn) or die("Can't open \"$qfn\": $!\n"); while (<$fh>) { chomp; ... }` This already happens by default on Windows, which is why it can handle the listed file formats naturally.
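A small sketch comparing the two approaches on the same data may help. It uses in-memory filehandles in place of real files, and the helper names `lines_via_regex` and `lines_via_layer` are made up for illustration. The one behavioural difference worth knowing is that `s/\s+\z//` also removes trailing spaces and tabs, while `:crlf` plus `chomp` leaves them alone.

```perl
use strict;
use warnings;

# Approach 1: read LF-terminated records as usual, then strip all trailing
# whitespace, which covers LF and CR LF (and any trailing blanks).
sub lines_via_regex {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die("Can't open in-memory file: $!\n");
    my @lines;
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;
        push @lines, $line;
    }
    return @lines;
}

# Approach 2: let the :crlf layer translate CR LF to LF, then chomp.
sub lines_via_layer {
    my ($text) = @_;
    open(my $fh, '<:crlf', \$text) or die("Can't open in-memory file: $!\n");
    my @lines;
    while (my $line = <$fh>) {
        chomp $line;
        push @lines, $line;
    }
    return @lines;
}

# Sample data mixing LF and CR LF endings, standing in for a real file.
my $data    = "unix line\ndos line\r\nlast line\r\n";
my @regex   = lines_via_regex($data);
my @layered = lines_via_layer($data);
# Both arrays hold the three lines without their terminators.
```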
Re^2: Dealing with files with differing line endings by dd-b (Monk) on Nov 05, 2021 at 21:21 UTC
Good point! I was jumping back to a more general question than I need to solve. As you say, I can just force LF for line boundaries. Parsing the contents can handle various line separators with `\R`; I think it already does (or I could write my own chomp with a suitable regexp to kill all kinds of line terminators).
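For reference, a sketch of what a `\R`-based "own chomp" could look like. `chomp_any` and `split_lines` are hypothetical names. `\R` (Perl 5.10 and later) matches CR LF as a unit as well as a lone LF or CR, but it also matches other vertical whitespace such as form feed and the Unicode line separators, which may or may not be wanted.

```perl
use strict;
use warnings;
use 5.010;    # \R requires Perl 5.10 or later

# Hypothetical helper: like chomp, but removes exactly one trailing line
# terminator of any kind and returns the trimmed copy.
sub chomp_any {
    my ($line) = @_;
    $line =~ s/\R\z//;
    return $line;
}

# Hypothetical helper: split already-slurped text into lines on any
# terminator. Trailing empty fields are dropped by split.
sub split_lines {
    my ($text) = @_;
    return split /\R/, $text;
}

my $trimmed = chomp_any("dos line\r\n");          # CR LF removed as one unit
my @lines   = split_lines("a\r\nb\nc\r");         # three elements
```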
Re^3: Dealing with files with differing line endings by ikegami (Patriarch) on Nov 06, 2021 at 00:55 UTC
"As you say, I can just force LF for line boundaries." No need to force anything. `$/` is already a LF on all systems except ancient MacOS. Just replace `chomp;` with `s/\s+\z//;`.
Re: Dealing with files with differing line endings by LanX (Saint) on Nov 05, 2021 at 20:03 UTC
I always thought `chomp` handles that. Could you provide us with an example which goes wrong? Possible solutions, if needed: replace chomp with a regex in your code, or override chomp with your own version in legacy code. Edit: could it be you are not using chomp at all, but setting `$/` to get rid of the line endings? Cheers Rolf (addicted to the Perl Programming Language :) Wikisyntax for the Monastery
Re^2: Dealing with files with differing line endings (overriding chomp) by LanX (Saint) on Nov 05, 2021 at 20:09 UTC
Here is an example to override chomp: `use strict; use warnings; package NewChomp; use Data::Dump qw/pp dd/; use subs qw/chomp/; sub chomp { $_[0] =~ s/\n$//; } pp my $line ="abcd\n"; chomp $line; pp $line;` Just export it from a new module into your scripts, and adjust the regex inside `sub chomp` to your needs. Cheers Rolf (addicted to the Perl Programming Language :) Wikisyntax for the Monastery
Re^2: Dealing with files with differing line endings by dd-b (Monk) on Nov 05, 2021 at 21:17 UTC
Chomp cleans off the end of a string based on the current value of `$/`. I need something to cause reading the next line of the file to terminate in the correct place. (And then I probably do also need to do something like chomp, but that's easy.)
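To illustrate the point, `$/` governs both where `readline` stops and what `chomp` removes. The sketch below reads CR-terminated (classic Mac OS) data from an in-memory filehandle; `read_records` is a hypothetical helper, and the separator is assumed to be known in advance.

```perl
use strict;
use warnings;

# Hypothetical helper: read records from $text using $sep as the input
# record separator, and return them with that separator chomped off.
sub read_records {
    my ($text, $sep) = @_;
    local $/ = $sep;    # affects readline AND chomp inside this sub
    open(my $fh, '<', \$text) or die("Can't open in-memory file: $!\n");
    my @records = <$fh>;
    chomp @records;
    return @records;
}

my $mac_text = "one\rtwo\rthree\r";
my @split_ok = read_records($mac_text, "\r");    # three records
my @one_blob = read_records($mac_text, "\n");    # a single record, CRs intact
```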
Re: Dealing with files with differing line endings by BillKSmith (Monsignor) on Nov 06, 2021 at 03:01 UTC
A general solution is impossible. Any file can contain normal text characters that another OS would interpret as line separators. You may be able to assume that this will never happen with your data. Your idea of slurping the entire file (in binmode) into a string is probably the safest. Use anything you know about the file (line length, number of lines, words that only occur at the start or end of a line, etc.) to determine which kind of file it is. Open the string as a memory file with the appropriate IO layer. You could then use the <> operator exactly as you normally would. Bill
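One way the slurp-then-decide approach might look is sketched below. The heuristic is deliberately crude (it simply counts each kind of terminator, with ties going to CR LF), and it selects the convention by setting `$/` rather than by choosing an IO layer, since a layer alone does not cover CR-only files. All three helper names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: slurp a file without any newline translation.
sub slurp_raw {
    my ($path) = @_;
    open(my $in, '<:raw', $path) or die("Can't open \"$path\": $!\n");
    local $/;                       # undef separator: read everything at once
    my $text = <$in> // '';
    close $in;
    return $text;
}

# Hypothetical heuristic: guess the convention from terminator counts.
sub guess_eol {
    my ($text) = @_;
    my $crlf = () = $text =~ /\r\n/g;
    my $lf   = () = $text =~ /(?<!\r)\n/g;    # LF not preceded by CR
    my $cr   = () = $text =~ /\r(?!\n)/g;     # CR not followed by LF
    return "\r\n" if $crlf > 0 && $crlf >= $lf && $crlf >= $cr;
    return "\r"   if $cr > $lf;
    return "\n";
}

# Open the string as an in-memory file and read it with the guessed separator.
sub lines_from_text {
    my ($text) = @_;
    local $/ = guess_eol($text);
    open(my $fh, '<', \$text) or die("Can't open in-memory file: $!\n");
    my @lines = <$fh>;
    chomp @lines;
    return @lines;
}
```

Typical use would be `my @lines = lines_from_text(slurp_raw($path));`, with `guess_eol` replaced by whatever knowledge about the data is actually available.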
Re^2: Dealing with files with differing line endings by Marshall (Canon) on Nov 06, 2021 at 23:20 UTC
We may be overthinking this. ikegami's solution should be fine. The exception is the ancient Mac, which uses <CR> instead of <CR><LF> or <LF> for line endings. One of my users was using an old Mac to edit one of my config files and reported that my config file "didn't work". I talked with this guy and told him to set his text editor to "write DOS compatible files" and that ended the problem. Modern Macs use <LF>. Unless there is a specific strange requirement, writing code to handle ancient Mac is not worth the effort.
Re^3: Dealing with files with differing line endings by BillKSmith (Monsignor) on Nov 07, 2021 at 21:22 UTC
As a practical matter, I am sure that you are right. However, it is important to know that there are corner cases. Consider the following contrived example. `use strict; use warnings; use Test::More tests=>1; my $file = \do{ "This \n is not the end of a line on windows\r\n" }; open my $fh1, '<:raw', $file; my $chars_read = length(<$fh1>); close $fh1; my $chars_expected=47; is( $chars_read, $chars_expected, 'record length' );` OUTPUT: `1..1 not ok 1 - record length # Failed test 'record length' # at nl.pl line 15. # got: '6' # expected: '47' # Looks like you failed 1 test of 1.` Unfortunately, my solution (use :crlf instead of :raw) does not work either. Bill
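If the file is known to be CR LF terminated, one way to keep the embedded LF inside the record is to set `$/` to `"\r\n"` on a `:raw` handle. The sketch below contrasts that with the default separator; `first_record` is a hypothetical helper, and the approach assumes the convention has already been determined by other means.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first record of $data when read with
# $sep as the input record separator and no newline translation.
sub first_record {
    my ($data, $sep) = @_;
    local $/ = $sep;
    open(my $fh, '<:raw', \$data) or die("Can't open in-memory file: $!\n");
    my $record = <$fh>;
    close $fh;
    return $record;
}

my $text  = "This \n is not the end of a line on windows\r\n";
my $short = first_record($text, "\n");      # stops at the embedded LF
my $whole = first_record($text, "\r\n");    # runs to the CR LF terminator
```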
Re^4: Dealing with files with differing line endings by Marshall (Canon) on Nov 11, 2021 at 21:19 UTC
Re^5: Dealing with files with differing line endings by afoken (Chancellor) on Nov 12, 2021 at 13:54 UTC
Some notes below your chosen depth have not been shown here
Re: Dealing with files with differing line endings by Anonymous Monk on Nov 06, 2021 at 14:34 UTC
PerlIO::eol has not been updated in a while, but the last time I tried it still worked, and it installs successfully under Perl 5.34.0.

