PerlMonks  

Reading text file with <CR> line endings

by Marshall (Abbot)
on Mar 02, 2018 at 18:15 UTC ( #1210242=perlquestion )
Marshall has asked for the wisdom of the Perl Monks concerning the following question:

One of my users had problems with an input text file. Windows printed the file ok from the command line. However, evidently the file had <CR> line endings (old Mac convention). With the following code, I got just a few characters plus some garbage from near the end of a file a few hundred lines long:
    open my $in, '<', "file" or die "$!";   #open worked
    while (my $line = <$in>){print $line;}  #just a few characters print
I am running on Win 10, ActiveState Perl (v5.24.3). The above code works with <LF> (Unix) or <CR><LF> (Windows) line endings but evidently not with <CR> line endings. My user has changed his workflow to only make <CR><LF> files and so things are fine moving forward. However, I am curious. I remember previous versions of Perl working fine when reading <CR> terminated lines. Any ideas?

Update: fixed a filehandle typo. More of the actual code is shown later in the thread.

Update: Thanks for the interesting input, especially from LanX and haukex. Fortunately I don't have to solve the general problem at the moment (read a text file with unknown line endings). When I wrote the code, I implemented it in such a way that Unix or Windows endings work fine for input, and all files that I generate have the pure line endings of the platform the code was run on. This <CR> file was a "weird duck". I spent some more time talking with my user and found out that he generated the <CR> file on a Windows platform, but had his text editor set to use <CR> line endings instead of the Windows standard <CR><LF>! We agreed that this is "outside of spec" and "don't do that". Since the user symptom was "program didn't do anything" (no valid input records read), I've added a simple print of the number of records processed. The program will exit with a non-zero error code if that number is zero, just in case this utility is ever used in a batch file. So things are fine now and I incrementally learned a bit.
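
The record-count safeguard described in the update might look roughly like the sketch below. This is not the actual program: count_records is a hypothetical helper, and the blank-line filter merely stands in for whatever makes a record "valid" in the real utility.

```perl
use strict;
use warnings;

# Hypothetical helper: count the records that survive filtering.
# Assumption: a "record" here is simply any non-blank line.
sub count_records {
    my ($fh) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        next if $line =~ /^\s*$/;   # skip blank lines
        $count++;
    }
    return $count;
}

# Only runs when a file name is given on the command line.
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $n = count_records($in);
    close $in;
    print "Records processed: $n\n";
    exit 1 if $n == 0;              # non-zero exit code for batch files
}
```

With a <CR>-only file and the default record separator, nearly everything arrives as a single "line", so a count of zero or one is a quick hint that the line endings are not what the program expects.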

Replies are listed 'Best First'.
Re: Reading text file with <CR> line endings
by haukex (Abbot) on Mar 02, 2018 at 18:49 UTC

    I don't think print is the appropriate debugging tool here...

    $ hexdump -C foo.txt
    00000000  46 6f 6f 0d 42 61 72 0d  51 75 7a 0d              |Foo.Bar.Quz.|
    0000000c
    $ cat read.pl
    #!/usr/bin/env perl
    use warnings;
    use strict;
    use Data::Dumper;
    $Data::Dumper::Useqq=1;

    open my $fh, '<', 'foo.txt' or die $!;
    print Dumper([PerlIO::get_layers($fh)]) unless $] lt '5.008';
    while (<$fh>) {
        print Dumper($_);
    }
    close $fh;

    Output on Win 7 Strawberry Perl 5.10, 5.14, 5.20, and 5.26:

    $VAR1 = [ "unix", "crlf" ];
    $VAR1 = "Foo\rBar\rQuz\r";

    And on Linux, the output is as follows:

    # 5.6.2:
    $VAR1 = "Foo\rBar\rQuz\r";
    # 5.8.1 and 5.8.9:
    $VAR1 = [ "stdio" ];
    $VAR1 = "Foo\rBar\rQuz\r";
    # 5.10.1 thru 5.26:
    $VAR1 = [ "unix", "perlio" ];
    $VAR1 = "Foo\rBar\rQuz\r";

    So none of the tested versions handle a CR line ending. See also open: the default layers are :raw on Unix and :crlf on Windows. For details see also PerlIO and open (the pragma). (Update: and LanX provided another excellent link with Newlines in perlport)

    If the file is plain ASCII, this works across all of the above configurations:

    use warnings;
    use strict;

    open my $fh, '<', 'foo.txt' or die $!;
    binmode $fh;
    $/="\x0D";
    while (<$fh>) {
        chomp;
        ...
    }
    close $fh;
Re: Reading text file with <CR> line endings
by Laurent_R (Canon) on Mar 02, 2018 at 18:45 UTC
    I remember previous versions of Perl working fine when reading <CR> terminated lines.
    I'm a bit surprised. In my recollection, Perl gracefully handles line endings for files prepared in the format of the OS on which Perl is running, but can't correctly handle cross-platform files. To tell the truth, I am very much used to these types of problems between Windows and Unix or Linux, much less so with old Mac endings (especially nowadays).

    I think that the while statement correctly splits the lines, but it is just the printing which goes over the same line again and again due to the carriage return without any line feed. Try this:

    while (my $line = <$in>) { $line =~ s/[\r\n]+$//; print "$line\n";}
Re: Reading text file with <CR> line endings
by LanX (Bishop) on Mar 02, 2018 at 18:26 UTC
      I've got a bit of an issue because in my fiddling with this thing, I over-wrote the file. I can get it back and try chomp(), but my regex should be even better? Here is more of the actual code... #lines were used in my testing...
      while (my $line = <IN>)
      {
          # print "line=$line\n";
          next if $line =~/^\s*\w+:/;    # The Run_Time: line
          next if $line =~/^\s*$/;       # A blank line
          $line =~ s/\s*$//;             # No trailing spaces or line ending
          $line =~ s/^\s*//;             # No leading space
          next if $line =~ /^Call/;      # Skip the CSV header line
          my($call,$bracket) = (split /\|/,$line)[0,-1];
          # print "$call, $bracket\n";   #### nothing printed here either
          # more processing ... not shown..
      }
Re: Reading text file with <CR> line endings
by thanos1983 (Vicar) on Mar 02, 2018 at 18:47 UTC

    Hello Marshall,

    This should do the work for you:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use feature 'say';

    sub myChmop {
        my ($str) = @_;
        chomp $str;
        return $str;
    }

    sub myCrRemove {
        my ($str) = @_;
        $str =~ s/\r|\n//g;
        return $str;
    }

    my $str = "abcd\r\n";

    my $chompReturn = myChmop($str);
    say "[$chompReturn]";

    my $crRemoveReturn = myCrRemove($str);
    say "[$crRemoveReturn]";

    __END__
    $ perl test.pl
    ]abcd
    [abcd]

    Also read this "similar question" Carrige Return and Line Feed in Perl..

    Hope this helps, BR.

    Seeking for Perl wisdom...on the process of learning...not there...yet!
Re: Reading text file with <CR> line endings
by karlgoethebier (Monsignor) on Mar 03, 2018 at 10:00 UTC

    This untested fragment should work:

    #!/usr/bin/env perl
    use warnings;
    use strict;
    use open IO => ':bytes';
    use autodie;

    open my $fh, '<', 'foo.txt';
    $/ = "\x0D";
    while (<$fh>) {
        chomp;
        ...;
    }
    close $fh;
    __END__

    I don't remember this IO layer stuff very well. Best regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

    perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'Help

Re: Reading text file with <CR> line endings
by Anonymous Monk on Mar 05, 2018 at 14:14 UTC
    If the file is large and the line-ending is truly unpredictable, you can read (say) the first 1,000 bytes of the file and then search for the various separators – starting of course with the two-byte ones first. Use this to set the Perl record-separator variable. (And if you don't find any of the expected sequences, die.) Remember to seek back to the start of the file before proceeding.
      Yes, that approach would work. When I considered that idea, I was thinking of opening the file in bin mode. Read a block of data. Use substr index to find first <CR> and then run substr index to find first <LF>, from those index numbers I could figure out what kind of file it was. I'd probably close file and report back the line ending value from a "discover_line_ending sub". And have main program re-open the file in text mode with correct input line separator. BTW, substrindex is a very fast critter because it is "stupid" - searching to the end of the input block with no search value found is "fast".
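
The detection idea discussed above could be sketched along these lines. Nothing here comes from the actual program: discover_line_ending is the hypothetical sub name floated above, the 1,000-byte block size is just the figure suggested earlier, and only <CR><LF>, <LF> and <CR> are considered.

```perl
use strict;
use warnings;

# Hypothetical helper: guess the line ending from the first block of a file.
# Returns "\x0D\x0A", "\x0A" or "\x0D"; dies if none is found in the block.
sub discover_line_ending {
    my ($path, $block_size) = @_;
    $block_size ||= 1000;                        # assumed default block size

    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $got = read($fh, my $block, $block_size);
    die "Cannot read $path: $!" unless defined $got;
    close $fh;

    my $cr = index($block, "\x0D");              # position of first CR, or -1
    my $lf = index($block, "\x0A");              # position of first LF, or -1

    return "\x0D\x0A" if $cr >= 0 && $lf == $cr + 1;              # CR directly followed by LF
    return "\x0A"     if $lf >= 0 && ($cr < 0 || $lf < $cr);      # an LF comes first
    return "\x0D"     if $cr >= 0;                                # a bare CR comes first
    die "No line ending found in the first $block_size bytes of $path\n";
}

# Only runs when a file name is given on the command line.
if (@ARGV) {
    my $file = $ARGV[0];
    local $/ = discover_line_ending($file);      # chomp removes exactly this separator
    open my $in, '<:raw', $file or die "Cannot open $file: $!";
    while (my $line = <$in>) {
        chomp $line;
        print "$line\n";
    }
    close $in;
}
```

One known weak spot of this sketch: if the block happens to end on a <CR> whose <LF> sits in the next block, the file is misreported as <CR>-terminated. Files that mix line endings are not handled either.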

      Another idea I considered was to slurp in the whole file as binary - the max size of this file is small enough to do that. Then use split to do my own line division. The input line separator cannot be a regex, but split can use a regex.
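
A minimal sketch of that slurp-and-split idea, assuming the file comfortably fits in memory. read_lines_any_ending is a made-up name for illustration, not part of the real program.

```perl
use strict;
use warnings;

# Hypothetical helper: slurp a file as raw bytes and split it on any of
# the three common line endings.
sub read_lines_any_ending {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $data = do { local $/; <$fh> };   # undef $/ reads the whole file at once
    close $fh;
    return () unless defined $data;
    # CRLF comes first in the alternation so it is not split as CR plus LF.
    return split /\x0D\x0A|\x0D|\x0A/, $data;
}

# Only runs when a file name is given on the command line.
if (@ARGV) {
    my @lines = read_lines_any_ending($ARGV[0]);
    print scalar(@lines), " lines read\n";
}
```

Because split discards trailing empty fields, a final line ending does not produce an empty last record. Blank lines in the middle of the file come back as empty strings.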

      As I mention in my now updated question post, I found out where this "weird duck" file came from. There is no need for me to solve this tricky problem at this time. Users do the darndest things! The "hey, don't do that!" answer appears to meet all of my requirements and user is fine with that "solution". Not every complicated problem requires an actual implementation. At some point in the future, I may have to actually implement a solution for this problem and what I've learned in this thread will be helpful. I think there are also fine points in this thread that could be useful for other problems. In my opinion, you can never "know too much" about Perl.

        substr doesn't search at all. I suppose you meant index?
