PerlMonks  

Why chomp() is not considering carriage-return

by jesuashok (Curate)
on May 15, 2006 at 06:45 UTC ( #549385=perlquestion )
jesuashok has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,

When I studied Perl (and other languages) in theory, I learned that "\n" stands for the combination "\r\n".
We recently wrote a program that has to run on all platforms, so we use "\r\n" as the newline sequence.
However, I ran into a problem when I use chomp on strings that end in "\r\n".
Shouldn't chomp take care of strings like this? Correct me if I am wrong.
Could anyone suggest a way to solve this?

#!/bin/perl
use strict;

my $one = "Hello\r\n";
chomp($one);
print ":$one:";

"Keep pouring your ideas"

Re: chomp() is confusing
by TedPride (Priest) on May 15, 2006 at 06:51 UTC
    chomp removes the current "input record separator" from the end of a string. The separator is stored in the built-in variable $/, which you can change:
    $/ = "\r\n";
    Or you can just use a regex to convert the line endings, which is probably safer if you don't know for sure what they're going to be:
    for my $eol ("\r", "\n", "\r\n", "\n\r", "\r\r", "\n\n") {
        my $line = "text${eol}text";
        # Convert every copy of the first line ending found to \n.
        $line =~ s/\Q$1\E/\n/g if $line =~ /(\r\n?|\n\r?)/;
        print "$line\n";
    }
    This basically detects what the line endings are from the first line ending, then converts everything to a standard \n.
Re: chomp() is confusing
by davido (Archbishop) on May 15, 2006 at 06:56 UTC

    chomp removes any trailing string that corresponds to the contents of the special variable $/.

    You shouldn't, under most circumstances, need to worry whether a specific platform uses \r\n or \n. Perl generally uses \n as the logical newline, and adapts to whatever platform it's being used on. You can read up on this topic in perlport.

    About the only times you have to worry about cross-platform line ending issues are when you're reading documents written on one platform, from another platform; and also when you've moved a script from one platform to another without properly converting the script's own line endings.
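One way to handle the first case, a file written on Windows and read on Unix, is to ask PerlIO to translate the line endings while reading. The following is only a minimal sketch, assuming the standard :crlf layer; the temporary file, its contents, and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small file with Windows (CRLF) line endings, byte for byte.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;                      # no translation on output
print {$out} "one\r\ntwo\r\n";
close $out or die "close: $!";

# Read it back through the :crlf layer, which turns CRLF into "\n".
open my $in, '<:crlf', $path or die "open $path: $!";
my @lines = <$in>;
close $in;

chomp @lines;                      # $/ is still the default "\n"
print join('|', @lines), "\n";
```

With the layer in place the script never sees the carriage returns, so a plain chomp is enough and $/ does not need to be touched.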


    Dave

Re: chomp() is confusing
by aufflick (Deacon) on May 15, 2006 at 07:39 UTC
    chomp does exactly what the other posters say: it removes the current record separator from the end of the line. The record separator is stored in the special variable $/.

    That variable defaults to the EOL (end of line) character for your platform.

    So if you run your script on a Unix type OS, only the newline will be removed (since it's expected that on that OS files will end in only a newline).

    If you run your script on Windows, the carriage return and newline will both be removed (since it's expected that under Windows files will end in both a carriage return and a newline).

    This means that when you chomp a line on a file made on the same OS as where your script is running, the correct thing will happen without your intervention.

    If, on the other hand, you want to run a script that will strip the EOL marker off a file regardless of what OS the script is running on and whether that file came from Unix or Windows, you need to do that yourself, with something like:

    {
        local $/ = "\n";
        for my $line (<FILEHANDLE>) {
            $line =~ s/\r?\n$//;
            # do your thing
        }
    }
    This works easily due to the neat fact that \r\n and \n both end in \n so the <> operator is going to split Windows and Unix files into lines correctly and all you have to do is strip off the trailing \n and optionally a preceding \r.

    Of course the other EOL possibility is that of a Classic MacOS text file, where the EOL marker is a lone carriage return (ie. \r). Mercifully you won't come across these files nearly as much as you used to and dealing with them is an exercise left to the reader ;)
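For a string that is already in memory, the three endings discussed here (\n, \r\n and a lone \r) can be stripped with a single anchored substitution. This is a sketch rather than a definitive answer, and strip_eol is a hypothetical helper name, not something from the post above.

```perl
use strict;
use warnings;

# Hypothetical helper: remove one trailing line ending of any common style.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/(?:\r\n|\n|\r)\z//;   # CRLF is listed first so it goes as one unit
    return $line;
}

for my $raw ("unix\n", "windows\r\n", "classic mac\r", "no ending") {
    printf "[%s]\n", strip_eol($raw);
}
```

Note that this only strips the ending. Splitting a Classic MacOS file into lines in the first place is a separate problem: with $/ set to "\n", the <> operator would return the whole file as a single record.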

      hi aufflick

      I am not sure whether you have tested this program in Windows or not


      If you run your script on Windows, the carriage return and newline will both be removed (since it's expected that under Windows files will end in both a carriage return and a newline).

      I tested this program on Windows, and it does not behave the way you explained.

      $one = "Hello\r\n";
      print "Before chomp" , length($one) , "\n";
      chomp($one);
      print "After Chomp" , length($one) , "\n";
      output:-
      Before chomp 7
      After Chomp 6
      From that I understood that only the newline is removed in chomp, not the carriage return.
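The lengths reported above are consistent with chomp removing only "\n": the string is a literal in the source, so no file I/O, and therefore no line-ending translation, is involved. A small sketch that dumps the bytes before and after, assuming a platform where "\n" is LF (Unix or Windows); hex_of is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: show a string as hex bytes.
sub hex_of { return unpack 'H*', $_[0] }

my $one = "Hello\r\n";
my $before  = hex_of($one);
my $removed = chomp($one);    # chomp returns the number of characters removed
my $after   = hex_of($one);

print "before:  $before\n";   # ends in 0d0a (CR LF)
print "removed: $removed\n";
print "after:   $after\n";    # still ends in 0d (CR)
```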

      "Keep pouring your ideas"
        Hmm, that doesn't sound right at all - is that perl under cygwin or with ActiveState perl?

        If you want to chomp \r\n, set $/ to \r\n:

        local $/ = "\r\n";
        $one = "Hello\r\n";
        print "Before chomp: " , length($one) , "\n";
        chomp($one);
        print "After Chomp: " , length($one) , "\n";

      If you run your script on Windows, the carriage return and newline will both be removed (since it's expected that under Windows files will end in both a carriage return and a newline).

      That's completely wrong. Perl automatically converts CRLF to LF when reading text, so $/ is set to LF ("\n"). Your script shouldn't see CR at all, so chomp only removes the LF.

      Now, if you were dealing with binary data, chomp won't work (without changing $/) because chomp is a text function.

      print(unpack('H*', $/), "\n");    # 0a          Just LF
      my $str = <DATA>;
      print(unpack('H*', $str), "\n");  # 746573740a  No CR
      chomp($str);
      print(unpack('H*', $str), "\n");  # 74657374    LF chomped.
      __DATA__
      test

      If, on the other hand, you want to run a script that will strip the EOL marker off a file regardless of what OS the script is running on and whether that file came from Unix or Windows, you need to do that yourself, with something like:

      That regexp is incomplete. It doesn't handle systems that use \r as the EOL marker.

        Hmm, you might be right about the conversion - all my Windows perl runs under cygwin anyway, so I don't really get to see that pain and my understanding may be flawed.

        I do have to stand my ground on

        "That regexp is incomplete. It doesn't handle systems that use \r as the EOL marker."
        however - which of the two operating systems I mentioned (UNIX and Windows) use \r as the EOL marker? In fact I specifically said in the next paragraph:

        Of course the other EOL possibility is that of a Classic MacOS text file, where the EOL marker is a lone carriage return (ie. \r). Mercifully you won't come across these files nearly as much as you used to and dealing with them is an exercise left to the reader ;)

Re: chomp() is confusing
by davorg (Chancellor) on May 15, 2006 at 08:21 UTC

    In addition to the excellent advice given by other monks above, you might also find it useful to read the section on newlines in perldoc perlport.

    --
    <http://dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

      Of course it depends on what you know about your input data: for most purposes s/[\r\n]+// will do the job, as long as you can guarantee there is no \r or \n embedded in the lines. Monks might object that s/[\r\n]+$// is faster, being anchored; however, when reading a Windows file on a Mac, the \n might be split onto the next line - I couldn't say.
        As of today, using Strawberry Perl 5.16.0.1 (64-bit), Perl neither adapts chomp to the platform nor automatically converts the carriage returns in files that are read. For me, it always removes just \n, and that's all. My solution: either change the special variable as suggested above, or write your own chomp with a substitution regexp: s/(\n|\r)//g. Or, of course, convert the line endings of all your files.

Node Type: perlquestion [id://549385]
Approved by Corion