PerlMonks  

Re: chomp() is confusing

by aufflick (Deacon)
on May 15, 2006 at 07:39 UTC (#549391)


in reply to Why chomp() is not considering carriage-return

chomp does exactly what the other posters say: it removes the current record separator from the end of the line. The record separator is stored in the special variable $/.
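As a minimal sketch of that behaviour, the snippet below chomps the same string under different values of $/. The helper name chomp_with is hypothetical and exists only for illustration; it relies on chomp returning the number of characters it removed.

```perl
use strict;
use warnings;

# Hypothetical helper: chomp $text with $/ temporarily set to $sep.
# Returns the number of characters removed and the resulting string.
sub chomp_with {
    my ($sep, $text) = @_;
    local $/ = $sep;              # chomp removes exactly this string, if present
    my $removed = chomp($text);   # chomp returns the count of characters removed
    return ($removed, $text);
}

my ($n1, $s1) = chomp_with("\n",   "Hello\r\n");   # only the LF goes
my ($n2, $s2) = chomp_with("\r\n", "Hello\r\n");   # CR and LF both go
my ($n3, $s3) = chomp_with("\r\n", "Hello\n");     # no trailing CRLF, nothing goes

printf "%d %d\n", $n1, length $s1;   # 1 6
printf "%d %d\n", $n2, length $s2;   # 2 5
printf "%d %d\n", $n3, length $s3;   # 0 6
```

The third call shows that chomp does nothing when the string does not end in the exact value of $/.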

That variable defaults to the EOL (end of line) character for your platform.

So if you run your script on a Unix-type OS, only the newline will be removed (since it's expected that on that OS lines will end in only a newline).

If you run your script on Windows, the carriage return and newline will both be removed (since it's expected that under Windows lines will end in both a carriage return and a newline).

This means that when you chomp a line from a file created on the same OS your script is running on, the correct thing will happen without your intervention.

If, on the other hand, you want to run a script that will strip the EOL marker off a file regardless of what OS the script is running on and whether that file came from Unix or Windows, you need to do that yourself, with something like:

{
    local $/ = "\n";
    for my $line (<FILEHANDLE>) {
        $line =~ s/\r?\n$//;
        # do your thing
    }
}

This works because \r\n and \n both end in \n, so the <> operator splits Windows and Unix files into lines correctly, and all you have to do is strip off the trailing \n and, if present, the \r before it.

Of course the other EOL possibility is that of a Classic MacOS text file, where the EOL marker is a lone carriage return (i.e. \r). Mercifully you won't come across these files nearly as much as you used to, and dealing with them is an exercise left to the reader ;)
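One possible take on that exercise, offered as a sketch and not as the only way to do it: read the whole text in one go and split on any of the three markers. The helper name split_any_eol is hypothetical, and the approach assumes the data fits in memory and was read without any line-ending translation (for example after binmode on the filehandle).

```perl
use strict;
use warnings;

# Hypothetical helper: split text into lines on \r\n, \n or a lone \r.
# \r\n is listed first so a Windows line ending counts as one marker.
sub split_any_eol {
    my ($text) = @_;
    my @lines = split /\r\n|\n|\r/, $text, -1;   # -1 keeps trailing empty fields
    pop @lines if @lines && $lines[-1] eq '';    # drop the empty field after a final EOL
    return @lines;
}

# Mixed Windows, Classic MacOS and Unix line endings in one string.
my @lines = split_any_eol("one\r\ntwo\rthree\n");
print join('|', @lines), "\n";   # one|two|three
```

Because this slurps the input, it trades memory for simplicity; reading line by line with $/ set to "\n" cannot find lone \r markers, since a Classic MacOS file would come back as a single line.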


Re^2: chomp() is confusing
by jesuashok (Curate) on May 15, 2006 at 08:13 UTC
    hi aufflick

    I am not sure whether you have tested this program on Windows.


    If you run your script on Windows, the carriage return and newline will both be removed (since it's expected that under Windows files will end in both a carriage return and a newline).

    I tested this program on Windows, and it does not behave the way you have explained.

    $one = "Hello\r\n";
    print "Before chomp" , length($one) , "\n";
    chomp($one);
    print "After Chomp" , length($one) , "\n";

    Output:
    Before chomp 7
    After Chomp 6

    From that I concluded that chomp removes only the newline, not the carriage return.

    "Keep pouring your ideas"
      Hmm, that doesn't sound right at all - is that perl under cygwin or ActiveState perl?

      If you want to chomp \r\n, set $/ to \r\n:

      local $/ = "\r\n";
      $one = "Hello\r\n";
      print "Before chomp: " , length($one) , "\n";
      chomp($one);
      print "After Chomp: " , length($one) , "\n";

Re^2: chomp() is confusing
by ikegami (Pope) on May 23, 2006 at 16:49 UTC

    If you run your script on Windows, the carriage return and newline will both be removed (since it's expected that under Windows files will end in both a carriage return and a newline).

    That's completely wrong. Perl automatically converts CRLF to LF when reading text, so $/ is set to LF ("\n"). Your script shouldn't see CR at all, so chomp only removes the LF.

    Now, if you are dealing with binary data, chomp won't work (without changing $/) because chomp is a text function.

    print(unpack('H*', $/), "\n");    # 0a          Just LF
    my $str = <DATA>;
    print(unpack('H*', $str), "\n");  # 746573740a  No CR
    chomp($str);
    print(unpack('H*', $str), "\n");  # 74657374    LF chomped.
    __DATA__
    test
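To make the text-versus-binary distinction above concrete, here is a small sketch that reads the same CRLF-terminated file twice, once through the :crlf PerlIO layer (the translation Windows builds of perl apply to text handles by default) and once with :raw. The helper name first_line_hex and the temporary file are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write one CRLF-terminated line, byte for byte.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;                        # no line-ending translation on output
print {$out} "test\r\n";
close $out or die "close: $!";

# Hypothetical helper: read the first line through the given layer,
# chomp it with the default $/ ("\n") and return the result as hex.
sub first_line_hex {
    my ($layer) = @_;
    open my $in, "<$layer", $path or die "open: $!";
    my $line = <$in>;
    close $in;
    chomp $line;
    return unpack 'H*', $line;
}

print first_line_hex(':crlf'), "\n";   # 74657374    CRLF became LF, LF chomped
print first_line_hex(':raw'),  "\n";   # 746573740d  CR survives the chomp
```

With :raw the script sees the CR, which is the case where you need to change $/ or strip the \r yourself.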

    If, on the other hand, you want to run a script that will strip the EOL marker off a file regardless of what OS the script is running on and whether that file came from Unix or Windows, you need to do that yourself, with something like:

    That regexp is incomplete. It doesn't handle systems that use \r as the EOL marker.

      Hmm, you might be right about the conversion - all my Windows perl runs under cygwin anyway, so I don't really get to see that pain and my understanding may be flawed.

      I do have to stand my ground on

      "That regexp is incomplete. It doesn't handle systems that use \r as the EOL marker."
      however - which of the two operating systems I mentioned (Unix and Windows) uses \r as the EOL marker? In fact, I specifically said in the next paragraph:

      Of course the other EOL possibility is that of a Classic MacOS text file, where the EOL marker is a lone carriage return (i.e. \r). Mercifully you won't come across these files nearly as much as you used to, and dealing with them is an exercise left to the reader ;)
