PerlMonks  

How to remove a carriage return (\r\n)

by monkfan (Curate)
on Nov 01, 2005 at 16:11 UTC ( #504626 )
monkfan has asked for the wisdom of the Perl Monks concerning the following question:

My fellow monks,
"chomp" doesn't seem to remove a carriage return. With this line:
perl -MData::Dumper -e ' $key = "test text\r\n"; chomp $key; print Dumper $key; '
It prints this:
$VAR1 = 'test text ';
How can I make it print:
$VAR1 = 'test text';
Yes, I want the whitespace in between "test" and "text".

Regards,
Edward

Re: How to remove a carriage return (\r\n)
by borisz (Canon) on Nov 01, 2005 at 16:16 UTC
    Look at $/ ( $INPUT_RECORD_SEPARATOR ):
    perl -MData::Dumper -e ' $key = "test text\r\n"; local $/ = "\r\n"; chomp $key; print Dumper $key; '
    Boris
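A small sketch of what changing $/ does to chomp, assuming the strings shown here. chomp returns the number of characters it removed, which makes the behaviour easy to check. The sub name chomp_with is made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: chomp a copy of $text with $sep as the record separator.
# Returns the chomped text and the number of characters chomp removed.
sub chomp_with {
    my ($text, $sep) = @_;
    local $/ = $sep;              # restored automatically when the sub returns
    my $removed = chomp($text);
    return ($text, $removed);
}

my ($crlf, $n_crlf) = chomp_with("test text\r\n", "\r\n");  # both characters go
my ($lf,   $n_lf)   = chomp_with("test text\r\n", "\n");    # the \r is left behind
my ($unix, $n_unix) = chomp_with("test text\n",   "\r\n");  # nothing is removed

printf "%d %d %d\n", $n_crlf, $n_lf, $n_unix;   # prints "2 1 0"
```

Note the last case: once $/ is "\r\n", chomp no longer touches a plain "\n" ending, so this only helps when the line ending is known in advance.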
Re: How to remove a carriage return (\r\n)
by philcrow (Priest) on Nov 01, 2005 at 16:16 UTC
    Chomp removes the line ending native to your platform, because $/ defaults to it. On unix this is simply a line feed, which we often write \n.

    If chomp won't do it try:

    $key =~ s/\r\n//;

    Phil

    Update: Explained why chomp usually removes your platform default line ending.

      I prefer $key =~ s/\r?\n// so it will handle either style of line ending, whether your code is running on Windows or Unix. (On Windows, if binmode is off, you get what look like Unix linefeeds out of Windows linefeeds.)
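A minimal sketch of that substitution wrapped in a helper, with one assumption added here: the pattern is anchored with \z so that only a line ending at the very end of the string is removed. The name strip_eol is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: remove one trailing LF or CRLF and nothing else.
sub strip_eol {
    my ($text) = @_;
    $text =~ s/\r?\n\z//;    # \z anchors at the very end of the string
    return $text;
}

# The same call copes with Windows, Unix and already-stripped input.
for my $raw ("test text\r\n", "test text\n", "test text") {
    printf "[%s]\n", strip_eol($raw);    # prints "[test text]" each time
}
```

The space between "test" and "text" is untouched, since the pattern only looks at the end of the string.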
Re: How to remove a carriage return (\r\n)
by pg (Canon) on Nov 01, 2005 at 16:30 UTC

    To be a little bit cross-platform, just do:

    use strict;
    use warnings;

    {
        my $str = "abcd\r\n";
        $str =~ s/\r|\n//g;
        print "[$str]";
    }
    {
        my $str = "abcd\n";
        $str =~ s/\r|\n//g;
        print "[$str]";
    }
    {
        my $str = "abcd\r";
        $str =~ s/\r|\n//g;
        print "[$str]";
    }
      But what happens when $str = "ab\ncd\r\n"? Or can we assume there are no line breaks except at the end of lines?
        $line =~ s/\R//g;
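To make the question about "ab\ncd\r\n" concrete, here is a sketch contrasting the two behaviours. It assumes Perl 5.10 or later, where \R (a generic line break that treats \r\n as one unit) is available. Both sub names are made up for the illustration.

```perl
use strict;
use warnings;
use 5.010;    # \R needs Perl 5.10 or later

# Hypothetical helper: remove every line break, including embedded ones.
sub strip_all_breaks {
    my ($text) = @_;
    $text =~ s/\R//g;
    return $text;
}

# Hypothetical helper: remove only a line break at the very end.
sub strip_last_break {
    my ($text) = @_;
    $text =~ s/\R\z//;
    return $text;
}

my $str = "ab\ncd\r\n";
print "[", strip_all_breaks($str), "]\n";    # prints "[abcd]"
print length(strip_last_break($str)), "\n";  # prints "5" (ab, \n, cd)
```

Which one is right depends on whether embedded line breaks are expected in the data.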
Re: How to remove a carriage return (\r\n)
by radiantmatrix (Parson) on Nov 01, 2005 at 17:05 UTC

    The chomp function removes the current 'input record separator' (stored in $/, see perlvar) from the end of a given string of text. You have two options to make it behave in the given circumstance. I will assume, from here, that you are reading lines from a file with Windows-style line-endings (\r\n).

    First, you could simply adjust your $/:

    local $/ = "\r\n";
    while (<DATA>) {
        chomp $_;
        print STDERR "'$_'\n";
    }

    Of course, if you don't know what kind of line endings you have, you can simply convert all line endings to newlines first:

    while (<DATA>) {
        s/\r[\n]*/\n/gm;    # now, an \r (Mac) or \r\n (Win) becomes \n (UNIX)
        chomp $_;
        print STDERR "'$_'\n";
    }

    Seems a waste to run a regex and chomp, but you could remove chomp from the code above, and replace the regex with:

    s/\r[\n]*//gm;

    Of course, if your input separator doesn't match the file's line endings (for example, the default \n against a Mac file that uses a bare \r), you'll read the whole file on the first pass through the while loop. You might consider an alternate strategy:

    open FH, '<', $file or die "Can't read '$file': $!";

    # find out what kind of line endings we have
    my $buffer;
    local $/ = "\n";    # fallback if the file contains no \r at all
    while ( read( FH, $buffer, 1024 ) ) {
        if ( $buffer =~ m/(\r\n?)/ ) {
            $/ = $1;    # set the input separator to what we found
            last;       # stop trying to find the separator
        }
    }
    close FH;

    # now reopen the FH and read line by line
    open FH, '<', $file or die "Can't read '$file': $!";
    while (<FH>) {
        chomp;
        print STDERR "'$_'\n";
    }
    close FH;

    There are cases I haven't dealt with, etc. for purposes of simplicity.
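The detect-then-read idea can be tried without a file on disk. The sketch below is an assumption-laden variant, not the code above: it guesses the separator from a string and reads from an in-memory filehandle. guess_eol and read_lines are hypothetical names, and a sample that happens to end in a bare \r cut off from its \n would still be misdetected.

```perl
use strict;
use warnings;

# Hypothetical helper: guess the line ending from a sample of the data.
# Returns "\r\n", "\r" or "\n"; falls back to "\n" if none is seen.
sub guess_eol {
    my ($sample) = @_;
    return $1 if $sample =~ /(\r\n|\r|\n)/;
    return "\n";
}

# Hypothetical helper: split $data into chomped lines using the guess.
sub read_lines {
    my ($data) = @_;
    local $/ = guess_eol($data);    # scoped to this sub
    open my $fh, '<', \$data or die "Can't open in-memory handle: $!";
    my @lines = <$fh>;
    close $fh;
    chomp @lines;                   # removes whatever $/ currently holds
    return @lines;
}

print join('|', read_lines("one\rtwo\rthree\r")), "\n";   # prints "one|two|three"
```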

    <-radiant.matrix->
    A collection of thoughts and links from the minds of geeks
    The Code that can be seen is not the true Code
    "In any sufficiently large group of people, most are idiots" - Kaa's Law
Re: How to remove a carriage return (\r\n)
by jira0004 (Monk) on Nov 01, 2005 at 18:17 UTC

    Hi,

    You've already gotten plenty of responses, but to recap: chomp removes the current value of $/, which by default corresponds to the platform-native line ending (0x0a on UNIX; on Windows the file holds 0x0d 0x0a, which text-mode I/O hands to Perl as a single \n).

    If $line contains a line from your file and you want to remove either a UNIX line terminator or a Windows line terminator from the end of $line, you could do the following:

        $line =~ s/\x0d{0,1}\x0a\Z//s;

    The Perl syntax =~ s/<regular expression>/<replacement>/<qualifiers> replaces occurrence(s) of <regular expression> with <replacement>; the <qualifiers> control how the match and replacement are performed. \x followed by two hexadecimal digits matches the character with that value: 0d is the hexadecimal value for carriage return and 0a is the hexadecimal value for newline (line feed).

    An open curly brace, digit, comma, digit, close curly brace gives the minimum and maximum number of times to match the preceding character, so \x0d{0,1} matches a carriage return zero times or one time (the same as \x0d?). Quantifiers are greedy by default, so if a \x0d is there it will be matched, but if there is no \x0d, that's okay (the 0 in {0,1} makes the match optional). \x0a matches the newline (line feed) character.

    \Z matches at the end of the string, or just before a newline at the very end of the string; use \z if you need the absolute end. Without the m qualifier, $ behaves like \Z; with the m qualifier, $ also matches before every newline inside the string. The s qualifier only changes what . matches (it lets . match a newline), so it is harmless here but has no effect on this pattern.

    Thus, $line =~ s/\x0d{0,1}\x0a\Z//s; will remove one line terminator from the end of $line, and it won't matter if it is a UNIX line terminator or a Windows line terminator. Note that on the old (pre-OS X) Macintosh the line terminator is a bare \x0d, so you would need something like this:

        $line =~ s/\x0d{0,1}\x0a{0,1}\Z//s;

    This substitution would strip off the line terminators in a UNIX file, a Windows file or an old Macintosh file.
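A quick way to convince yourself of that claim is to run the substitution over one sample of each style. This is only a sketch; strip_any_eol is a hypothetical wrapper around the pattern above, with the s qualifier dropped since it does not affect this pattern.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the three-style substitution above.
sub strip_any_eol {
    my ($line) = @_;
    $line =~ s/\x0d{0,1}\x0a{0,1}\Z//;
    return $line;
}

# One sample per convention, plus a line with no terminator at all.
my @samples = (
    [ 'UNIX',    "abc\x0a"     ],
    [ 'Windows', "abc\x0d\x0a" ],
    [ 'old Mac', "abc\x0d"     ],
    [ 'none',    "abc"         ],
);

for my $sample (@samples) {
    my ($name, $raw) = @$sample;
    printf "%-8s [%s]\n", $name, strip_any_eol($raw);   # "[abc]" in every row
}
```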

    Note that in a substitution you can also use \s, which matches any whitespace character: space, tab, carriage return, line feed or form feed.

    Thus, I usually use the following:

        $line =~ s/\A\s+//s;
        $line =~ s/\s+\Z//s;

    This strips all of the whitespace characters from the beginning and the end of $line, not just the line terminator, which may or may not be what you want. I usually ignore whitespace at the start or end of a line because it usually isn't useful.
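As a sketch, the two substitutions fit naturally into a small helper. The name trim is an assumption made for this example, and \z is used in place of \Z (for a \s+ pattern the result is the same).

```perl
use strict;
use warnings;

# Hypothetical helper: strip leading and trailing whitespace,
# including any \r or \n, and leave inner whitespace alone.
sub trim {
    my ($text) = @_;
    $text =~ s/\A\s+//;
    $text =~ s/\s+\z//;
    return $text;
}

print "[", trim("  test text \r\n"), "]\n";   # prints "[test text]"
```

The space between "test" and "text" survives because both patterns are anchored to the ends of the string.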

    Regards,

    Peter Jirak

    jira0004@yahoo.com

Re: How to remove a carriage return (\r\n)
by monarch (Priest) on Nov 02, 2005 at 01:06 UTC
    Being a paranoid programmer myself, I always use:
    sub remove_trailing_newline { $_[0] =~ s/[\r\n]+\Z//; }
    no matter which platform I am on. A real trick is trying to find blank lines in a slab of (multiline) text:
    if ( m/(\r\n|\n\r|\r|\n)\1/ ) {
        # two newlines in a row!
    }

    Update: Thanks to rev_1318 for pointing out that the $1 needs to be replaced with \1 in the regexp.

      if ( m/(\r\n|\n\r|\r|\n)$1/ ) {
      You mean
      if ( m/(\r\n|\n\r|\r|\n)\1/ ) {
      Backreferences inside the RE are notated as \1, \2, etc.

      Paul
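A short sketch of the corrected test in use. The sub name has_blank_line is hypothetical. The backreference \1 requires the second line ending to be the same sequence as the first, so a lone "\r\n" is not mistaken for two line breaks.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $text contains the same line ending
# twice in a row, i.e. a blank line.
sub has_blank_line {
    my ($text) = @_;
    return $text =~ /(\r\n|\n\r|\r|\n)\1/ ? 1 : 0;
}

print has_blank_line("para one\n\npara two"),     "\n";   # prints "1"
print has_blank_line("para one\r\n\r\npara two"), "\n";   # prints "1"
print has_blank_line("line one\r\nline two"),     "\n";   # prints "0"
```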
