PerlMonks  

Remove the ^M Character from a Document

by chromatic (Archbishop)
on Jan 29, 2000 at 04:29 UTC (#2586=snippet)

Description: If you transfer text files back and forth between Windows and Unix, you may notice a strange ^M character showing up at the ends of lines. It is the carriage return (\r, ASCII 13) left over from the DOS \r\n line ending, and there's an easy way to get rid of it. Try the first one-liner. In a Unix environment, where the end-of-line marker is just \n, you can also remove the \r character with the second one-liner.
perl -pi -e 'tr/\cM//d;'  <filename>

perl -pi -e 'tr/\r//d' <filename>
RE: Remove the ^M Character from a Document
by Anonymous Monk on Mar 17, 2000 at 20:08 UTC
    Hmm... it didn't work for me!!

      I don't think it would work for anyone. You need to put the -e right before the Perl code. As is, it tries to execute the Perl code "-pi" with @ARGV set to ('tr/\r//d').

      This works for me: $string =~ s/[\r\n]//g; (it strips the line feeds as well as the carriage returns; \x0d is just another name for \r)

      If you want to do this from Windows (rather than Unix), it seems to be very hard to stop perl from converting every ASCII 10 (LF) it writes back into ASCII 13 + ASCII 10 (CR LF).

      The one way I've managed it is to put the output filehandle into binmode, e.g.:

      binmode STDOUT; while (<>) { s/\n//; print "$_" . chr(10); }

      Possibly there is a variable that controls this, but I haven't found it, and things like $OFS in the program and -l012 on the command line don't seem to help (in perl 5.16). Perhaps someone might want to look into this in more detail.

RE: Remove the ^M Character from a Document
by Anonymous Monk on Mar 23, 2000 at 00:12 UTC
    If you're in a Unix environment (where \n is the EOL char) you can just as easily do:
    perl -pi -e 's/\r//g' <file name>
    This works because in DOS, EOL is represented as \r\n.
      Not sure why the ones mentioned above don't work, but yours does :D Thanks.
      Wisdom does not come with age, but with the mind to realise...
      What if you are in a DOS environment and you want to remove what will become the offending ^M when the file is opened in Unix? Running the search and replace doesn't do anything: you still end up with a carriage return before every linefeed.
RE: Remove the ^M Character from a Document
by ZZamboni (Curate) on May 08, 2000 at 18:13 UTC
    On a similar note, the other day I had a postscript file with an embedded pixmap graphic which contained line breaks represented only as "\r", in addition to the usual DOS "\r\n" at the end of each line. The graphic with the \r's came up, in Unix, as a single over-900K line, which broke most programs I tried to use to manipulate the file (those programs were not written in Perl, clearly :-)

    So based on this snippet, I came up with the following one-liner:

    perl -pi -e 's/\r\n?/\n/g' file
    which solved my problem.
      You might want to try a different delimiter, to lessen the obfuscation of this syntax, such as:
      perl -pi.orig -e 's#\r\n#\n#g' filespec
      or
      perl -pi.orig -e 's,\r\n,\n,g' filespec
      Though ideally, this is more correct:
      perl -pi.orig -e 's,\cM,,g' filespec # commas for clarity
        The use of the forward slash to delimit regular expressions and replacements has a history of decades, predating the birth of Perl by years. There are no forward slashes in this regular expression that could cause confusion. So, other than a fear of forward slashes, what makes you think use of a forward slash contributes to obfuscation?

        Abigail

RE: Remove the ^M Character from a Document
by muppetBoy (Pilgrim) on May 11, 2000 at 16:12 UTC
    If you are trying to do this substitution from the command line, you could just use dos2unix/unix2dos (on a Unix box; I think the commands are available on all flavours?).

      If you need to strip it from all the *.html files in a directory, try this sh snippet (type ^V^M as Ctrl-V Ctrl-M to get a literal carriage return):

      for A in *.html; do if [ -f "$A" ]; then sed -e 's/^V^M//g' "$A" > /tmp/foo.$$ && mv /tmp/foo.$$ "$A"; fi; done

      it's not perl, so sue me :-P

      edit by mirod: added code tags

RE: Remove the ^M Character from a Document
by le (Friar) on Jun 06, 2000 at 16:42 UTC
    I have one more alternative:

    You can type the ^M character itself (a literal carriage return) by hitting Ctrl-v Ctrl-m.
    So typing:
    perl -pi -e 's/Ctrl-v Ctrl-m//g' filename
    will remove the annoying ^M's too.

    Remember: Ctrl-v Ctrl-m is a key combination, not literal text.
Re: Remove the ^M Character from a Document
by thaigrrl (Monk) on Jan 04, 2001 at 23:15 UTC
    Or... on Solaris you can do:
    dos2unix <filename> <newfilename>
      You can get a dos2unix (and unix2dos) for any flavor of UNIX/BSD.

      Cheers,
      KM

      The problem is that dos2unix removes ^M, while I often have to replace them with spaces (in the SGML exported by FrameMaker, for example), so I end up doing:

      perl -pi -e 's{\r}{ }g' filename
Re: Remove the ^M Character from a Document
by KM (Priest) on Jan 04, 2001 at 23:20 UTC
    For those who use this for search and replace beyond the scope of control characters (words, sentences, etc.), using
    perl -pi.bak -e 's!something!something else!' file
    is helpful because it keeps a backup of the original (file.bak) in case something happens that you didn't expect. Refer to perlrun.

    Cheers,
    KM

Simple Way...
by Anonymous Monk on Jan 15, 2002 at 00:44 UTC
    vi the file and, while in command mode, type:

    :%s/[ctrl+v][ctrl+m]//g

    Pressing ctrl+v tells vi to insert the next keystroke literally, and ctrl+m is the carriage return itself, which vi displays as ^M.

    This searches for every ^M and replaces it with nothing.
Re: Remove the ^M Character from a Document
by Anonymous Monk on Feb 22, 2003 at 14:28 UTC
    February 22, 2003: It worked for me... however, I was executing from a web-based .PL script call and not a native command line. -Postmaster, www.churchsermon.org
Re: Remove the ^M Character from a Document
by umasuresh (Hermit) on Oct 04, 2010 at 16:00 UTC
    Along the same lines:
    I often get strange characters like ^[[00m when I save the output of the list command to a file (ls *.txt > list); they are visible only in vi.
    I know these characters appear because of the alias ls='ls --color' in my .bashrc file, but I don't want to unalias ls in each window.
    Is there a similar solution for fixing this?
