http://www.perlmonks.org?node_id=183567
Category: Utility Scripts
Author/Contact Info Stephen JamesAckohno@mail.ru
Description: This is a little script I came up with to get those ^M's out of files that come in the downloaded source here at PerlMonks. Given one argument (a file name), the script removes the ^M's from that file; given two, the first is the input and the second is the output. If the if statement that matches the perl shebang is removed, this script can be used to remove the ^M's from any file. Without that if statement, there may be a newline at the beginning of the file, which will cause the script not to run.
#!/usr/bin/perl -w 

use strict;
my $out;

if(@ARGV!=1 && @ARGV!=2){
    print "Usage:\n\t$0 input [output];\n";
    exit;
}

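# Two-argument open, as posted; the file name comes from the command line.
open(IN, "<$ARGV[0]") or die "couldn't open $ARGV[0]: $!";

# Read the first line and keep it only if it looks like a perl shebang.
# $1 excludes the trailing "\n"; any other first line is discarded.
if(<IN>=~m/(^#!.*perl.*)/){
    $out.=$1;
}

# For every remaining line: chomp drops "\n", chop drops the character
# before it (assumed to be "\r"), then a bare "\n" is appended.
while(<IN>){
    chomp;
    chop;
    $_.="\n";
    $out.=$_;
}

close IN;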

if(@ARGV==1){
    unlink $ARGV[0];
    open(OUT,">$ARGV[0]") or die "couldn't open $ARGV[0]: $!";
}
elsif(@ARGV==2){
    open(OUT,">$ARGV[1]") or die "couldn't open $ARGV[1]: $!";
}

print OUT $out;
close OUT;
Replies are listed 'Best First'.
Re: win2unix
by atcroft (Abbot) on Jul 20, 2002 at 08:08 UTC
    perl -p -i.bak -e "s/\r//" filename loops through filename, removing the ^M characters (\r), and creates a backup with .bak appended to the filename. It is a commonly used one-liner for the same job. (.bak can be replaced with another suffix, or left off if no backup is desired.)
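    For anyone who would rather keep this in a script, the one-liner can be spelled out roughly as below. This is only a minimal sketch: the sub name strip_cr_in_place is made up for illustration, and it relies on the fact that setting $^I switches on in-place editing for the <> operator, which is what -i does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: roughly what `perl -p -i.bak -e "s/\r//"` does for one file.
sub strip_cr_in_place {
    my ($file, $backup_ext) = @_;

    local $^I   = $backup_ext;   # like -i.bak; '' means no backup is kept
    local @ARGV = ($file);       # <> reads the files named in @ARGV

    while (<>) {
        s/\r//;                  # remove the first carriage return on the line
        print;                   # with $^I set, print writes back into the file
    }
    return;
}

# Treat every command-line argument as a file to clean up.
my @files = @ARGV;
for my $file (@files) {
    strip_cr_in_place($file, '.bak');
}
```

    Saved as, say, stripcr.pl, it would be run as perl stripcr.pl file1 file2, leaving file1.bak and file2.bak behind.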

    Update: No need to feel embarrassed; I would be willing to bet that most of us have written a script to do that same thing before. Appreciate the fact that you looked at a problem and came up with a good solution that dealt relatively effectively with contingencies, and chalk it up to practice in growing your ability to use Perl effectively. Reinventing the wheel isn't always bad, if you gain a better understanding of that wheel and don't try to reinvent the same one too often.

      ::slams head on keyboard::
      Talk about embarrassing...
Re: win2unix
by Ionitor (Scribe) on Jul 20, 2002 at 17:33 UTC
    Just a few comments on the code:
    if(<IN>=~m/(^#!+?perl+?)/){ $out.=$1; }

    First, this appears to try to remove the first line if it isn't a perl shebang line. If you're trying to remove a blank line at the beginning, testing for whitespace would probably be easier and safer. However, if you really want to discard any first line that isn't a perl shebang, there are a couple of problems with your regex. The quantifier +? matches the item immediately before it one or more times, as few times as possible. In other words, here are a few things your regex will match:

    #!perl
    #!!!!!!!perlllllll
    #!perl -w
    In the last case, #!perl is the only thing captured by the parens in the regex. Note that your code will not match #!/usr/bin/perl.

    What you probably want is:

    if(<IN>=~m/(^#!.*?perl.+)/){ $out.="$1\n"; }

    This way, your code will match and capture #!perl (used often in Windows programs, especially by ActiveState), #!/usr/bin/perl, and #!/usr/bin/perl -w. Additionally, the newline that the period does not capture will be added back on, without a ^M.

    Update:
    That should be:

    if(<IN>=~m/(^#!.*?perl.*)/){ $out.="$1\n"; }

    Forgot to replace the other "+".
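
    One caveat that may be worth checking on your own files: without /s, the period matches everything except "\n", and that includes "\r". On a CRLF line, a trailing .* therefore pulls the ^M into $1. A minimal sketch of one way around it follows; the sub name clean_shebang is made up here, and the character class is a suggestion rather than anything from the posts above.

```perl
use strict;
use warnings;

# Hypothetical helper: return a cleaned shebang line, or undef if $line is not one.
sub clean_shebang {
    my ($line) = @_;
    # [^\r\n]* stops before the line ending, so no "\r" ends up in $1.
    return $line =~ /^(#!.*?perl[^\r\n]*)/ ? "$1\n" : undef;
}

# Try a few first lines, printing control characters in visible form.
for my $line ("#!perl\r\n", "#!/usr/bin/perl -w\r\n", "use strict;\r\n") {
    my $clean = clean_shebang($line);
    (my $shown = $line) =~ s/\r/\\r/g;
    $shown =~ s/\n/\\n/g;
    print "$shown => ", (defined $clean ? "kept" : "not a shebang"), "\n";
}
```

    Like the patterns above, this still accepts oddities such as #!apflperladsf; it only changes what happens at the end of the line.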

      Thanks for calling my attention to that, I didn't notice...
      Yours didn't exactly work that well, but I came up with something that does: m/(^#!.*perl.*)/. This matches just about everything I could think of just now. Sure, it still matches things like #!apflperladsf, but who cares :o).
Re: win2unix
by Courage (Parson) on Jul 20, 2002 at 08:33 UTC
    It seems to me that your code snippet will damage a file where the ^M's are already missing: chomp will remove the \n, and the chop after it will eat a good character.
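
    A minimal sketch of a per-line fix that avoids this, assuming the goal is only to drop a trailing ^M where one exists (the sub name strip_trailing_cr is made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: drop a trailing "\r" if present, leave everything else alone.
sub strip_trailing_cr {
    my ($line) = @_;
    $line =~ s/\r$//;   # $ matches at the end, or just before a final "\n"
    return $line;
}

# The two cases in question: a CRLF line and an already-clean line.
for my $line ("good\r\n", "good\n") {
    my $fixed = strip_trailing_cr($line);
    print length($fixed), " chars kept\n";
}
```

    Because the substitution simply fails to match on a clean line, the line passes through untouched instead of losing its last character.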

    Secondly, it will not work on Win32, because you did not use binmode on that filehandle.
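
    The reason, as far as I understand it: on Win32, perl's default text mode turns "\r\n" into "\n" on input and back into "\r\n" on output, so the script never sees the ^M and writes it right back. A minimal sketch that sidesteps this with the :raw layer (the layer form of binmode) is below. The sub name crlf_to_lf_file is made up, and the three-argument open and lexical filehandles are my own substitutions, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in to $out, turning CRLF line endings into LF.
sub crlf_to_lf_file {
    my ($in, $out) = @_;

    # :raw means no CRLF translation on any platform.
    open(my $src, '<:raw', $in)  or die "couldn't open $in: $!";
    open(my $dst, '>:raw', $out) or die "couldn't open $out: $!";

    while (my $line = <$src>) {
        $line =~ s/\r$//;   # strip the "\r" only where there is one
        print {$dst} $line;
    }

    close $src;
    close $dst or die "couldn't close $out: $!";
    return;
}

crlf_to_lf_file(@ARGV) if @ARGV == 2;
```

    Note that $in and $out have to be different files here, since opening with > truncates the target before anything is read.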

    Additionally, I just thought of something unimportant but funny: you closed a filehandle because it is good practice (perldoc says it isn't needed, because perl does this for you).
    Let's go further and do even more: let's undefine all defined variables, return to the initial directory, and so on...
    :)

    Courage, the Cowardly Dog.

      Closing files is probably still a good habit to get into. Consider:
      That it documents that you won't be messing with a file anymore.
      And if you use flock(), and are in the habit of NOT closing your files, you could end up with a messy unintentional bottleneck.
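
      The flock() point can be sketched like this. It is a minimal example under my own assumptions (the sub name append_locked and the append-to-a-shared-file scenario are made up); the idea is just that an explicit close releases the lock at a known point.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper: append one line to a shared file under an exclusive lock.
sub append_locked {
    my ($path, $line) = @_;

    open(my $fh, '>>', $path) or die "couldn't open $path: $!";
    flock($fh, LOCK_EX)       or die "couldn't lock $path: $!";

    print {$fh} "$line\n";

    # Closing flushes the buffer and releases the lock right here,
    # instead of whenever the handle happens to go away.
    close($fh) or die "couldn't close $path: $!";
    return;
}
```

      With a bareword handle such as IN, or a handle kept in a long-lived variable, skipping the close means the lock is held until the program exits, and every other process waiting on it queues up behind you. Checking the return value of close also catches write errors that only surface when the buffer is flushed.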
      My experience with files that are mushed up with the ^M's has been that there is one on every line in the file, so naturally, I didn't realize there could be a line without the ^M.

      Would removing ^M's really be useful on Windows?

      Hehe, I come from C and C++. Not closing filehandles is of the devil >:)-|<.