
Comment Stripper script for unix

by hsinclai (Deacon)
on Jun 14, 2004 at 01:55 UTC ( #366388=sourcecode )

Category: Utility Scripts
Author/Contact Info devel @hastek.com
Description: e.pl
invoke as "e" or "ee"
Comment stripper for unix, useful during system administration. Removes blank lines, writes output file, strips "#" or ";". Tries to preserve shell scripts.
Please see the POD
#!/usr/bin/perl -w

#     e.pl   (invoke as e or ee)
#            Please see the POD for install and licensing details

use strict;

###### globals
my $version = "0.9";
my $comm;
my @stripped;
my $topline;


######  how we were called
chomp(my $us = qx!basename $0!);
if ( $us eq "ee" ) { $comm = ';'; } else { $comm = '#'; }


######  parse args
$#ARGV >= 2 && die("\n No more than 2 arguments\n\n"); 
defined $ARGV[0] || die(&usage($us));
my $ifile=$ARGV[0];
-e $ifile || die("\n Input file nonexistent.\n\n");

open(IFIL,"<$ifile") or die("problem opening input_file");
my @inputfile=<IFIL>;
close(IFIL); 


######  main
if ( $us eq "ee" ) {
   $topline = shift(@inputfile);
   die(&pwarn($comm)) if $topline =~ /\#\!.*perl/i ;
   unshift(@inputfile,$topline);
   &stripper(@inputfile);
} elsif ( $us eq "e" ) {
     $topline = shift(@inputfile);
     if ( $topline =~ /(\s+)\#\!/ ) {
        &stripper(@inputfile);
        unshift(@stripped,$topline);
       } else {
        unshift(@inputfile,$topline);
        &stripper(@inputfile);
     }
  } 


######  final output
if ( $ARGV[1] ) {
    open(OFIL,">$ARGV[1]") or die("problem creating output_file"); 
    for ( @stripped ) { print OFIL "$_\n"; }
    print "\n Done stripping $ifile\n     -\>  wrote output file \"$ARGV[1]\"\n\n";
    close(OFIL);
} else {
    for ( @stripped ) { print "$_\n"; }
  }
exit $?;




######  subs

sub stripper {
    for ( @_ ) {
        chomp;
        next if /^$comm|^(\s*)$comm|^(\s*)$/;
        $_ =~ s/$comm.*$//;
        push(@stripped,$_);
    }
    return @stripped;
}

sub usage {
 print qq[
   Usage:   e filename [outputfilename]
            ································································
            e strips comments and blank lines from an existing file.
            e to remove # comments, and ee to strip ; comments.
            
            See "perldoc e.pl"
            ································································
            e.pl v$version                                        invoked as \'$us\'

]; 
exit(1);
}

sub pwarn {
 print  qq[
 WARNING:   Input file "$ifile" looks like a Perl script
            
            The first line was:   $topline
            When invoked as \'$us\', e.pl strips out semicolons,
            which might not be very useful for looking at a Perl script.
            If this assumption is wrong, remove the first line temporarily.


];
&usage;
exit(1);
}


__END__

=head1 NAME


e (and ee), symbolic links to e.pl



=head1 VERSION


Version 0.9



=head1 SYNOPSIS


 e   (e.pl, to be invoked as either "e" or "ee")

 e   args
 ee  args




=head1 DESCRIPTION


B<e> (invoked as "e" or "ee") is a small program to strip unix style comments ( e.g., "#" or ";" ) from scripts and configuration files. It might be useful during system administration. It is called "e" simply for brevity.

B<e> also removes blank lines, makes some effort not to destroy shell scripts and shebangs, and tries to avoid mangling Perl scripts it encounters.

B<e> is meant to be run on Unix systems where #, #!, and ; are common comments/patterns.

B<e> requires at least one argument, a filename to be processed.

B<e> tries to detect if the first line of the input file contains the #! character sequence, and tries to preserve it, assuming it might be a shell script.

B<e> will stop and warn you about removing semi-colons from a file it thinks is a Perl script.




=head1 INSTALLATION


Install the main file, e.pl, somewhere in your path, then in the same directory, do

  ln -s e.pl e
  ln -s e.pl ee

Use e or ee, depending on what character you want to strip.

Invoking e.pl directly breaks it.

If you already have an e or ee on your system, you may use other symbolic links. If you rename these files, you will have to adjust the main script accordingly.


=head1 EXAMPLES


=over 4

=item B<e> I<input_filename>

Strips # comments and blank lines out of "input_filename" and sends the result to your screen.



=item B<e> I<input_filename> [I<output_filename>] 

Same as above, but the result will be written to a new file "output_filename" in the current directory.


=item B<ee> I<input_filename> [I<output_filename>] 

Same as above, but semicolon as the comment character.

=back



=head1 BUGS

Might not be able to preserve the shebang line in a shell script, when the shebang line is preceded by one or more blank lines.



=head1 LIMITATIONS

Does not remove C style comments.

Inefficiently written, so uses lots of memory when input files get larger.

Cannot detect a "here" document, and will happily destroy the contents of one when it encounters a comment character somewhere in there.


=head1 AUTHOR

Harold Sinclair
devel at hastek


=head1 COPYRIGHT

Copyright © 2004 hastek. All rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.


=cut

#EOF

Re: Comment Stripper script for unix
by Zaxo (Archbishop) on Jun 14, 2004 at 02:49 UTC

    I tried applying this script to itself. That was to check if significant uses of '#' were handled properly. The results were, uhhh . . . unfortunate.

    1. It stripped the shebang line, which doesn't look exotic at all.
    2. It did
      -if ( $us eq "ee" ) { $comm = ';'; } else { $comm = '#'; }
      +if ( $us eq "ee" ) { $comm = ';'; } else { $comm = '
      leaving an unclosed quote in the code.
    3. It did
      -   die(&pwarn($comm)) if $topline =~ /\#\!.*perl/i ;
      +   die(&pwarn($comm)) if $topline =~ /\
      leaving an open regex match.
    4. It did
      -     if ( $topline =~ /(\s+)\#\!/ ) {
      +     if ( $topline =~ /(\s+)\
      to the same effect.

    I think your e can only be applied in the simplest circumstances.

    Don't feel too bad, the saying goes, "Only perl can parse Perl." To do this sort of thing properly really does require a parser.

    After Compline,
    Zaxo
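
Zaxo's second item is easy to reproduce in isolation. The sketch below is only an illustration: `strip_trailing` is a hypothetical helper, not part of e.pl, and it applies the same kind of "delete from the first comment character to end of line" substitution that the posted `stripper` sub uses (with `\Q...\E` added so the character is always taken literally).

```perl
use strict;
use warnings;

# Hypothetical helper (not in e.pl): remove everything from the first
# occurrence of the comment character to the end of the line.
sub strip_trailing {
    my ($line, $comm) = @_;
    $line =~ s/\Q$comm\E.*$//;
    return $line;
}

# One line of e.pl itself, where '#' is data rather than a comment.
my $src = q[if ( $us eq "ee" ) { $comm = ';'; } else { $comm = '#'; }];
my $out = strip_trailing($src, '#');

print "before: $src\n";
print "after:  $out\n";
```

The "after" line ends at the opening single quote, which is the unclosed quote reported above. A per-line substitution has no notion of quoting, so any '#' inside a string or regex gets the same treatment.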

      Don't feel too bad, the saying goes, "Only perl can parse Perl." To do this sort of thing properly really does require a parser.

      ... or take a look at perltidy, which does a really good job on Perl code formatting and also has a switch for stripping comments.

      Whoa - that's terrible - obviously I didn't test it with Perl scripts enough - I only used it with config files and shell scripts really - way too hasty ...

      This plain doesn't work and should be removed from the code catacombs - you all are too kind! Or maybe moved to the "don't let this happen to you" section?

      I didn't know Perltidy removed comments, so thanks for that eserte.





Re: Comment Stripper script for unix
by Abigail-II (Bishop) on Jun 14, 2004 at 15:06 UTC
    Input:
      #!/bin/bash
      # This is a comment.
      echo "# This is not a comment"
      echo \# and neither is this.
    Output:
      echo "
      echo \
    Your program will strip she-bang lines unless such a line starts with whitespace. However, whitespace there isn't an option. The first 2 bytes of the file need to be #!; the kernel isn't going to skip over whitespace (and whitespace certainly isn't mandatory). Furthermore, the base of your program is an extremely simplistic regex - it just removes anything on a line starting at the first #. Your program could as well have been:
    perl -nle 's/#.*//; print if /\S/'

    But my biggest question is, why do you think this is useful for system administration? I don't know any system administrator who wants to remove comments from his configuration files or from his shell scripts.

    Abigail
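
The shebang point can be checked side by side. This is a sketch, not a patch to e.pl: `is_shebang` is a hypothetical helper that anchors the test at the very start of the line, and the loop compares it with the pattern from the posted script, `/(\s+)\#\!/`.

```perl
use strict;
use warnings;

# Hypothetical helper (not in e.pl): true only when "#!" is the first
# two characters of the line, which is what the kernel looks for.
sub is_shebang {
    my ($first_line) = @_;
    return $first_line =~ /\A#!/ ? 1 : 0;
}

for my $line ("#!/bin/bash", "  #!/bin/bash", "# plain comment") {
    # "posted" is the test used in the original script.
    my $posted = $line =~ /(\s+)\#\!/ ? 1 : 0;
    printf "%-18s posted=%d anchored=%d\n", "'$line'", $posted, is_shebang($line);
}
```

The posted pattern only matches when whitespace precedes `#!`, so it misses a normal first line and accepts an indented one; the anchored test does the opposite.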

      This is an annoying trend that's driving me nuts where I work too. Somehow they are justifying it in the name of security. (Even to the point of stripping comments from all applications.)

        I tend to ask people to elaborate on that, and ask them to explain how this is helping security. I also might point out that $ > /secret/file works even better (sure, it has some side-effects, but isn't security important enough that we can justify some side-effects?)

        Abigail

      Hi Abigail,

      Your program will strip she-bang lines unless such a line starts with whitespace.
      Are you sure about that? The shebang line is not stripped, if it is the first line, which gets preserved and re-inserted back into the final output..
      update- you're totally right about that, I screwed it up..

      why do you think this is useful for system administration..

      Because removing commented lines lets you get a quick view only of active lines - in a file that might have only a few active lines among several screens of commented lines, e.g. a stock squid.conf file..

      Thanks for the feedback!
        Because removing commented lines lets you get a quick view only of active lines - in a file that might have only a few active lines among several screens of commented lines, e.g. a stock squid.conf file..
        Well, a simple grep -v ^\# will do that. If an "active" line has a trailing comment, it doesn't matter. It also doesn't explain why you want to remove comments from a shell script.

        Abigail
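
For the "quick view of active lines" use case, a whole-line filter along the lines of the grep suggestion avoids the truncation problems discussed in this thread. The following is a minimal sketch under some assumptions of mine: `active_lines` is a hypothetical name, it also drops blank lines and indented comment lines (which plain `grep -v ^\#` does not), and it keeps a first line that begins with `#!`.

```perl
use strict;
use warnings;

# Hypothetical helper (not in e.pl): keep only "active" lines.
# Lines are dropped whole or kept whole; nothing is cut mid-line.
sub active_lines {
    my ($comm, @lines) = @_;
    my @kept;
    for my $i (0 .. $#lines) {
        my $line = $lines[$i];
        chomp $line;
        # Keep a shebang, but only on the first line and at column 0.
        if ($i == 0 && $line =~ /\A#!/) {
            push @kept, $line;
            next;
        }
        next if $line =~ /^\s*$/;            # blank line
        next if $line =~ /^\s*\Q$comm\E/;    # whole-line comment
        push @kept, $line;
    }
    return @kept;
}

my @sample = (
    "#!/bin/bash\n",
    "# This is a comment.\n",
    "\n",
    "echo \"# This is not a comment\"\n",
    "echo \\# and neither is this.\n",
);
print "$_\n" for active_lines('#', @sample);
```

Trailing comments on active lines are left in place, which, as noted above, doesn't matter for reading a config file.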
