removing C style comments from text files

by ctp (Beadle)
on Jan 06, 2004 at 08:13 UTC

ctp has asked for the wisdom of the Perl Monks concerning the following question:

I need to remove C style comments from a text file. For my first try I wrote:

$string =~ s/\/\*(.*)\*\///g;

It did what I wanted in some cases, but it didn't remove comments spanning newlines. So I did some research and found this regex posted somewhere else here in the monastery, claiming to do just what I need:


so I tried it in my substitution regex:

$string =~ s/*([^*]|\*+[^/*])*\*+//g;

Which of course didn't work...but, hey, I'm green and trusting. So I tweaked the regex many different ways, and I got it to not work in a whole new range of ways...none really as satisfying as actually having it work.

Suggestions? Thanks in advance!

Replies are listed 'Best First'.
Re: removing C style comments from text files
by Abigail-II (Bishop) on Jan 06, 2004 at 09:01 UTC
    use Regexp::Common; $string =~ s/$RE{comment}{C}//g;

    Note that this (just like the regexes presented in the rest of this thread) isn't context aware, and happily removes "comments" from strings.


      I address this problem by replacing quoted strings with unique tags prior to removing the comments. After stripping the comments I then restore the original strings for those tags remaining (some may get stripped).

        That's kind of a chicken-and-egg problem, isn't it? How can you successfully remove strings if you can't detect comments? Consider:
        /* One " two */ a = b + 4; /* three " four */
        You have to do it all in one pass. Something like:
        s {
            (
                [^"'/]*                       # Not a string, character or comment.
              | "[^\\"]*(?:\\.[^\\"]*)*"      # String.
              | '[^\\']*(?:\\.[^\\']*)*'      # Char.
              | / (?![*])                     # Slash, not a comment.
            )
          | (
                /[*] [^*]* (?: [*] [^*/]* )* [*]/   # Comment.
            )
          } { $2 ? "" : $1 }gsex;
        But that isn't foolproof either (consider # define).
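        For anyone who wants to try the one-pass idea, here is a minimal sketch that wraps the substitution above in a sub and runs it on the example line plus a string literal that merely looks like a comment. The sub name strip_c_comments and the second sample are made up for illustration, and the # define caveat still applies.

```perl
use strict;
use warnings;

# Hypothetical helper name; the pattern is the one-pass substitution quoted above.
sub strip_c_comments {
    my ($src) = @_;
    $src =~ s{
        (
            [^"'/]*                       # Not a string, character or comment.
          | "[^\\"]*(?:\\.[^\\"]*)*"      # String.
          | '[^\\']*(?:\\.[^\\']*)*'      # Char.
          | / (?![*])                     # Slash, not a comment.
        )
      | (
            /[*] [^*]* (?: [*] [^*/]* )* [*]/   # Comment.
        )
    }{ $2 ? "" : $1 }gsex;
    return $src;
}

# The chicken-and-egg example: quotes inside comments must not start a string.
my $tricky = strip_c_comments(q{/* One " two */ a = b + 4; /* three " four */});

# Made-up sample: comment markers inside a string literal must survive.
my $quoted = strip_c_comments(q{char *s = "/* not a comment */"; /* real */});

print "[$tricky]\n";
print "[$quoted]\n";
```

        Group 1 copies code, strings and character constants back unchanged, and group 2 is the only branch that gets replaced with nothing, which is why both cases come out right in a single pass.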


Re: removing C style comments from text files
by bsb (Priest) on Jan 06, 2004 at 08:19 UTC
Re: removing C style comments from text files
by ysth (Canon) on Jan 06, 2004 at 08:17 UTC
    If you want it to work across newlines, you need to do a couple things. First, make sure all the lines are in a single string. Second, if you are using . and want it to match any character including a newline, use the m//s flag. Without /s, it will match any character except a newline.

    The other issue is keeping what is supposed to match the inside of the comment from matching the end, some more code, and the beginning and inside of another comment. The simple m/\/\*.*\*\//s regex will match all of "/* comment 1 */ some = code; /* comment 2 */". You tell * to match as little as possible instead of as much as possible by adding a ?, so it becomes m/\/\*.*?\*\//s.
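    The three behaviours described above can be compared side by side. This is only a small sketch on a made-up sample string held in memory (a real script would first read the whole file into one string):

```perl
use strict;
use warnings;

# Made-up sample: one single-line comment and one comment spanning a newline.
my $src = "/* comment 1 */ some = code; /* comment\n2 */ more = code;";

(my $greedy = $src) =~ s/\/\*.*\*\///gs;    # .* runs on to the LAST */
(my $lazy   = $src) =~ s/\/\*.*?\*\///gs;   # .*? stops at the FIRST */
(my $no_s   = $src) =~ s/\/\*.*?\*\///g;    # without /s, . will not cross "\n"

print "greedy: [$greedy]\n";
print "lazy:   [$lazy]\n";
print "no /s:  [$no_s]\n";
```

    The greedy version swallows the code between the two comments, the lazy version with /s removes only the comments, and the version without /s leaves the multi-line comment in place.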

      So, within ysth's restrictions, if one wants a one-liner:
        perl -0777 -pe 's{/\*.*?\*/}{}gs' source.c
      My .02.
        I just tried it and it worked! I've never seen a regex contained in braces before...and what's the deal with the empty braces at the end?
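        On the braces: Perl lets s/// use other delimiters, and with bracketing delimiters the pattern and the replacement each get their own pair, so s{PATTERN}{REPLACEMENT}. The empty {} is simply an empty replacement, the same as the nothing between the last two slashes in s/PATTERN//. In the one-liner, -0777 reads the whole file as one string and -p prints it after the substitution. A small sketch on a made-up string, showing that the two spellings do the same thing:

```perl
use strict;
use warnings;

# Made-up sample text with one multi-line and one single-line comment.
my $text = "a = 1; /* first\n   comment */ b = 2; /* second */\n";

# Slash delimiters: every / inside the pattern needs a backslash.
(my $slashes = $text) =~ s/\/\*.*?\*\///gs;

# Brace delimiters: same pattern, fewer backslashes, empty replacement in {}.
(my $braces = $text) =~ s{/\*.*?\*/}{}gs;

print $slashes eq $braces ? "identical\n" : "different\n";
print $braces;
```

        The brace form is mostly a readability choice when the pattern itself is full of slashes.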

      I will work with the m//s flag stuff. My original regex had the *?, but in my sample text file it didn't seem to make a difference, i.e. it wasn't acting greedy either way. thanks!
Re: removing C style comments from text files
by CountZero (Bishop) on Jan 06, 2004 at 09:07 UTC
    Why would one want to remove comments from a source file? The compiler doesn't mind.


    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

      It's an assignment for an intro Perl course.

      UPDATE - hey hey hey now. I've always tried to be up front about my needs, and I always write as much code as I can before I ask for help, and I always do my best to learn what I can from all suggested code before I just go and use it.


        "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: removing C style comments from text files
by BUU (Prior) on Jan 06, 2004 at 08:42 UTC
    Moving beyond the whole perl/regex thing, what about just running the C preprocessor on it?

    The only downside I see there is possible text matching #defines that you don't want to get replaced, but I would think matching what cpp considers a #define would be vastly simpler than matching comments.
      No, you don't want the preprocessor output. Consider the following classical program:
      # include <stdlib.h>
      # include <stdio.h>

      int main (int argc, char * argv []) {
          printf ("Hello, world\n");    /* Print 'Hello, world' */
          exit (0);
      }

      Assume this is in the file hello.c.

      $ gcc -E hello.c | wc -l
      1566
      $ gcc -E hello.c | grep -v '^$' | wc -l
      853

      Even with blank lines removed, the 7-line hello.c expands to 853 lines of preprocessor output.


        Well yes, but it expands because you #include files. From his description (text files) I assumed that they weren't actual C programs and wouldn't be using the rest of the pre-processor commands. So there wouldn't be any #includes to expand the file size so dramatically. Beyond that I also suggested that it might be easier to match anything the pre-processor would consider a #define/#include, since the rules for that are fairly strict as I recall.
