need regex help to strip things like embedded C comments
by Eradicatore (Monk) on Jul 21, 2007 at 22:22 UTC
Eradicatore has asked for the
wisdom of the Perl Monks concerning the following question:
I have always wondered about the answer to this question, but never stuck with it long enough to figure it out. I'm wondering if anyone here can help. Basically, let's say you have this C source code:
Now let's say you want to write a regex that strips all C comments from the given C file. The way I was thinking about it would be to use a non-greedy regex to strip out only the non-embedded comments first, and then do multiple passes to make sure all the comments are gone.
But then the trick is, I know how to use something like this to do character negation:
That will match stuff like "abcjjjkkklllmmmdef" and turn it into "abcdef", but it would NOT work if there was an x, y, or z in the middle part, as in "abcjjjkkkxlllmmmdef".
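The original snippet is not preserved here. A minimal sketch of a pattern with the behavior just described, assuming the negated class is [^xyz] and using a hypothetical helper named strip_middle, might look like this:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): replaces "abc",
# any run of characters other than x, y, or z, and "def" with "abcdef".
sub strip_middle {
    my ($text) = @_;
    $text =~ s/abc[^xyz]*def/abcdef/;
    return $text;
}

for my $input ("abcjjjkkklllmmmdef", "abcjjjkkkxlllmmmdef") {
    print "$input -> ", strip_middle($input), "\n";
}
```

The first string collapses to "abcdef". The second is left unchanged, because the x in the middle stops [^xyz]* before it can reach "def".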
Now back to the comments in C source code. I can't use single-character negation here. What I want to exclude from the middle part is a two-character pattern.
I've looked at things like negative lookahead and negative lookbehind, but I just don't think those work either.
Any regex experts out there that can answer this puzzle?
NOTE: it should also not assume there are no stars in the embedded comment. Or in other words, it should handle this too:
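The sample input that belonged here is not preserved. As a rough sketch of the multi-pass idea described above, and not a definitive answer, the "no two-character pattern in the middle" requirement can be expressed by checking a negative lookahead before every character consumed. The input string and the name strip_comments below are hypothetical, and the sketch assumes the source has no string literals or // comments that contain comment delimiters.

```perl
use strict;
use warnings;

# Innermost comment: "/*", then any characters as long as none of them
# begins a "/*" or "*/" pair, then the closing "*/".
my $innermost = qr{/\*(?:(?!/\*|\*/).)*\*/}s;

# Hypothetical helper: each pass deletes the innermost comments, so the
# loop repeats until a pass finds nothing left to remove.
sub strip_comments {
    my ($src) = @_;
    1 while $src =~ s/$innermost//g;
    return $src;
}

# Hypothetical input: an embedded comment that itself contains stars.
my $code = "int a; /* outer /* inner ** with * stars */ still outer */ int b;";
print strip_comments($code), "\n";
```

Lone stars inside a comment are consumed normally because the lookahead rejects only the two-character sequences "/*" and "*/". Each pass removes one nesting level, which matches the plan of stripping the non-embedded comments first.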
"If at all god's gaze upon us falls, its with a mischievous grin, look at him" -- Dave Matthews