Re: Extracting C-Style Comments (Revisited)

by chipmunk (Parson)
on Feb 18, 2002 at 21:54 UTC (#146264)

in reply to Extracting C-Style Comments (Revisited)

There are two things that are tripping you up. The first is the greediness of [^"'/]+. The part of the regex that matches a regular expression looks for an equal sign or left paren followed by a slash; unfortunately, the equal sign or left paren has already been gobbled up! You could fix this by adding = and ( to the character class. On the other hand, since you're substituting in place, you don't even need that part of the regex. Just remove [^"'/]+ | and it should work fine.
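A minimal sketch of that first problem. The two patterns below are simplified stand-ins chosen for illustration (assumptions, not the regex from the original post), and the sample input is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical input: a regex literal that contains "/*", then a real comment.
my $js = 'var re = /a\/*b/; /* gone */';

# Simplified stand-ins, NOT the original poster's patterns (assumptions).
my $regex_literal = qr{ [(=] \s* / (?: [^\\/\n]+ | \\. )* / [gi]* }x;
my $comment       = qr{ /\* .*? \*/ }xs;

# Greedy class first: it consumes "var re = ", including the "=" that
# the regex-literal branch needs to see in front of the slash.
(my $with_class = $js)
    =~ s{ ( [^"'/]+ | $regex_literal ) | $comment }{ defined $1 ? $1 : '' }gex;

# Class removed: the "=" is still available, so the literal is kept whole
# and only the real comment is dropped.
(my $without_class = $js)
    =~ s{ ( $regex_literal ) | $comment }{ defined $1 ? $1 : '' }gex;

print "with class:    $with_class\n";
print "without class: $without_class\n";
```

With the class in place, the "/*" inside the regex literal is mistaken for the start of a comment and everything after it is deleted; without the class, the literal survives and only the trailing comment goes.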

The other problem is this curious regex in the JS: mystring.match(/[/\\*?"<>\:~|]/gi);. That regex would not be valid in Perl, because it contains an unescaped forward slash. Is it really valid in JavaScript? If so, you'll need to extend your regex so that it allows unescaped slashes within square brackets.
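As a rough illustration of what "allowing unescaped slashes within square brackets" could look like, here is a sketch that compares a naive literal matcher with a bracket-aware one. Both patterns and their variable names are hypothetical, written only for this example:

```perl
use strict;
use warnings;

# The line quoted above, kept verbatim with a non-interpolating heredoc.
my $js = <<'END_JS';
var bad = mystring.match(/[/\\*?"<>\:~|]/gi);
END_JS

# Naive matcher: the first unescaped slash ends the literal.
my $naive_re = qr{ / (?: [^/\\\n]+ | \\. )* / [gi]* }x;

# Bracket-aware matcher (hypothetical sketch): a slash inside a
# character class does not end the literal.
my $aware_re = qr{
    /
    (?:
        \[ (?: [^\]\\\n]+ | \\. )* \]   # character class, may contain "/"
      | \\.                             # escaped character outside a class
      | [^/\\\[\n]+                     # ordinary characters
    )+
    / [gi]*
}x;

my ($naive) = $js =~ /\(\s*($naive_re)/;
my ($aware) = $js =~ /\(\s*($aware_re)/;

print "naive: $naive\n";
print "aware: $aware\n";
```

The naive pattern stops at the slash inside the brackets and captures only `/[/`, while the bracket-aware one captures the whole literal including the `gi` flags.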

Re: Re: Extracting C-Style Comments (Revisited)
by Incognito (Pilgrim) on Feb 18, 2002 at 23:58 UTC
    Yes, the greediness of [^"'/]+ was definitely the problem... The new regular expression to strip C-style comments from a JavaScript file is:
    $strOutput =~ s{
        # First, we'll list things we want
        # to match, but not throw away
        (
            # Match a regular expression (they start with ( or =).
            # Then they have a slash, and end with a slash.
            # The first slash must not be followed by * and cannot contain
            # newline chars. eg: var "re = /\*/;" or "a = b.match (/x/);"
            (?:
                [\(=] \s* /
                (?:
                    # char class contents
                    \[ \^? ]? (?: [^]\\]+ | \\. )* ]
                  |
                    # escaped and regular chars (\/ and \.)
                    (?: [^[\\\/]+ | \\. )*
                )*
                /[gi]*
            )
          |
            # or double quoted string
            (?: "[^"\\]* (?:\\.[^"\\]*)*" [^"'/]* )+
          |
            # or single quoted constant
            (?: '[^'\\]* (?:\\.[^'\\]*)*' [^"'/]* )+
        )
      |
        # or we'll match a comment. Since it's not in the
        # $1 parentheses above, the comments will disappear
        # when we use $1 as the replacement text.
        /                # (all comments start with a slash)
        (?:
            # traditional C comments
            (?: \* [^*]* \*+ (?: [^/*] [^*]* \*+ )* / )
          |
            # or C++ //-style comments
            (?: / [^\n]* )
        )
    }{$1}gsx;
    I'll do some further testing, but it looks like this huge regex will do the trick! Thanks and ++ to you chipmunk.
