Re: Extracting C-Style Comments (Revisited)

by chipmunk (Parson)
on Feb 18, 2002 at 21:54 UTC (#146264)

in reply to Extracting C-Style Comments (Revisited)

There are two things that are tripping you up. The first is the greediness of [^"'/]+. The alternative that matches a JavaScript regular expression literal looks for an equals sign or left paren followed by a slash; unfortunately, by the time that alternative is tried, the equals sign or left paren has already been gobbled up by [^"'/]+! You could fix this by adding = and ( to that negated character class. On the other hand, since you're substituting in place, you don't even need that part of the regex: s///g leaves any text it doesn't match untouched. Just remove [^"'/]+ | and it should work fine.
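To make the greediness problem concrete, here is a minimal sketch. It assumes heavily simplified stand-ins for the pieces of your regex: the names $regex_literal, $c_comment, strip_greedy and strip_fixed are made up for illustration, and the patterns cover far fewer cases than your real one.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-ins for the pieces discussed above;
# this is not the original poster's full regex.
my $regex_literal = qr{ [(=] \s* / (?: [^/\\\n]+ | \\. )* / }x;
my $c_comment     = qr{ / \* .*? \* / }xs;

sub strip_greedy {
    my ($js) = @_;
    # The greedy "other text" alternative is tried first and swallows the "=",
    # so the regex-literal alternative never gets to match.
    $js =~ s{ ( [^"'/]+ | $regex_literal ) | $c_comment }{ defined $1 ? $1 : '' }gex;
    return $js;
}

sub strip_fixed {
    my ($js) = @_;
    # Without that alternative, s///g simply leaves unmatched text in place.
    $js =~ s{ ( $regex_literal ) | $c_comment }{ defined $1 ? $1 : '' }gex;
    return $js;
}

my $js = 'var re = /a\/*b/; var s = 1; /* real */';
print strip_greedy($js), "\n";   # prints: var re = /a\
print strip_fixed($js),  "\n";   # prints: var re = /a\/*b/; var s = 1;
```

In the greedy version, the /* inside the regex literal is mistaken for the start of a comment, so everything up to the real */ is thrown away. In the fixed version the literal is captured whole and put back, and only the real comment disappears.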

The other problem is this curious regex in the JS: mystring.match(/[/\\*?"<>\:~|]/gi);. That regex would not be valid in Perl, because it contains an unescaped forward slash. Is it really valid in JavaScript? If so, you'll need to extend your regex so that it allows unescaped slashes within square brackets.

Replies are listed 'Best First'.
Re: Re: Extracting C-Style Comments (Revisited)
by Incognito (Pilgrim) on Feb 18, 2002 at 23:58 UTC
    Yes, the greediness of [^"'/]+ was definitely the problem. The new regular expression to strip C-style comments from a JavaScript file is:
    $strOutput =~ s{
        # First, we'll list things we want
        # to match, but not throw away
        (
            # Match a regular expression (they start with ( or =).
            # Then they have a slash, and end with a slash.
            # The first slash must not be followed by * and cannot contain
            # newline chars. eg: var "re = /\*/;" or "a = b.match (/x/);"
            (?:
                [\(=] \s* /
                (?:
                    # char class contents
                    \[ \^? ]? (?: [^]\\]+ | \\. )* ]
                  |
                    # escaped and regular chars (\/ and \.)
                    (?: [^[\\\/]+ | \\. )*
                )*
                /[gi]*
            )
          |
            # or double quoted string
            (?: "[^"\\]* (?:\\.[^"\\]*)*" [^"'/]* )+
          |
            # or single quoted constant
            (?: '[^'\\]* (?:\\.[^'\\]*)*' [^"'/]* )+
        )
      |
        # or we'll match a comment. Since it's not in the
        # $1 parentheses above, the comments will disappear
        # when we use $1 as the replacement text.
        /         # (all comments start with a slash)
        (?:
            # traditional C comments
            (?: \* [^*]* \*+ (?: [^/*] [^*]* \*+ )* / )
          |
            # or C++ //-style comments
            (?: / [^\n]* )
        )
    }{$1}gsx;
    I'll do some further testing, but it looks like this huge regex will do the trick! Thanks and ++ to you chipmunk.
