PerlMonks  

Re: Extracting C-Style Comments (Revisited Again)

by chipmunk (Parson)
on Mar 06, 2002 at 02:42 UTC (#149594)


in reply to Extracting C-Style Comments (Revisited Again)

It took longer than I'd like to admit to figure out the problem this time. :)

    |
    # or double quoted string
    (?: "[^"\\]* (?:\\.[^"\\]*)*" [^"'/]* )
This matches a double-quoted string, then some amount of code after the double-quoted string. The trailing [^"'/]* consumes everything up to the next quote or slash, and that includes the open parenthesis or equal sign that you are relying on to mark the beginning of the JS regular expression. Simply remove that bit from your regex (from the single-quoted string alternative as well) and the JS code snippet will be parsed properly.
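A minimal sketch of the effect, in case it helps to see it in isolation. The names `$with_tail` and `$without_tail` and the sample JavaScript line are made up for illustration; only the two patterns come from the thread.

```perl
use strict;
use warnings;

# Double-quoted string alternative WITH the trailing class discussed above.
my $with_tail    = qr{ "[^"\\]* (?:\\.[^"\\]*)*" [^"'/]* }x;
# The same alternative with the trailing class removed.
my $without_tail = qr{ "[^"\\]* (?:\\.[^"\\]*)*" }x;

# Hypothetical input: a string literal, then a call that takes a regex literal.
my $js = q{s = "abc"; t = s.match(/x/);};

my ($greedy) = $js =~ /($with_tail)/;
my ($tight)  = $js =~ /($without_tail)/;

print "with tail:    [$greedy]\n";   # ["abc"; t = s.match(]
print "without tail: [$tight]\n";    # ["abc"]
```

With the trailing class in place, the "(" has already been consumed by the string alternative, so the regular-expression alternative (which must start at "(" or "=") never gets a chance to match at that point.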


Re: Re: Extracting C-Style Comments (Revisited Again)
by Incognito (Pilgrim) on Mar 06, 2002 at 03:56 UTC

    Excellent! Another ++ to you!!! I actually understood your answer for once, which is great... short and to the point... Here's the fully updated regex code for those that are interested...

    #----------------------------------------------------------------------
    # Here is the fundamental code to match JavaScript code.
    # This includes regular expressions and quoted strings.
    #----------------------------------------------------------------------
    my ($regexJSCode) = qr{
        # First, we'll list things we want
        # to match, but not throw away
        (?:
            # Match a regular expression (they start with ( or =).
            # Then they have a slash, and end with a slash.
            # The first slash must not be followed by * and cannot contain
            # newline chars. eg: var "re = /\*/;" or "a = b.match (/x/);"
            [\(=] \s* /
            (?:
                # char class contents
                \[ \^? ]? (?: [^]\\]+ | \\. )* ]
                |
                # escaped and regular chars (\/ and \.)
                (?: [^[\\\/]+ | \\. )*
            )*
            /
            (?:
                [gi]*
                # next characters are not word characters
                (?= [^\w] )
            )
        )
        |
        # or double quoted string
        (?: "[^"\\]* (?:\\.[^"\\]*)*" )+
        |
        # or single quoted constant
        (?: '[^'\\]* (?:\\.[^'\\]*)*' )+
    }x;

    #----------------------------------------------------------------------
    # Here is the fundamental code to match JavaScript comments and comment blocks.
    #----------------------------------------------------------------------
    my ($regexJSComments) = qr{
        # or we'll match a comment. Since it's not in the
        # $1 parentheses above, the comments will disappear
        # when we use $1 as the replacement text.
        /    # (all comments start with a slash)
        (?:
            # traditional C comments
            (?: \* [^*]* \*+ (?: [^/*] [^*]* \*+ )* / )
            |
            # or C++ //-style comments
            (?: / [^\n]* )
        )
    }x;

    #----------------------------------------------------------------------
    # Get rid of all comments from the string.
    #----------------------------------------------------------------------
    $strOutput =~ s{ ( $regexJSCode ) | $regexJSComments }{$1}gsx;
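For anyone who wants to try the keep-or-discard substitution on its own, here is a cut-down sketch. It is not the code above: `$keep`, `$comment`, and `strip_comments` are hypothetical names, the "keep" pattern handles only quoted strings (no JavaScript regex literals), and the replacement uses /e with a `defined` check, which is an assumption added here to avoid an uninitialized-value warning under `use warnings` when a comment matches.

```perl
use strict;
use warnings;

# Simplified stand-in for the "code to keep" pattern: quoted strings only.
my $keep = qr{
      "[^"\\]* (?:\\.[^"\\]*)*"     # double-quoted string
    | '[^'\\]* (?:\\.[^'\\]*)*'     # single-quoted string
}x;

# Comment pattern, same shape as the one in the post.
my $comment = qr{
    /
    (?:
        \* [^*]* \*+ (?: [^/*] [^*]* \*+ )* /    # block comment
      | / [^\n]*                                  # line comment
    )
}x;

sub strip_comments {
    my ($src) = @_;
    # The capture group is set only when the "keep" branch matched,
    # so strings are put back unchanged and comments become ''.
    $src =~ s{ ( $keep ) | $comment }{ defined $1 ? $1 : '' }gex;
    return $src;
}

# Hypothetical input: a string that contains "//", then two real comments.
my $js = qq{var s = "a // b"; // trailing\nvar t = 1; /* gone */\n};
print strip_comments($js);
```

The string "a // b" survives because the string alternative is tried first and claims those characters before the comment alternative can see the "//" inside it.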
