PerlMonks  

Re: Extracting C Style Comments Revised (JavaScript)

by Tetramin (Sexton)
on Oct 24, 2001 at 00:08 UTC ( [id://120893]=note )


in reply to Extracting C Style Comments Revised (JavaScript)

There is the section that says

    # First, we'll list things we want
    # to match, but not throw away

Just do that and add

    (?:/[^\r\n\*\/]+/)      # Match RegExp
    |

after the round bracket ("("). This could work for the above problem. But you cannot make it perfect without further code parsing, e.g. these will still go wrong:

    .replace(/\//, "")
    abc/100 // comment
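
To see why those two lines are troublesome, here is a minimal sketch that runs the suggested branch on its own, outside the full substitution. The helper name `branch_hits` and the first sample line are made up for illustration; only the pattern comes from the post.

```perl
use strict;
use warnings;

# The suggested branch in isolation: a slash, one or more characters
# that are not CR, LF, '*' or '/', then a closing slash.
my $regexp_branch = qr{/[^\r\n\*\/]+/};

# Hypothetical helper: every substring of $line that the branch claims.
sub branch_hits {
    my ($line) = @_;
    my @hits = $line =~ /$regexp_branch/g;
    return @hits;
}

for my $line ('x = s.match(/abc/);', '.replace(/\//, "")', 'abc/100 // comment') {
    my $shown = join ' ', map { "[$_]" } branch_hits($line);
    print "$line\n    claims: $shown\n";
}
```

On the first line the branch claims `/abc/`, as intended. On the second it stops at the escaped slash and claims only `/\/`, and on the third it claims `/100 /`, which swallows the first slash of the `//` comment so the comment branch never gets a chance.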

Re: Re: Extracting C Style Comments Revised (JavaScript)
by Incognito (Pilgrim) on Oct 24, 2001 at 00:36 UTC

    Incorporating what you've added as input:

    New Code

    $data =~ s{
        # First, we'll list things we want
        # to match, but not throw away
        (
            (?:/[^\r\n\*\/]+/)      # Match RegExp
          |                         # -or-
            [^"'/]+                 # other stuff
          |                         # -or-
            (?:"[^"\\]*(?:\\.[^"\\]*)*" [^"'/]*)+   # double quoted string
          |                         # -or-
            (?:'[^'\\]*(?:\\.[^'\\]*)*' [^"'/]*)+   # single quoted constant
        )
      |
        # or we'll match a comment. Since it's not in the
        # $1 parentheses above, the comments will disappear
        # when we use $1 as the replacement text.
        /                           # (all comments start with a slash)
        (?:
            \*[^*]*\*+(?:[^/*][^*]*\*+)*/   # traditional C comments
          |                                 # -or-
            /[^\n]*                         # C++ //-style comments
        )
    }{$1}gsx;

    Updated

    This code does work for the above examples, but does not work for regular expressions containing a '*', for example:
    var b=/\s*;\s*/gi;
    There should be a way for us to do this, because we want to handle that 99% of code that is out there... without writing a parser...

    I'm thinking we need to modify the regex in the "# Match RegExp" section further, to ignore *s and \/s... this may not be easy, and if I figure it out, I'll post it here.

      Try (?:/[^\r\n\*\/][^\r\n\/]*/)

      Still doesn't work with divisions like abc/100 because it now thinks it's the beginning of a regular expression.
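
A small side-by-side sketch of the two branches discussed so far, again in isolation rather than inside the full substitution. The names `$first_try`, `$second_try` and `hits` are invented for this illustration.

```perl
use strict;
use warnings;

my $first_try  = qr{/[^\r\n\*\/]+/};           # branch from the first reply
my $second_try = qr{/[^\r\n\*\/][^\r\n\/]*/};  # refinement suggested just above

# Hypothetical helper: all substrings of $line matched by $pattern.
sub hits {
    my ($pattern, $line) = @_;
    my @found = $line =~ /$pattern/g;
    return @found;
}

for my $line ('var b=/\s*;\s*/gi;', 'abc/100 // comment') {
    print "$line\n";
    print "    first try : ", join(' ', map { "[$_]" } hits($first_try,  $line)), "\n";
    print "    second try: ", join(' ', map { "[$_]" } hits($second_try, $line)), "\n";
}
```

The refinement only forbids '*' as the first character after the opening slash, so it now claims `/\s*;\s*/` in full where the first try claimed nothing. Both versions still claim `/100 /` in the division line, which is the remaining problem described above.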

        Possible Hack

        I think one way to do this may be to assume that every JavaScript regular expression literal follows an equals sign "=" or a left parenthesis "(".

        $data =~ s{
            # First, we'll list things we want
            # to match, but not throw away
            (
                (?:                         # Match RegExp
                    [\(=]\s*                # start with ( or =
                    / [^\r\n\*\/][^\r\n\/]* /   # All RegExps start and end
                                            # with slash, but first one
                                            # must not be followed by *
                                            # and cannot contain newline
                                            # chars
                                            #
                                            # var re = /\*/;
                                            # a = b.match (/x/);
                )
              |                             # -or-
                [^"'/]+                     # other stuff
              |                             # -or-
                (?:"[^"\\]*(?:\\.[^"\\]*)*" [^"'/]*)+   # double quoted string
              |                             # -or-
                (?:'[^'\\]*(?:\\.[^'\\]*)*' [^"'/]*)+   # single quoted constant
            )
          |
            # or we'll match a comment. Since it's not in the
            # $1 parentheses above, the comments will disappear
            # when we use $1 as the replacement text.
            /                               # (all comments start with a slash)
            (?:
                \*[^*]*\*+(?:[^/*][^*]*\*+)*/   # traditional C comments
              |                                 # -or-
                /[^\n]*                         # C++ //-style comments
            )
        }{$1}gsx;

        Does anyone know how to improve on this or how to make it fail?
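
One way to probe the question is a small harness around the substitution. The sketch below wraps the pattern in a hypothetical `strip_comments` sub and feeds it a few invented inputs. It differs from the posted code in two ways that are assumptions of this sketch: the in-pattern comments are shortened, and the replacement uses `/e` with a `defined` check so that `use warnings` stays quiet when a comment is dropped. The results described afterwards were traced by hand, so they are worth re-running.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution from the post.
sub strip_comments {
    my ($data) = @_;
    $data =~ s{
        (                                       # group 1: text to keep
            (?:                                 # regexp literal branch
                [\(=]\s*
                / [^\r\n\*\/][^\r\n\/]* /
            )
          |
            [^"'/]+                             # other stuff
          |
            (?:"[^"\\]*(?:\\.[^"\\]*)*" [^"'/]*)+   # double-quoted strings
          |
            (?:'[^'\\]*(?:\\.[^'\\]*)*' [^"'/]*)+   # single-quoted strings
        )
      |
        /                                       # a comment, which is dropped
        (?:
            \*[^*]*\*+(?:[^/*][^*]*\*+)*/       # C-style comment
          |
            /[^\n]*                             # line comment
        )
    }{ defined $1 ? $1 : '' }gsxe;
    return $data;
}

my @samples = (
    'x = abc/100; // comment',
    'var s = "a//b"; /* note */ var t = 1;',
    'a = s.replace(/\//g, "-"); // tidy',
);

for my $js (@samples) {
    print "in : [$js]\n";
    print "out: [", strip_comments($js), "]\n";
}
```

The division and the quoted string come through intact with their comments removed. The third input appears to be a failure case: it comes back as `a = s.replace(/\`, because the `//` formed by the escaped slash and the closing slash of the literal is taken as a line comment. As far as I can tell, the "other stuff" branch has already consumed the `(` by the time the scan reaches the literal, so the new `[\(=]` branch never gets to anchor there.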
