Re: Removing javascript comments

by tachyon-II (Chaplain)
on Apr 10, 2008 at 03:53 UTC ( #679390=note )


in reply to Removing javascript comments

As noted, to do this properly you really need an HTML parser to extract the JavaScript and then a JavaScript parser to parse it. A regex solution will never be 100% reliable, BUT that said, it is a great way to learn about regexes: you get to write a regex to do a task and then find out it does unexpected stuff!

perlre is the reference for regexes. Although s/this/that/ is the usual format, i.e. using / to delimit the regex, you can in fact use just about anything. When you have / symbols to match, you either use a different delimiter or you have to escape the / symbols in the regex with a \, so \/ means match /. Compare the readability of:

$str =~ s/http:\/\///g;
# and
$str =~ s|http://||g;
$str =~ s#http://##g;
$str =~ s!http://!!g;

The first example is a bit harder to read due to the escapes. At least it is to me. Unfortunately the * char is a regex quantifier (a metacharacter), so you will still need to escape it to match a literal *. To get you started try:

$str =~ s!(/\*[^/]+\*/)!>>>> $1 <<<<!g;   # /* comments */
$str =~ s!(\s//[^/\n]+)!>>>> $1 <<<<!g;   # // comments
print $str;
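To see what these two substitutions do in practice, here is a small sketch run against a made-up JavaScript snippet. The sample text and the `highlight` helper name are my own assumptions, not part of the post. The replacement side is written with `$1`, which is the form Perl recommends there. The last two sample lines are deliberately awkward.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the two highlighting substitutions.
sub highlight {
    my ($str) = @_;
    $str =~ s!(/\*[^/]+\*/)!>>>> $1 <<<<!g;   # /* comments */
    $str =~ s!(\s//[^/\n]+)!>>>> $1 <<<<!g;   # // comments
    return $str;
}

# Invented sample input, for illustration only.
my $js = <<'END_JS';
var a = 1; /* block comment */
var b = 2; // line comment
var c = " // inside a string";
/* a comment with a / in it */
END_JS

print highlight($js);
```

Running this, the first two lines get highlighted as intended. The third line is highlighted too, even though the `//` sits inside a string literal, and the fourth line is left alone because `[^/]+` cannot cross the `/` inside the comment.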

There will be plenty of edge cases not handled by this code, but you can have a great learning experience trying to fix them without breaking other things! The () in the match section just allow you to explicitly see what has happened, as we capture the match and do some funky ASCII highlighting in the replace section. To strip, you just replace with nothing and don't need to capture, but first you need to see what is happening. For debugging you may like to use:

$str =~ s!(fancy re here)! print "$1\n"; "" !ge;

That way, when you run the code on a file, you get the comments stripped but also printed to STDOUT, so you can watch and make sure everything being removed really looks like a comment!
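Filling in the placeholder with the /* ... */ pattern from earlier gives something runnable. This is only a sketch: the sample text is invented, and pushing onto `@removed` in addition to printing is my own addition so the stripped text can be inspected afterwards.

```perl
use strict;
use warnings;

# Invented sample input, for illustration only.
my $str = <<'END_JS';
var a = 1; /* first */
var b = 2; /* second */
END_JS

my @removed;   # assumption: keep a record as well as printing

# /e evaluates the replacement as Perl code. The value of the last
# expression ("") becomes the replacement text, so each match is
# printed, recorded, and then deleted.
$str =~ s!(/\*[^/]+\*/)! print "$1\n"; push @removed, $1; "" !ge;

print "--- stripped source ---\n", $str;
```

Because the replacement is ordinary Perl code under /e, you can swap the print for a write to a log file or any other check you like while you tune the regex.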

Welcome to Perl, hope you have fun and it helps you to get the job done.

Replies are listed 'Best First'.
Re^2: Removing javascript comments
by Anonymous Monk on Apr 10, 2008 at 05:02 UTC
    Thank you all very much for your valuable feedback... I almost have my code finished with the exception of the removal of javascript comments, so I will try out all your suggestions and see what happens... the best way to learn of course is to dabble in different methodologies to try and find the optimum solution for the given problem... looking forward to getting into some cool Perl projects in the future... I'll post again once I have the solution in place... thanks again!
      Hi, I have a similar task and was wondering if you were able to accomplish this?
