Compressing/Obfuscating a Javascript file

Incognito has asked for the wisdom of the Perl Monks concerning the following question:

I've written a perl script to go through a directory containing JavaScript library files (plain text files containing pure JavaScript code, mainly functions and some global variables at the top) and remove comments and unnecessary whitespace. This works using some pretty complex (to me) regular expressions, which first remove comments, then go through the document and hide all strings, compress the whitespace, replace the strings, etc...

The next (and final) step to this process is the obfuscation... I want to go through each function in the document and, for each parameter in the function header, replace the parameter with a single letter (or whatever) to make the function smaller (and I suppose unreadable - but that's secondary)...

I want to do this, because there's over 400K of JS library files which when transmitted over a 56K modem (what's a modem you say?), can be a very large download on the initial hit to the web page. By obfuscating each function's parameter list, significant file size reduction can be done...

I'm really looking for a regular expression that can go through the file (stored as one big string) and returns each function in an array... or I just pass the name of the function that I want and the regex returns the entire function as a string to me....

A typical Javascript function looks like this:

    function myFunctionName (strName, intValue, strOtherString, blnResult,
                             strAnotherString, strEtc) {
        var intMyLocalVariable = 0;
        var strAnotherVariable = "blah";

        if (blnResult) {
            strAnotherVariable = "yakk";
        } else {
            strAnotherString = "yikes";
        }

        print strName;
        return (intValue);
    }

Anyway, the problem is the fact that I can't just go and look for the first non-escaped "}" character, since there could be if statements and other "{}" characters that are completely valid in a subroutine/function. What is this mystical regex that I'm looking for? It would take me days and days to figure one out, when I know someone out there is definitely smarter/more experienced than I at this sort of stuff...

Here's a regex that I wrote to grab just the function headers in a file...

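    sub GetJSFunctionHeaders {
        my ($strOutput) = @_;
        my (@subroutines) = ();

        # Capture "function name ( params )", allowing backslash-escaped
        # characters inside the parentheses.
        while ($strOutput =~
               m/(function\s*\S+\s*\(\s*(?:[^\\\)]|\\.)*\s*\))\s*/ig) {
            push (@subroutines, $1);
        }

        return (\@subroutines);
    }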

This works great.... but I want all the data IN the function, as that's what I'm going to go through and obfuscate the function parameters.... Does anyone out there know the magical REGEX? If so, that would be sooooooo cool. I would worship you every night for the next week if you so desire :)
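For reference, once the text of a single function is in hand, the renaming step itself might look roughly like the sketch below. It is only an illustration: `shorten_params` is a made-up name, and it assumes strings have already been hidden and that no existing identifier collides with the short names.

```perl
use strict;
use warnings;

# Rename the parameters of one function to a, b, c, ...
# Hypothetical helper; assumes $func holds exactly one function whose
# strings are already masked, and that "a", "b", ... are not in use.
sub shorten_params {
    my ($func) = @_;

    # Grab the parameter list from the function header.
    my ($list) = $func =~ /\bfunction\s+\w+\s*\(([^)]*)\)/;
    return $func unless defined $list;

    my @params = $list =~ /(\w+)/g;
    return $func unless @params;

    # Map each parameter to the next short name (Perl's string increment).
    my %new;
    my $short = 'a';
    $new{$_} = $short++ for @params;

    # Replace whole identifiers only, and skip property names (obj.strName).
    my $alt = join '|', map { quotemeta } @params;
    $func =~ s/(?<![\w.])($alt)\b/$new{$1}/g;
    return $func;
}
```

The negative lookbehind is there so that a property such as `obj.strName` is left alone when `strName` is also a parameter.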

Re: Compressing/Obfuscating a Javascript file by andreychek (Parson) on Oct 06, 2001 at 07:44 UTC
There is already a module which does something like what you're looking for, called HTML::Clean. There is one big difference between it and the solution you are working on. Your solution modifies the files directly while still on your disk. It may make them hard to read when you need to maintain them, but no extra processing is required when a browser requests them, so it may be a bit faster than the method the module uses. HTML::Clean will leave the code in your files intact, and modifies the JavaScript code on the fly as the browser requests it. There may be a slight performance hit for this, but I don't really think your users will find it noticeable... but you should probably test that, and not take my word for it :-)

I personally prefer using this module, as it makes code maintenance much easier. However, the best method definitely depends on the requirements of your project. If obfuscation of the source files on your disk is what you're looking for, this might not do the trick. This only "obfuscates" the code as users view it in their browsers. But if you use a regex to obfuscate your code, someone can just as easily use a regex to unobfuscate it :-)

Good luck!
-Eric
Re: Compressing/Obfuscating a Javascript file by tachyon (Chancellor) on Oct 06, 2001 at 10:57 UTC
Oops, got totally carried away. This will crunch and obfuscate a JavaScript file. It has been tested on one 25k script and did not break it, but... that's not what you call intensive QA. If anyone wants to test it and offer some feedback (i.e. broke script or didn't break script), that would be useful.

This probably has some real utility, as it condensed a script I use a lot from 25k to 8k; that's a way faster download. I was amazed that comments, whitespace, and function and var names make up fully 2/3 of the code.

Note that we list the change in function names so you can modify the HTML as required. You could automate this pretty easily with HTML::Parser.

Update 1: Modified original code to deal with quoted strings appropriately and added a hint of documentation.

Update 2: Modified to put each function on its own line as intended.

    #!/usr/bin/perl -w
    use strict;

    my ( $data, %funcs, %globals );
    open JS, $ARGV[0] or die "Usage $0 <file>\nCan't open '$ARGV[0]': $!\n";
    local $/;
    $data = <JS>;
    close JS;
    my $was = length $data;

    $data =~ s|//.*\n|\n|g;      # strip single line comments
    $data =~ s|/\*.*?\*/||gs;    # strip multi-line comments
    $data =~ s|^\s*\n||gm;       # strip blank lines
    $data =~ s|^\s+||gm;         # strip leading whitespace
    $data =~ s|\s+$||gm;         # strip trailing whitespace

    # do the magic and split into functions
    my @functions = split /(?=\bfunction\s)/, $data;

    # map function names to new names starting with 'a'
    my $name = 'a';
    for (@functions) {
        $funcs{$1} = $name++ if m/^function\s+(\w+)/;
    }

    # modify all the function names
    my $funcs = join '|', keys %funcs;
    my $func_sub = qr/\b($funcs)\b/;
    s|$func_sub|$funcs{$1}|g for @functions;

    # modify all the global vars
    # as we have split on function keyword these should
    # be in the first element of our function array
    unless ($functions[0] =~ /^function/) {
        my @globals = $functions[0] =~ m/var\s+(\w+)/g;
        if (@globals) {
            $globals{$_} = $name++ for @globals;
            my $globals = join '|', keys %globals;
            my $global_sub = qr/\b($globals)\b/;
            for my $func (@functions) {
                my @chunks = chunk($func);
                for (@chunks) {
                    next if m/^(?:"|')/;   # leave quoted strings alone
                    s|$global_sub|$globals{$1}|g;
                }
                $func = join '', @chunks;
            }
        }
    }

    # modify all the scoped vars continuing var names on from func/global names
    my $end_globals = $name;
    for my $func (@functions) {
        next unless $func =~ m/^function/;
        my ( @locals, %locals );
        $name = $end_globals;   # each function can use the same local names
        @locals = $func =~ m/var\s+(\w+)/g;
        my ($local) = $func =~ m/function\s+\w+\s*\(([^)]+)\)/;
        if ($local) {
            $local =~ s/\n|\s//g;
            push @locals, split ',', $local;
        }
        for my $var (@locals) {
            next unless $var;
            $locals{$var} = $name++;
        }
        next unless keys %locals;
        my $locals = join '|', keys %locals;
        my $local_sub = qr/\b($locals)\b/;
        my @chunks = chunk($func);
        for (@chunks) {
            next if m/^(?:"|')/;   # leave quoted strings alone
            s|$local_sub|$locals{$1}|g;
        }
        $func = join '', @chunks;
    }

    # do some initial condensation around curlies
    # (braces escaped so newer perls accept the patterns)
    for (@functions) {
        s/\n\{/{/gm;
        s/\n\}/}/gm;
    }

    # now every exposed line ending should end in a ; { or } if we are to safely
    # condense this down by removing newlines - we add the ; if it is missing
    for my $func (@functions) {
        my @lines = split "\n", $func;
        for (@lines) {
            $_ .= ";" unless m/[{};]$/;
        }
        $func = join '', @lines;
        $func .= "\n";
    }

    # remove whitespace around all operators
    my @operators = qw# + - * / = == != < > <= >= ( ) [ ] { } ? ; : #;
    push @operators, ',';   # need to do it this way to avoid warnings
    $_ = quotemeta $_ for @operators;
    my $operator_sub = join '|', @operators;
    $operator_sub = qr/($operator_sub)/;
    for my $func (@functions) {
        my @chunks = chunk($func);
        for (@chunks) {
            next if m/^(?:"|')/;   # leave quoted strings alone
            s#[ \t]+$operator_sub#$1#g;
            s#$operator_sub[ \t]+#$1#g;
        }
        $func = join '', @chunks;
    }

    # ta da time to print out the results
    # first display a list of modified function names.
    # Note: any function called in the html will have to be modified accordingly!
    print "New function names are:\nwas called\t=>\tis now called\n";
    print "$_()\t=>\t$funcs{$_}()\n" for keys %funcs;
    print "\n";

    # print out the modified code
    print @functions;

    # a few stats just for the hell of it
    print "\n\nLength change:\n";
    my $is = length join '', @functions;
    printf "Originally %d bytes now %d bytes or %2d%% of original size\n",
        $was, $is, ($is/$was)*100;
    exit;

    # this sub splits a function into quoted and unquoted chunks
    sub chunk {
        my $func = shift;
        my @chunks;
        my $chunk = 0;
        my $found_quote = '';
        for (split //, $func) {
            # look for opening quote
            if (/'|"/ and ! $found_quote) {
                $found_quote = $_;
                $chunk++;
                $chunks[$chunk] = $_;
                next;
            }
            # look for corresponding closing quote
            if ( $found_quote and /$found_quote/ ) {
                $found_quote = '';
                $chunks[$chunk] .= $_;
                $chunk++;
                next;
            }
            # no quotes so just add to current chunk
            $chunks[$chunk] .= $_;
        }
        # strip whitespace from unquoted chunks;
        for (@chunks) {
            next if m/^(?:"|')/;   # leave quoted strings alone
            s/^[ \t]+|[ \t]+$//g;
        }
        return @chunks;
    }

    =head1 NAME

    javastrip.pl - a Perl script to obfuscate and condense javascript code

    Version 0.0000001

    =head1 SYNOPSIS

    javastrip.pl <file>

    where file is the raw javascript only, not HTML

    output is to STDOUT so send it wherever you want with a > redirect:

    javastrip.pl infile.js > outfile.js

    make a backup of original file first. keep backup. process is irreversible.

    =head1 DESCRIPTION

    This script is primarily designed to munge .js files. It will process any
    pure javascript but was not designed to process javascript embedded in HTML.

    It processes a javascript in several stages. The first stage is to remove
    all comments, blank lines and leading/trailing whitespace. This is a fairly
    safe thing to do and should not break scripts.

    The next stage is rather more dangerous. All the functions are renamed. The
    first function found will be renamed a() the next b() and so on. All global
    vars are similarly renamed to single (if not all used up) letter names that
    follow in sequence from the function names. Finally all local function vars
    are renamed starting with the letter immediately after the last global.

    The net result is that all the functions and variables will now have 1-2
    letter meaningless names. There is plenty of scope for disaster here but I
    do not have enough javascript on hand to detect any. The algorithm works
    OK on my style of javascripting.

    The final stage is to condense the script down. Each function is written
    to a single line. All excess whitespace around operators is stripped out.
    This is a fairly safe stage too. Literal newlines in strings will be stripped
    if you are using them. "\n" is just fine.

    If it breaks a script you can comment out different sections to see which
    process is to blame.

    =head2 BUGS

    Bound to be. This script was knocked up over a couple of hours and has had
    minimal testing. The algorithm works OK on my style of javascripting but
    my javascript looks a lot like Perl. Email me any scripts that break when
    you javastrip them and I'll see if I can patch it.

    =head1 AUTHOR

    tachyon aka Dr James Freeman E<lt>jfreeman@tassie.net.auE<gt>

    =head1 LICENSE

    This package is free software; you can redistribute it and/or modify it
    under the terms of the "GNU General Public License".

    =head1 DISCLAIMER

    This script is distributed in the hope that it will be useful, but WITHOUT
    ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
    FITNESS FOR A PARTICULAR PURPOSE.

    See the "GNU General Public License" for more details.

    =cut

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
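One thing worth watching with the `$name++` trick: Perl's string increment runs a, b, ..., z, aa, ab, ... and will sooner or later produce `do`, `if` or `in`, which are JavaScript keywords. A minimal sketch of a generator that skips such names follows; `name_generator` and the short keyword list are assumptions for illustration, not part of the script above.

```perl
use strict;
use warnings;

# Short JavaScript keywords that the a, b, ..., aa, ab, ... sequence can hit.
# The list is illustrative, not a complete set of reserved words.
my %reserved = map { $_ => 1 } qw(do if in for new var try);

# Return a closure yielding a, b, ..., z, aa, ab, ... that skips reserved
# words and any names passed in as already taken.
sub name_generator {
    my %taken = map { $_ => 1 } @_;
    my $next  = 'a';
    return sub {
        my $name;
        do { $name = $next++ } while $reserved{$name} || $taken{$name};
        return $name;
    };
}
```

Calling `my $gen = name_generator(); my $n = $gen->();` in place of each `$name++` would be one way to wire it in.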
Re: Re: Compressing/Obfuscating a Javascript file by Incognito (Pilgrim) on Oct 10, 2001 at 05:34 UTC
The 'magical' function (a beautiful one-liner I never even thought of) is great for most JavaScript code, but there is an exception I noticed going through my 400K of code... If the JavaScript contains an anonymous function, the regex will fail... Here's a function that contains multiple anonymous (non-named) functions:

    function InitializeTree (objTree, strRootID, strRootLabel, strRootURL, strRootImage) {
        var objNewNode;

        objTree.Target = null;
        objTree.className = "TreeView";
        objNewNode = AddTreeItem (objTree, "", strRootID, strRootLabel, strRootURL, strRootImage, true, "");

        // Set up the event handling.
        with (objTree) {
            onmouseup     = function () { mouseupTreeItem(this); };
            onmousedown   = function () { mousedownTreeItem(this); };
            onmouseover   = function () { mouseoverTreeItem(this); };
            onmouseout    = function () { mouseoutTreeItem(this); };
            onclick       = function () { onclickTreeItem(this); };
            ondblclick    = function () { dblclickTreeItem(this); };
            onresize      = function () { onresizeTree(this); };
            onselectstart = function () { window.event.returnValue=false; };
        }
        return (0);
    }

For all other cases, I think it works fine... Again, I don't know where to start on a regex (or split, or whatever) that will handle this, but I feel we're almost there...
Re: Re: Re: Compressing/Obfuscating a Javascript file by tachyon (Chancellor) on Oct 10, 2001 at 06:21 UTC
I presume the problem is that it is splitting on the anon functions, which you do not want. Either of these two should work better:

    # alternative one - insist on "function name (" syntax or no split
    my @functions = split /(?=\bfunction\s+\w+\s*\()/, $data;

    # alternative two - functions always on line by themselves
    # (needs /m so that ^ matches at the start of every line)
    my @functions = split /^(?=function\s)/m, $data;

Alternative 1 is the better option.

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
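A quick way to see alternative one at work is to wrap it in a sub and feed it a function that contains an anonymous handler. This is just a sketch; `split_named_functions` is a name invented for the example.

```perl
use strict;
use warnings;

# Split on named function declarations only. The zero-width lookahead
# keeps the "function" keyword at the start of each piece, and an
# anonymous "function (" never matches because \w+ needs a name.
sub split_named_functions {
    my ($js) = @_;
    return split /(?=\bfunction\s+\w+\s*\()/, $js;
}
```

Anything before the first named function (typically the globals) comes back as the first element, just as in the full script.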
Re: Re: Re: Re: Compressing/Obfuscating a Javascript file by Incognito (Pilgrim) on Oct 10, 2001 at 06:34 UTC
Re: Re: Re: Re: Re: Compressing/Obfuscating a Javascript file by tachyon (Chancellor) on Oct 10, 2001 at 07:55 UTC
Some notes below your chosen depth have not been shown here
Re: Re: Compressing/Obfuscating a Javascript file by Incognito (Pilgrim) on Oct 10, 2001 at 05:49 UTC
Actually, I think I figured this one out! Change the split regex to:

    # do the magic and split into functions
    my @functions = split /(?=\bfunction\s\w+)/, $data;

What we're doing here is forcing the function keyword to be followed by a name... But what if the function contains a named function? As in:

    function InitializeTree (objBlah) {
        // Set up the event handling.
        with (objBlah) {
            test = function test2() { doSomethingHere(); };
        }
        return (0);
    }

Then I believe that this may again fail... We have to come up with a regex that ignores functions within functions... That's way beyond me... any ideas?
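One possible way around nested named functions, short of a real parser, is to track brace depth and only split where a declaration appears at depth zero. The following is a rough sketch under the assumption that comments are gone and that braces inside strings or regex literals have been masked; `split_top_level` is a hypothetical name.

```perl
use strict;
use warnings;

# Split only at "function name(" found outside any braces, so functions
# nested inside another function stay with their parent.
# Assumes braces inside strings, regexes and comments are already masked.
sub split_top_level {
    my ($js) = @_;
    my $depth = 0;
    my @cuts;

    # Walk over every brace and every named function header in order.
    while ($js =~ /(\bfunction\s+\w+\s*\(|[{}])/g) {
        my $tok = $1;
        if    ($tok eq '{') { $depth++ }
        elsif ($tok eq '}') { $depth-- }
        elsif ($depth == 0) { push @cuts, $-[1] }   # top-level declaration
    }

    # Cut the string at the recorded offsets.
    my @parts;
    my $prev = 0;
    for my $cut (@cuts, length($js)) {
        push @parts, substr($js, $prev, $cut - $prev) if $cut > $prev;
        $prev = $cut;
    }
    return @parts;
}
```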
Re: Re: Compressing/Obfuscating a Javascript file by Incognito (Pilgrim) on Oct 11, 2001 at 00:44 UTC
I've found a potential bug in the chunk() subroutine, when the document being parsed uses JavaScript regular expressions... Here's the test case: `b=b.replace(/"/gi, "&quot;");` As you can see, chunk() sees the first (") double-quote, which is actually part of a regex (and shouldn't be touched or considered a string), and then moves on to the next double-quote, which it considers to be the end of that string... when in fact it's just the beginning of "&quot;"... What I've done for myself is go through the document and substitute the .match() and .replace() calls first... and then work with quotes... but I'm sure there's a regex or function that can handle this scenario...
Re: Re: Re: Compressing/Obfuscating a Javascript file by tachyon (Chancellor) on Oct 11, 2001 at 01:48 UTC
If you want to see how to do it, look at the Perl comment stripping code. You have to perform some level of lexical analysis. Once you get started on this, the end point is a fairly horrid hack or a full blown parser. Here is some pseudo code (time for you to do the work!). It works but you will need to integrate it. If you need to do it for match (I can't remember) then change the string replace in the split re to (?:replace|match). I told you this gets harder!

    my @array = split /(?=\breplace\s*\()/,
        'chunk1 replace(123\) ) chunk3 "Hello replace me" more stuff';
    my @lotsa_chunks;
    for my $bit (@array) {
        if ($bit =~ /^replace/) {
            # do careful stuff, on new chunk
            my $last = '';
            my $re = '';
            for (split //, $bit) {
                unless ( $_ eq ')' and $last ne "\\" ) {
                    $re .= $_;
                    $last = $_;
                    next;
                }
                $re .= $_;                         # add closing bracket
                push @lotsa_chunks, $re;           # push complete RE into a chunk
                $bit =~ s/\Q$re\E//;               # hack RE off
                push @lotsa_chunks, chunk($bit);   # chunk the remainder
                last;                              # stop at the first unescaped ")"
            }
        }
        else {
            push @lotsa_chunks, chunk($bit);
        }
    }
    print "$_\n" for @lotsa_chunks;

    # this sub splits a function into quoted and unquoted chunks
    sub chunk {
        my $func = shift;
        my @chunks;
        my $chunk = 0;
        my $found_quote = '';
        for (split //, $func) {
            # look for RE
            # look for opening quote
            if (/'|"/ and ! $found_quote) {
                $found_quote = $_;
                $chunk++;
                $chunks[$chunk] = $_;
                next;
            }
            # look for corresponding closing quote
            if ( $found_quote and /$found_quote/ ) {
                $found_quote = '';
                $chunks[$chunk] .= $_;
                $chunk++;
                next;
            }
            # no quotes so just add to current chunk
            $chunks[$chunk] .= $_;
        }
        # strip whitespace from unquoted chunks;
        for (@chunks) {
            next if m/^(?:"|')/;   # leave quoted strings alone
            s/^[ \t]+|[ \t]+$//g;
        }
        return @chunks;
    }

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Update: What the hell, here it is integrated. I have changed the name of chunk() to strings(). The new RE parsing sub becomes chunk().

    #!/usr/bin/perl -w
    use strict;

    my ( $data, %funcs, %globals );
    open JS, $ARGV[0] or die "Usage $0 <file>\nCan't open '$ARGV[0]': $!\n";
    local $/;
    $data = <JS>;
    close JS;
    my $was = length $data;

    $data =~ s|//.*\n|\n|g;      # strip single line comments
    $data =~ s|/\*.*?\*/||gs;    # strip multi-line comments
    $data =~ s|^\s*\n||gm;       # strip blank lines
    $data =~ s|^\s+||gm;         # strip leading whitespace
    $data =~ s|\s+$||gm;         # strip trailing whitespace

    # do the magic and split into functions
    my @functions = split /(?=\bfunction\s+\w+\s*\()/, $data;

    # map function names to new names starting with 'a'
    my $name = 'a';
    for (@functions) {
        $funcs{$1} = $name++ if m/^function\s+(\w+)\s*\(/;
    }

    # modify all the function names
    my $funcs = join '|', keys %funcs;
    my $func_sub = qr/\b($funcs)\b/;
    s|$func_sub|$funcs{$1}|g for @functions;

    # modify all the global vars
    # as we have split on function keyword these should
    # be in the first element of our function array
    unless ($functions[0] =~ /^function/) {
        my @globals = $functions[0] =~ m/var\s+(\w+)/g;
        if (@globals) {
            $globals{$_} = $name++ for @globals;
            my $globals = join '|', keys %globals;
            my $global_sub = qr/\b($globals)\b/;
            for my $func (@functions) {
                my @chunks = chunk($func);
                for (@chunks) {
                    next if m/^(?:"|')/;   # leave quoted strings alone
                    s|$global_sub|$globals{$1}|g;
                }
                $func = join '', @chunks;
            }
        }
    }

    # modify all the scoped vars continuing var names on from func/global names
    my $end_globals = $name;
    for my $func (@functions) {
        next unless $func =~ m/^function/;
        my ( @locals, %locals );
        $name = $end_globals;   # each function can use the same local names
        @locals = $func =~ m/var\s+(\w+)/g;
        my ($local) = $func =~ m/function\s+\w+\s*\(([^)]+)\)/;
        if ($local) {
            $local =~ s/\n|\s//g;
            push @locals, split ',', $local;
        }
        for my $var (@locals) {
            next unless $var;
            $locals{$var} = $name++;
        }
        next unless keys %locals;
        my $locals = join '|', keys %locals;
        my $local_sub = qr/\b($locals)\b/;
        my @chunks = chunk($func);
        for (@chunks) {
            next if m/^(?:"|')/;   # leave quoted strings alone
            s|$local_sub|$locals{$1}|g;
        }
        $func = join '', @chunks;
    }

    # do some initial condensation around curlies
    # (braces escaped so newer perls accept the patterns)
    for (@functions) {
        s/\n\{/{/gm;
        s/\n\}/}/gm;
    }

    # now every exposed line ending should end in a ; { or } if we are to safely
    # condense this down by removing newlines - we add the ; if it is missing
    for my $func (@functions) {
        my @lines = split "\n", $func;
        for (@lines) {
            $_ .= ";" unless m/[{};]$/;
        }
        $func = join '', @lines;
        $func .= "\n";
    }

    # remove whitespace around all operators
    my @operators = qw# + - * / = == != < > <= >= ( ) [ ] { } ? ; : #;
    push @operators, ',';   # need to do it this way to avoid warnings
    $_ = quotemeta $_ for @operators;
    my $operator_sub = join '|', @operators;
    $operator_sub = qr/($operator_sub)/;
    for my $func (@functions) {
        my @chunks = chunk($func);
        for (@chunks) {
            next if m/^(?:"|')/;   # leave quoted strings alone
            s#[ \t]+$operator_sub#$1#g;
            s#$operator_sub[ \t]+#$1#g;
        }
        $func = join '', @chunks;
    }

    # ta da time to print out the results
    # first display a list of modified function names.
    # Note: any function called in the html will have to be modified accordingly!
    print "New function names are:\nwas called\t=>\tis now called\n";
    print "$_()\t=>\t$funcs{$_}()\n" for keys %funcs;
    print "\n";

    # print out the modified code
    print @functions;

    # a few stats just for the hell of it
    print "\n\nLength change:\n";
    my $is = length join '', @functions;
    printf "Originally %d bytes now %d bytes or %2d%% of original size\n",
        $was, $is, ($is/$was)*100;
    exit;

    # chop a function up into RE and non RE bits so we can chunkify
    # it into strings and non string sections
    sub chunk {
        my $func = shift;
        my @lotsa_chunks;
        my @array = split /(?=\breplace\s*\()/, $func;
        for my $bit (@array) {
            if ($bit =~ /^replace/) {
                # do careful quote parse, on RE chunk
                my $last = '';
                my $re = '';
                for (split //, $bit) {
                    unless ( $_ eq ')' and $last ne "\\" ) {
                        $re .= $_;
                        $last = $_;
                        next;
                    }
                    $re .= $_;                           # add closing bracket
                    push @lotsa_chunks, $re;             # push complete RE into a chunk
                    $bit =~ s/\Q$re\E//;                 # hack RE off
                    push @lotsa_chunks, strings($bit);   # chunk the remainder
                    last;                                # stop at the first unescaped ")"
                }
            }
            else {
                push @lotsa_chunks, strings($bit);
            }
        }
        return @lotsa_chunks;
    }

    # this sub splits a function into quoted and unquoted chunks
    sub strings {
        my $func = shift;
        my @chunks;
        my $chunk = 0;
        my $found_quote = '';
        for (split //, $func) {
            # look for RE
            # look for opening quote
            if (/'|"/ and ! $found_quote) {
                $found_quote = $_;
                $chunk++;
                $chunks[$chunk] = $_;
                next;
            }
            # look for corresponding closing quote
            if ( $found_quote and /$found_quote/ ) {
                $found_quote = '';
                $chunks[$chunk] .= $_;
                $chunk++;
                next;
            }
            # no quotes so just add to current chunk
            $chunks[$chunk] .= $_;
        }
        # strip whitespace from unquoted chunks;
        for (@chunks) {
            next if m/^(?:"|')/;   # leave quoted strings alone
            s/^[ \t]+|[ \t]+$//g;
        }
        return @chunks;
    }

    =head1 NAME

    javastrip.pl - a Perl script to obfuscate and condense javascript code

    =head1 SYNOPSIS

    javastrip.pl <file>

    where file is the raw javascript only, not HTML

    output is to STDOUT so send it wherever you want with a > redirect:

    javastrip.pl infile.js > outfile.js

    =head1 DESCRIPTION

    This script is primarily designed to munge .js files. It will process any
    pure javascript but was not designed to process javascript embedded in HTML.

    It processes a javascript in several stages. The first stage is to remove
    all comments, blank lines and leading/trailing whitespace. This is a fairly
    safe thing to do and should not break scripts.

    The next stage is rather more dangerous. All the functions are renamed. The
    first function found will be renamed a() the next b() and so on. All global
    vars are similarly renamed to single (if not all used up) letter names that
    follow in sequence from the function names. Finally all local function vars
    are renamed starting with the letter immediately after the last global.

    The net result is that all the functions and variables will now have 1-2
    letter meaningless names. There is plenty of scope for disaster here but I
    do not have enough javascript on hand to detect any. The algorithm works
    OK on my style of javascripting.

    The final stage is to condense the script down. Each function is written
    to a single line. All excess whitespace around operators is stripped out.
    This is a fairly safe stage too. Literal newlines in strings will be stripped
    if you are using them. "\n" is just fine.

    If it breaks a script you can comment out different sections to see which
    process is to blame.

    =head2 BUGS

    Bound to be. This script was knocked up over a couple of hours and has had
    minimal testing. The algorithm works OK on my style of javascripting but
    my javascript looks a lot like Perl. Email me any scripts that break when
    you javastrip them and I'll see if I can patch it.

    =head1 AUTHOR

    tachyon aka Dr James Freeman E<lt>jfreeman@tassie.net.auE<gt>

    =head1 LICENSE

    This package is free software; you can redistribute it and/or modify it
    under the terms of the "GNU General Public License".

    =head1 DISCLAIMER

    This script is distributed in the hope that it will be useful, but WITHOUT
    ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
    FITNESS FOR A PARTICULAR PURPOSE.

    See the "GNU General Public License" for more details.

    =cut
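For comparison, the "some level of lexical analysis" route could be sketched as a small scanner that walks the source once and sets aside string literals and regex literals as protected chunks. This is not the code from the post: `js_chunks` is a made-up name, and the rule for spotting a regex literal (a "/" right after an opening bracket, operator or separator) is a heuristic that will miss cases such as `return /re/`.

```perl
use strict;
use warnings;

# Split JavaScript into protected chunks (string and regex literals) and
# plain code chunks. Heuristic: "/" starts a regex literal only when the
# previous non-space character cannot end an expression.
sub js_chunks {
    my ($js) = @_;
    my @chunks;
    my $code = '';
    my $prev = '';    # last non-space character seen so far
    pos($js) = 0;
    while (pos($js) < length($js)) {
        my $lit;
        if ($js =~ /\G("(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')/gcs) {
            $lit = $1;                                   # string literal
        }
        elsif ($prev =~ /^[(,=:\[!&|?{};]?$/
               && $js =~ m{\G(/(?:\\.|[^/\\\n])+/[gim]*)}gc) {
            $lit = $1;                                   # regex literal
        }
        if (defined $lit) {
            push @chunks, $code if length $code;
            push @chunks, $lit;
            $code = '';
            $prev = substr($lit, -1);
            next;
        }
        $js =~ /\G(.)/gcs;                               # ordinary character
        my $ch = $1;
        $code .= $ch;
        $prev = $ch if $ch =~ /\S/;
    }
    push @chunks, $code if length $code;
    return @chunks;
}
```

Chunks that start with a quote or a slash can then be skipped by the renaming and whitespace passes, in the same spirit as the `next if m/^(?:"|')/` lines in the script.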
Re: Re: Compressing/Obfuscating a Javascript file by Incognito (Pilgrim) on Oct 11, 2001 at 03:32 UTC
Re: Re: Re: Compressing/Obfuscating a Javascript file by tachyon (Chancellor) on Oct 11, 2001 at 05:40 UTC
Re: Compressing/Obfuscating a Javascript file by tachyon (Chancellor) on Oct 06, 2001 at 07:54 UTC
To do this properly is difficult. As a start you need to do something like this:

    local $/;                    # undef $/ so we glob data in
    my $data = <DATA>;           # get all our data as a string
    $data =~ s|//.*\n|\n|g;      # strip single line comments
    $data =~ s|/\*.*?\*/||gs;    # strip multi-line comments
    $data =~ s|^\s*\n||gm;       # strip blank lines
    $data =~ s|^\s+||gm;         # strip leading whitespace
    $data =~ s|\s+$||gm;         # strip trailing whitespace

    # do the magic and split into functions
    my @functions = split /(?=\bfunction\s)/, $data;

    # pudding proof
    print "$_ ---\n" for @functions;

    __DATA__
    var answer = 42; // just because
    function 1 {
        blah // comment;
    }
    /* this function is no good
    function 2 {
        blah
    }
    */
    function 3 {
        if (functionvar){   // var name not function keyword
            // do the blah stuff
        }
    }
    /* this is another comment */
    function 4 {
        just another perl hacker
    }

You need to remove all the comments first to ensure that the only occurrences of the word function represent keywords. You can then just use split with a positive lookahead assertion in the regex to generate your array. The next step you want to do is much harder, but that is another story.

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
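One caveat with stripping comments this way is that a `//` inside a string (a URL, say) looks like a comment too. A common workaround, sketched below on the assumption that the code has no regex literals containing `//` or `/*`, is to match quoted strings in the same pattern and put them back untouched. `strip_comments` is a hypothetical name.

```perl
use strict;
use warnings;

# Remove // and /* */ comments while leaving quoted strings untouched.
# Strings are matched first, so a "//" inside one is never seen as a comment.
sub strip_comments {
    my ($js) = @_;
    $js =~ s{
        ( "(?:\\.|[^"\\])*" | '(?:\\.|[^'\\])*' )   # a quoted string, kept as is
      | //[^\n]*                                    # single-line comment
      | /\*.*?\*/                                   # multi-line comment
    }{ defined $1 ? $1 : '' }gsex;
    return $js;
}
```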
Re (tilly) 1: Compressing/Obfuscating a Javascript file by tilly (Archbishop) on Oct 06, 2001 at 19:47 UTC
I haven't tried it out myself, but I suspect that something like mod_gzip is the best solution if you are using Apache. If you are not, then you could write a decompression algorithm in JavaScript and then convert your JavaScript into a file that looks like this: `eval(uncompress('#$!@compressed gobbledygook here@#$@#'))`
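To get a feel for what on-the-wire compression might save before setting anything up on the server, the library files can be gzipped in memory with the core IO::Compress::Gzip module. This is only a rough estimate of what a server-side module would achieve; `gzip_string` and `gzip_ratio` are names made up for the sketch.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);

# Gzip a string in memory and return the compressed bytes.
sub gzip_string {
    my ($text) = @_;
    my $packed;
    gzip(\$text => \$packed) or die "gzip failed: $GzipError";
    return $packed;
}

# Fraction of the original size that the compressed form takes.
sub gzip_ratio {
    my ($text) = @_;
    return length(gzip_string($text)) / length($text);
}
```

Running `gzip_ratio` over each library file, before and after comment stripping, would show how much of the saving the obfuscation step still adds once compression is in play.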
Re: Re (tilly) 1: Compressing/Obfuscating a Javascript file by andreychek (Parson) on Oct 07, 2001 at 06:44 UTC
If we used Perl to do the compression, I think that would work quite well. However, I discovered a very odd bug in Netscape 4.x and compression. For anyone not familiar with this, most recent browsers are capable of receiving gzip compressed data, and they can uncompress it on the fly. The mod_gzip tilly mentioned is Apache's way of compressing data before the code crosses the wire. However, mod_gzip only compresses data if the browser passes certain headers to the webserver saying the browser is indeed capable of handling compression.

Netscape 4.x seems to be able to handle compressed HTML just fine. However, I've run into a problem where if you do something like this in your HTML code: `<script src="/lib/mylib.js">` the browser still sends the "compression okay" headers when requesting mylib.js, but it does not uncompress it properly once it has received the document. The actual error I received was a JavaScript error saying "invalid" this or that, but what it referred to was a block of binary looking data, which seems to be the still-compressed data received from the webserver. And mind you, it took me some time to figure out what in the world was going on, as I forgot I had gzip compression turned on ;-)

I discovered this using Apache 1.3.19, mod_gzip, and Netscape 4.77.

-Eric
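If the compression decision were made in a Perl handler rather than by the server module, a guard along these lines could sidestep the problem. It is a sketch only: `may_gzip` is a hypothetical helper, and the Netscape 4.x test is a guess based on the behaviour described above (user agents starting with `Mozilla/4.` that do not say "compatible").

```perl
use strict;
use warnings;

# Decide whether a response may be gzipped.
# Hypothetical helper: the Netscape 4.x rule reflects the bug described
# above (compressed .js mishandled), not a documented server feature.
sub may_gzip {
    my ($accept_encoding, $user_agent, $path) = @_;

    # The browser must advertise gzip support at all.
    return 0 unless defined $accept_encoding && $accept_encoding =~ /\bgzip\b/i;

    # Netscape 4.x identifies as Mozilla/4.x without the "compatible" token.
    my $is_ns4 = defined $user_agent
        && $user_agent =~ m{^Mozilla/4\.}
        && $user_agent !~ /compatible/i;

    # Never compress external scripts for that browser.
    return 0 if $is_ns4 && $path =~ /\.js$/i;
    return 1;
}
```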
Re: Re: Re (tilly) 1: Compressing/Obfuscating a Javascript file by mischief (Hermit) on Oct 07, 2001 at 23:27 UTC
Perhaps you could get around this by using SSIs to include the JavaScript libs in the data sent to the browser rather than `<script src="...">`? That would also speed up loading times (although only marginally, but you did say modem, didn't you? :-) as it would reduce the number of requests the browser has to send, so fewer headers sent and received and fewer delays connecting.
Re: Re: Re: Re (tilly) 1: Compressing/Obfuscating a Javascript file by andreychek (Parson) on Oct 08, 2001 at 01:03 UTC
Re: Compressing/Obfuscating Javascript file by VSarkiss (Monsignor) on Oct 06, 2001 at 05:11 UTC
I really doubt you could find a single regex to do what you want, simply because JS is too complicated a language to parse with a single regex. You could probably do it if you imposed severe restrictions on your coders (no comments on the first line, never put an opening curly brace on a line by itself, that kind of thing), in which case you'd also have to do that to every script you get from somewhere else.

My advice would be to construct a parser. You can find a BNF-like grammar for JS 1.4 at mozilla.org. I couldn't spot a JS 2.0 grammar, though that doesn't mean it's not there. ;-) There are also tools to construct parsers in Perl on CPAN, like Parse::YAPP and Parse::RecDescent. It shouldn't be too hard to glue the two together -- probably easier than trying to maintain strict control over 400K of code.

HTH
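Somewhere between a single regex and a full parser sits a plain brace counter: find the header with a regex, then walk forward counting `{` and `}` until the depth returns to zero. A minimal sketch, assuming comments are stripped and that no string or regex literal contains a brace; `extract_function` is a name invented here.

```perl
use strict;
use warnings;

# Return the full text of the named function, or undef if not found.
# Assumes comments, strings and regex literals containing braces have
# already been removed or masked.
sub extract_function {
    my ($js, $name) = @_;
    return unless $js =~ /\bfunction\s+\Q$name\E\s*\(/g;
    my $start = $-[0];                        # offset of the "function" keyword
    my $open  = index($js, '{', pos($js));    # first brace after the header
    return if $open < 0;

    my $depth = 0;
    for my $i ($open .. length($js) - 1) {
        my $ch = substr($js, $i, 1);
        $depth++ if $ch eq '{';
        $depth-- if $ch eq '}';
        return substr($js, $start, $i - $start + 1) if $depth == 0;
    }
    return;    # unbalanced braces
}
```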
Re: Compressing/Obfuscating a Javascript file by Anonymous Monk on Oct 16, 2004 at 18:50 UTC
i am	[reply]
