in reply to Compressing/Obfuscating a Javascript file
Oops, got totally carried away. This will crunch and obfuscate a JavaScript file. It has
been tested on one 25k script and did not break it, but that is not what you would
call intensive QA. If anyone wants to test it and offer some feedback (i.e. whether it broke
the script or not), that would be useful. This probably has some real utility,
as it condensed a script I use a lot from 25k to 8k, which is a much faster download. I was
amazed that comments, whitespace, and function and var names make up fully 2/3 of the code.
Note that the script lists the changes in function names so you can
modify the HTML as required. You could automate this pretty easily
with HTML::Parser.
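As a rough illustration of that last step, here is a minimal sketch that applies a name map to inline event handlers in an HTML fragment. It uses a plain regex rather than HTML::Parser, and everything in it is an assumption for the example: the rename_handlers() helper, the showMenu/hideMenu names, and the restriction to double-quoted on* attributes with no embedded double quotes.

```perl
use strict;
use warnings;

# Hypothetical mapping, in the same shape as the %funcs hash the script prints.
my %funcs = ( showMenu => 'a', hideMenu => 'b' );

# Rename function calls inside on* event-handler attributes.
# Assumes double-quoted attribute values with no embedded double quotes.
sub rename_handlers {
    my ( $html, $map ) = @_;
    my $names = join '|', map { quotemeta($_) } keys %$map;
    my $call  = qr/\b($names)(?=\s*\()/;    # a known name followed by "("
    $html =~ s{(\bon\w+\s*=\s*")([^"]*)(")}{
        my ( $open, $js, $close ) = ( $1, $2, $3 );    # copy before the inner s///
        $js =~ s/$call/$map->{$1}/g;
        $open . $js . $close;
    }gei;
    return $html;
}

my $html = '<a href="#" onmouseover="showMenu(1)" onmouseout="hideMenu(1)">menu</a>';
print rename_handlers( $html, \%funcs ), "\n";
```

This does not touch href="javascript:..." links or script blocks embedded in the page, so a real parser is still the safer route for anything beyond simple handlers.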
Update 1
Modified original code to deal with quoted strings appropriately and added a hint of documentation
Update 2
Modified to put each function on its own line as intended.
#!/usr/bin/perl -w
use strict;
my ( $data, %funcs, %globals );
open JS, $ARGV[0] or die "Usage $0 <file>\nCan't open '$ARGV[0]': $!\n";
local $/;
$data = <JS>;
close JS;
my $was = length $data;
$data =~ s|//.*\n|\n|g; # strip single line comments
$data =~ s|/\*.*?\*/||gs; # strip multi-line comments
$data =~ s|^\s*\n||gm; # strip blank lines
$data =~ s|^\s+||gm; # strip leading whitespace
$data =~ s|\s+$||gm; # strip trailing whitespace
# do the magic and split into functions
my @functions = split /(?=\bfunction\s)/, $data;
# map function names to new names starting with 'a'
my $name = 'a';
for (@functions) {
$funcs{$1} = $name++ if m/^function\s+(\w+)/;
}
# modify all the function names
my $funcs = join '|', keys %funcs;
my $func_sub = qr/\b($funcs)\b/;
s|$func_sub|$funcs{$1}|g for @functions;
# modify all the global vars
# as we have split on function keyword these should
# be in the first element of our function array
unless ($functions[0] =~ /^function/) {
my @globals = $functions[0] =~ m/var\s+(\w+)/g;
if (@globals) {
$globals{$_} = $name++ for @globals;
my $globals = join '|', keys %globals;
my $global_sub = qr/\b($globals)\b/;
for my $func (@functions) {
my @chunks = chunk($func);
for (@chunks) {
next if m/^(?:"|')/; # leave quoted strings alone
s|$global_sub|$globals{$1}|g;
}
$func = join '', @chunks;
}
}
}
# modify all the scoped vars continuing var names on from func/global names
my $end_globals = $name;
for my $func (@functions) {
next unless $func =~ m/^function/;
my ( @locals, %locals );
$name = $end_globals; # each function can use the same local names
@locals = $func =~ m/var\s+(\w+)/g;
my ($local) = $func =~ m/function\s+\w+\s*\(([^\)]+)/;
if ($local) {
$local =~ s/\n|\s//g;
push @locals, split ',', $local;
}
for my $var (@locals) {
next unless $var;
$locals{$var} = $name++;
}
next unless keys %locals;
my $locals = join '|', keys %locals;
my $local_sub = qr/\b($locals)\b/;
my @chunks = chunk($func);
for (@chunks) {
next if m/^(?:"|')/; # leave quoted strings alone
s|$local_sub|$locals{$1}|g;
}
$func = join '', @chunks;
}
# do some initial condensation around curlies
for (@functions) {
s/\n{/{/gm;
s/\n}/}/gm;
}
# now every exposed line ending should end in a ; { or } if we are to safely
# condense this down by removing newlines - we add the ; if it is missing
for my $func (@functions) {
my @lines = split "\n", $func;
for (@lines) {
$_.= ";" unless m/(?:}|{|;)$/;
}
$func = join '', @lines;
$func .= "\n";
}
# remove whitespace around all operators
my @operators = qw# + - * / = == != < > <= >= ( ) [ ] { } ? ; : #;
push @operators, ','; # need to do it this way to avoid warnings
$_ = quotemeta $_ for @operators;
my $operator_sub = join '|', @operators;
$operator_sub = qr/($operator_sub)/;
for my $func (@functions) {
my @chunks = chunk($func);
for (@chunks) {
next if m/^(?:"|')/; # leave quoted strings alone
s#[ \t]+$operator_sub#$1#g;
s#$operator_sub[ \t]+#$1#g;
}
$func = join '', @chunks;
}
# ta da time to print out the results
# first display a list of modified function names.
# Note: any function called in the html will have to be modified accordingly!
print "New function names are:\nwas called\t=>\tis now called\n";
print "$_()\t=>\t$funcs{$_}()\n" for keys %funcs;
print "\n";
# print out the modified code
print @functions;
# a few stats just for the hell of it
print "\n\nLength change:\n";
my $is = length join '', @functions;
printf "Originally %d bytes now %d bytes or %2d%% of original size\n", $was, $is, ($is/$was)*100;
exit;
# this sub splits a function into quoted and unquoted chunks
sub chunk {
my $func = shift;
my @chunks;
my $chunk = 0;
my $found_quote = '';
for (split //, $func) {
# look for opening quote
if (/'|"/ and ! $found_quote) {
$found_quote = $_;
$chunk++;
$chunks[$chunk] = $_;
next;
}
# look for corresponding closing quote
if ( $found_quote and /$found_quote/ ) {
$found_quote = '';
$chunks[$chunk] .= $_;
$chunk++;
next;
}
# no quotes so just add to current chunk
$chunks[$chunk] .= $_;
}
# strip whitespace from unquoted chunks;
for (@chunks) {
next if m/^(?:"|')/; # leave quoted strings alone
s/^[ \t]+|[ \t]+$//g;
}
return @chunks;
}
=head1 NAME
javastrip.pl - a Perl script to obfuscate and condense javascript code
Version 0.0000001
=head1 SYNOPSIS
javastrip.pl <file>
where file is the raw javascript only, not HTML
output is to STDOUT so send it wherever you want with a > redirect:
javastrip.pl infile.js > outfile.js
make a backup of original file first. keep backup. process is irreversible.
=head1 DESCRIPTION
This script is primarily designed to munge .js files. It will process any
pure javascript but was not designed to process javascript embedded in
HTML.
It processes a javascript in several stages. The first stage is to remove
all comments, blank lines and leading/trailing whitespace. This is a fairly
safe thing to do and should not break scripts.
The next stage is rather more dangerous. All the functions are renamed. The
first function found will be renamed a() the next b() and so on. All global
vars are similarly renamed to single (if not all used up) letter names that
follow in sequence from the function names. Finally all local function vars
are renamed starting with the letter immediately after the last global.
The net result is that all the functions and variables will now have 1-2
letter meaningless names. There is plenty of scope for disaster here but I
do not have enough javascript on hand to detect any. The algorithm works OK
on my style of javascripting.
The final stage is to condense the script down. Each function is written to
a single line. All excess whitespace around operators is stripped out. This
is a fairly safe stage too. Literal newlines in strings will be stripped if
you are using them. "\n" is just fine.
If it breaks a script you can comment out different sections to see which
process is to blame.
=head2 BUGS
Bound to be. This script was knocked up over a couple of hours and has had
minimal testing. The algorithm works OK on my style of javascripting but my
javascript looks a lot like Perl. Email me any scripts that break when you
javastrip them and I'll see if I can patch it.
=head1 AUTHOR
tachyon aka Dr James Freeman E<lt>jfreeman@tassie.net.auE<gt>
=head1 LICENSE
This package is free software; you can redistribute it and/or modify it under
the terms of the "GNU General Public License".
=head1 DISCLAIMER
This script is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the "GNU General Public License" for more details.
=cut
cheers
tachyon
s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
Re: Re: Compressing/Obfuscating a Javascript file
by Incognito (Pilgrim) on Oct 10, 2001 at 05:34 UTC
The 'magical' function (a beautiful one-liner I never even thought of) is great for most JavaScript code, but there is an exception I noticed going through my 400K of code...
If the JavaScript contains an anonymous function, the regex will fail... Here's a function that contains multiple anonymous (unnamed) functions:
function InitializeTree (objTree, strRootID, strRootLabel, strRootURL, strRootImage) {
var objNewNode;
objTree.Target = null;
objTree.className = "TreeView";
objNewNode = AddTreeItem (objTree, "", strRootID, strRootLabel, strRootURL, strRootImage, true, "");
// Set up the event handling.
with (objTree) {
onmouseup = function () { mouseupTreeItem(this); };
onmousedown = function () { mousedownTreeItem(this); };
onmouseover = function () { mouseoverTreeItem(this); };
onmouseout = function () { mouseoutTreeItem(this);};
onclick = function () { onclickTreeItem(this); };
ondblclick = function () { dblclickTreeItem(this); };
onresize = function () { onresizeTree(this); };
onselectstart = function () { window.event.returnValue=false; };
}
return (0);
}
For all other cases, I think it works fine... Again, I don't know where to start on a regex (or split, or whatever) that will handle this, but I feel we're almost there...
# alternative one - insist on "function name (" syntax or no split
my @functions = split /(?=\bfunction\s+\w+\s*\()/, $data;
# alternative two - functions always on line by themselves
my @functions = split /^(?=function\s)/, $data;
Alternative 1 is the better option. cheers
tachyon
s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
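To see what alternative one buys you, here is a small sketch that runs the original split pattern and the stricter one over the same input. The sample JavaScript is made up for the comparison; only the two patterns come from the posts above.

```perl
use strict;
use warnings;

# Assumed sample: a global, a named function holding one anonymous
# function, and a second named function.
my $js = <<'JS';
var count = 0;
function init (obj) {
obj.onclick = function () { count++; };
}
function done () { return count; }
JS

my @loose  = split /(?=\bfunction\s)/, $js;             # original pattern
my @strict = split /(?=\bfunction\s+\w+\s*\()/, $js;    # alternative one

printf "loose:  %d pieces\n", scalar @loose;     # splits inside init()
printf "strict: %d pieces\n", scalar @strict;    # globals, init(), done()
```

The loose pattern cuts init() in two at the anonymous function, so the script would then treat the tail as if it were a separate function. The strict pattern leaves it whole because an anonymous function has no name between the keyword and the opening bracket.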
Yes, Alternative 1 is the better option, but the only problem I can see now is if we have named functions in the function itself... And (to be a devil's advocate) what if those functions have functions?
function InitializeTree (objBlah) {
// Set up the event handling.
with (objBlah) {
test = function test2() { doSomethingHere(); };
}
return (0);
}
Re: Re: Compressing/Obfuscating a Javascript file
by Incognito (Pilgrim) on Oct 10, 2001 at 05:49 UTC
Actually, I think I figured this one out!
Change the split regex to:
# do the magic and split into functions
my @functions = split /(?=\bfunction\s\w+)/, $data;
What we're doing here is forcing the function to be followed by a name... But what if the function contains a named function? As in:
function InitializeTree (objBlah) {
// Set up the event handling.
with (objBlah) {
test = function test2() { doSomethingHere(); };
}
return (0);
}
Then I believe that this may again fail...
We have to come up with a regex that ignores functions within functions... That's way beyond me... any ideas?
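One way to ignore functions within functions is to stop relying on a single regex and track brace depth while scanning, starting a new piece only when a named function header appears at depth 0. The following is a minimal sketch of that idea, not part of the original script; split_top_level() is a hypothetical helper, and it assumes braces never occur inside strings, comments or regex literals.

```perl
use strict;
use warnings;

# Split source into pieces, starting a new piece only at a
# "function name (" header that appears at brace depth 0.
# Assumption: no braces inside strings, comments or regex literals.
sub split_top_level {
    my ($src) = @_;
    my @pieces = ('');
    my $depth  = 0;
    # each token is either a whole function header or a single character
    while ( $src =~ /\G(\bfunction\s+\w+\s*\(|.)/gcs ) {
        my $tok = $1;
        if ( length($tok) > 1 ) {    # only a header is longer than one char
            push @pieces, '' if $depth == 0 and length( $pieces[-1] );
        }
        elsif ( $tok eq '{' ) { $depth++ }
        elsif ( $tok eq '}' ) { $depth-- if $depth > 0 }
        $pieces[-1] .= $tok;
    }
    return @pieces;
}

my $js = <<'JS';
var n = 0;
function InitializeTree (objBlah) {
with (objBlah) {
test = function test2() { doSomethingHere(); };
}
return (0);
}
function other () { return n; }
JS

my @regex_only = split /(?=\bfunction\s+\w+\s*\()/, $js;
my @by_depth   = split_top_level($js);
printf "regex only: %d pieces, depth aware: %d pieces\n",
    scalar @regex_only, scalar @by_depth;
```

The regex-only split cuts at test2, while the depth-aware scan keeps InitializeTree in one piece. Functions nested to any depth are handled the same way, since only depth 0 triggers a split.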
Re: Re: Compressing/Obfuscating a Javascript file
by Incognito (Pilgrim) on Oct 11, 2001 at 00:44 UTC
I've found a potential bug in the chunk() subroutine, when the document being parsed uses JavaScript regular expressions... Here's the test case:
b=b.replace(/"/gi, "&quot;");
As you can see, chunk() sees the first double-quote ("), which is actually part of a regex (and shouldn't be touched or considered a string delimiter), and then moves on to the next double-quote, which it considers to be the end of that string... when in fact it's just the beginning of "&quot;"...
What I've done for myself is go through the document and substitute the .match() and .replace() calls first, and then work with the quotes... but I'm sure there's a regex or function that can handle this scenario...
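The failure is easy to reproduce in isolation. The sketch below is a compact restatement of the quote scanner (naive_chunks() is a hypothetical stand-in for chunk(), not the original sub), run on the test case with the replacement string assumed to be the HTML entity &quot;.

```perl
use strict;
use warnings;

# Toggle "in string" on each matching quote character, with no
# knowledge of regex literals (the same idea as the posted chunk()).
sub naive_chunks {
    my ($src) = @_;
    my @chunks = ('');
    my $quote  = '';
    for my $ch ( split //, $src ) {
        if ( !$quote and $ch =~ /['"]/ ) {      # opening quote
            $quote = $ch;
            push @chunks, $ch;
        }
        elsif ( $quote and $ch eq $quote ) {    # closing quote
            $quote = '';
            $chunks[-1] .= $ch;
            push @chunks, '';
        }
        else {                                  # ordinary character
            $chunks[-1] .= $ch;
        }
    }
    return grep { length($_) } @chunks;
}

my $line = 'b=b.replace(/"/gi, "&quot;");';
print "[$_]\n" for naive_chunks($line);
```

The second chunk comes out as "/gi, " (treated as a string) and &quot; lands in an unquoted chunk, so the renaming and whitespace passes would be applied to exactly the wrong parts.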
If you want to see how to do it look at the Perl comment
stripping code. You have to perform some level of lexical analysis.
Once you get started on this the end point is a fairly horrid
hack or a full blown parser.
Here is some pseudo code (time for you to do the work!). It works, but you
will need to integrate it. If you need to do it for match as well (I can't remember), then
change the literal replace in the split regex to (?:replace|match).
I told you this gets harder!
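For the match variant mentioned above, the only change is in the split pattern. A quick sketch on a made-up line (the sample input is an assumption, not taken from any real script):

```perl
use strict;
use warnings;

# Assumed sample: one match() call, one replace() call, and the word
# "replace" inside an ordinary string.
my $js = 'x = s.match(/"a"/); y = s.replace(/"/g, "q"); z = "replace me";';

# Split in front of either method name when it is followed by "(".
my @bits = split /(?=\b(?:replace|match)\s*\()/, $js;
print "[$_]\n" for @bits;
```

The word replace inside the string is not followed by an opening bracket, so it does not cause a split. The careful branch in the loop would then need to test for /^(?:replace|match)/ rather than /^replace/.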
my @array = split /(?=\breplace\s*\()/, 'chunk1 replace(123\) ) chunk3 "Hello replace me" more stuff';
my @lotsa_chunks;
for my $bit (@array) {
if ($bit =~ /^replace/) {
# do careful stuff, on new chunk
my $last = '';
my $re = '';
for (split //, $bit) {
unless ( $_ eq ')' and $last ne "\\" ) {
$re .= $_;
$last = $_;
next;
}
$re .= $_; # add closing bracket
push @lotsa_chunks, $re; # push complete RE into a chunk
$bit =~ s/\Q$re\E//; # hack RE off
push @lotsa_chunks, chunk($bit); # chunk the remainder
last; # stop at the first unescaped closing bracket
}
} else {
push @lotsa_chunks, chunk($bit);
}
}
print "$_\n" for @lotsa_chunks;
# this sub splits a function into quoted and unquoted chunks
sub chunk {
my $func = shift;
my @chunks;
my $chunk = 0;
my $found_quote = '';
for (split //, $func) {
# look for RE
# look for opening quote
if (/'|"/ and ! $found_quote) {
$found_quote = $_;
$chunk++;
$chunks[$chunk] = $_;
next;
}
# look for corresponding closing quote
if ( $found_quote and /$found_quote/ ) {
$found_quote = '';
$chunks[$chunk] .= $_;
$chunk++;
next;
}
# no quotes so just add to current chunk
$chunks[$chunk] .= $_;
}
# strip whitespace from unquoted chunks;
for (@chunks) {
next if m/^(?:"|')/; # leave quoted strings alone
s/^[ \t]+|[ \t]+$//g;
}
return @chunks;
}
tachyon
s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
Update
What the hell, here it is integrated. I have changed the name of
chunk() to strings(). The new RE parsing sub becomes chunk().
#!/usr/bin/perl -w
use strict;
my ( $data, %funcs, %globals );
open JS, $ARGV[0] or die "Usage $0 <file>\nCan't open '$ARGV[0]': $!\n";
local $/;
$data = <JS>;
close JS;
my $was = length $data;
$data =~ s|//.*\n|\n|g; # strip single line comments
$data =~ s|/\*.*?\*/||gs; # strip multi-line comments
$data =~ s|^\s*\n||gm; # strip blank lines
$data =~ s|^\s+||gm; # strip leading whitespace
$data =~ s|\s+$||gm; # strip trailing whitespace
# do the magic and split into functions
my @functions = split /(?=\bfunction\s+\w+\s*\()/, $data;
# map function names to new names starting with 'a'
my $name = 'a';
for (@functions) {
$funcs{$1} = $name++ if m/^function\s+(\w+)\s*\(/;
}
# modify all the function names
my $funcs = join '|', keys %funcs;
my $func_sub = qr/\b($funcs)\b/;
s|$func_sub|$funcs{$1}|g for @functions;
# modify all the global vars
# as we have split on function keyword these should
# be in the first element of our function array
unless ($functions[0] =~ /^function/) {
my @globals = $functions[0] =~ m/var\s+(\w+)/g;
if (@globals) {
$globals{$_} = $name++ for @globals;
my $globals = join '|', keys %globals;
my $global_sub = qr/\b($globals)\b/;
for my $func (@functions) {
my @chunks = chunk($func);
for (@chunks) {
next if m/^(?:"|')/; # leave quoted strings alone
s|$global_sub|$globals{$1}|g;
}
$func = join '', @chunks;
}
}
}
# modify all the scoped vars continuing var names on from func/global names
my $end_globals = $name;
for my $func (@functions) {
next unless $func =~ m/^function/;
my ( @locals, %locals );
$name = $end_globals; # each function can use the same local names
@locals = $func =~ m/var\s+(\w+)/g;
my ($local) = $func =~ m/function\s+\w+\s*\(([^\)]+)/;
if ($local) {
$local =~ s/\n|\s//g;
push @locals, split ',', $local;
}
for my $var (@locals) {
next unless $var;
$locals{$var} = $name++;
}
next unless keys %locals;
my $locals = join '|', keys %locals;
my $local_sub = qr/\b($locals)\b/;
my @chunks = chunk($func);
for (@chunks) {
next if m/^(?:"|')/; # leave quoted strings alone
s|$local_sub|$locals{$1}|g;
}
$func = join '', @chunks;
}
# do some initial condensation around curlies
for (@functions) {
s/\n{/{/gm;
s/\n}/}/gm;
}
# now every exposed line ending should end in a ; { or } if we are to safely
# condense this down by removing newlines - we add the ; if it is missing
for my $func (@functions) {
my @lines = split "\n", $func;
for (@lines) {
$_.= ";" unless m/(?:}|{|;)$/;
}
$func = join '', @lines;
$func .= "\n";
}
# remove whitespace around all operators
my @operators = qw# + - * / = == != < > <= >= ( ) [ ] { } ? ; : #;
push @operators, ','; # need to do it this way to avoid warnings
$_ = quotemeta $_ for @operators;
my $operator_sub = join '|', @operators;
$operator_sub = qr/($operator_sub)/;
for my $func (@functions) {
my @chunks = chunk($func);
for (@chunks) {
next if m/^(?:"|')/; # leave quoted strings alone
s#[ \t]+$operator_sub#$1#g;
s#$operator_sub[ \t]+#$1#g;
}
$func = join '', @chunks;
}
# ta da time to print out the results
# first display a list of modified function names.
# Note: any function called in the html will have to be modified accordingly!
print "New function names are:\nwas called\t=>\tis now called\n";
print "$_()\t=>\t$funcs{$_}()\n" for keys %funcs;
print "\n";
# print out the modified code
print @functions;
# a few stats just for the hell of it
print "\n\nLength change:\n";
my $is = length join '', @functions;
printf "Originally %d bytes now %d bytes or %2d%% of original size\n", $was, $is, ($is/$was)*100;
exit;
# chop a function up into RE and non RE bits so we can chunkify
# it into strings and non string sections
sub chunk {
my $func = shift;
my @lotsa_chunks;
my @array = split /(?=\breplace\s*\()/, $func;
for my $bit (@array) {
if ($bit =~ /^replace/) {
# do careful quote parse, on RE chunk
my $last = '';
my $re = '';
for (split //, $bit) {
unless ( $_ eq ')' and $last ne "\\" ) {
$re .= $_;
$last = $_;
next;
}
$re .= $_; # add closing bracket
push @lotsa_chunks, $re; # push complete RE into a chunk
$bit =~ s/\Q$re\E//; # hack RE off
push @lotsa_chunks, strings($bit); # chunk the remainder
last; # stop at the first unescaped closing bracket
}
} else {
push @lotsa_chunks, strings($bit);
}
}
return @lotsa_chunks;
}
# this sub splits a function into quoted and unquoted chunks
sub strings {
my $func = shift;
my @chunks;
my $chunk = 0;
my $found_quote = '';
for (split //, $func) {
# look for RE
# look for opening quote
if (/'|"/ and ! $found_quote) {
$found_quote = $_;
$chunk++;
$chunks[$chunk] = $_;
next;
}
# look for corresponding closing quote
if ( $found_quote and /$found_quote/ ) {
$found_quote = '';
$chunks[$chunk] .= $_;
$chunk++;
next;
}
# no quotes so just add to current chunk
$chunks[$chunk] .= $_;
}
# strip whitespace from unquoted chunks;
for (@chunks) {
next if m/^(?:"|')/; # leave quoted strings alone
s/^[ \t]+|[ \t]+$//g;
}
return @chunks;
}
=head1 NAME
javastrip.pl - a Perl script to obfuscate and condense javascript code
=head1 SYNOPSIS
javastrip.pl <file>
where file is the raw javascript only, not HTML
output is to STDOUT so send it wherever you want with a > redirect:
javastrip.pl infile.js > outfile.js
=head1 DESCRIPTION
This script is primarily designed to munge .js files. It will process any
pure javascript but was not designed to process javascript embedded in
HTML.
It processes a javascript in several stages. The first stage is to remove
all comments, blank lines and leading/trailing whitespace. This is a fairly
safe thing to do and should not break scripts.
The next stage is rather more dangerous. All the functions are renamed. The
first function found will be renamed a() the next b() and so on. All global
vars are similarly renamed to single (if not all used up) letter names that
follow in sequence from the function names. Finally all local function vars
are renamed starting with the letter immediately after the last global.
The net result is that all the functions and variables will now have 1-2
letter meaningless names. There is plenty of scope for disaster here but I
do not have enough javascript on hand to detect any. The algorithm works OK
on my style of javascripting.
The final stage is to condense the script down. Each function is written to
a single line. All excess whitespace around operators is stripped out. This
is a fairly safe stage too. Literal newlines in strings will be stripped if
you are using them. "\n" is just fine.
If it breaks a script you can comment out different sections to see which
process is to blame.
=head2 BUGS
Bound to be. This script was knocked up over a couple of hours and has had
minimal testing. The algorithm works OK on my style of javascripting but my
javascript looks a lot like Perl. Email me any scripts that break when you
javastrip them and I'll see if I can patch it.
=head1 AUTHOR
tachyon aka Dr James Freeman E<lt>jfreeman@tassie.net.auE<gt>
=head1 LICENSE
This package is free software; you can redistribute it and/or modify it under
the terms of the "GNU General Public License".
=head1 DISCLAIMER
This script is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the "GNU General Public License" for more details.
=cut
Holy crap! You are good... :)
I'm going to study this stuff to see if I can pick up some better skills and help others as well... If it's worth anything, I've found another scenario for the chunk() subroutine that causes it to break... If strings have escaped characters, then things go all whacked out:
function test () {
var a = "James (aka Tachyon) is a \"Perl Saint\" !!!";
var b = "aaaa'bbbb" + 'cccc"dddd' + "eeee\"ffff" + "gggg\'hhhh";
}
I tried modifying chunk() so that it remembers the previous character (and whether it was an escape character), so I could treat the current character as just a plain character (and not the end quote)...
sub chunk {
my ($strOutput) = @_;
my (@chunks);
my ($chunk) = 0;
my ($found_quote) = '';
my ($preceded_by_escape) = 0;
for (split //, $strOutput) {
# look for opening quote
if ( /'|"/ and ! $found_quote and ! $preceded_by_escape) {
$found_quote = $_;
$chunk++;
$chunks[$chunk] = $_;
$preceded_by_escape = (/\\/) ? 1 : 0;
next;
}
# look for corresponding closing quote
if ( $found_quote and /$found_quote/ and ! $preceded_by_escape) {
$found_quote = '';
$chunks[$chunk] .= $_;
$chunk++;
$preceded_by_escape = (/\\/) ? 1 : 0;
next;
}
# no quotes so just add to current chunk
$chunks[$chunk] .= $_;
$preceded_by_escape = (/\\/) ? 1 : 0;
}
# strip whitespace from unquoted chunks;
for (@chunks) {
next if m/^(?:"|')/; # leave quoted strings alone
s/^[ \t]+|[ \t]+$//g;
}
return @chunks;
}
This seems to work okay... I thought it didn't, but I believe it was bad data... let me know if this helps... (I'd like to do something productive for you today)! :)
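An alternative to tracking the previous character by hand is to let a regex describe a whole quoted string, escapes included, and split on it. This is a sketch of that approach rather than a drop-in replacement; string_chunks() is a hypothetical name, and like the posted version it knows nothing about regex literals.

```perl
use strict;
use warnings;

# A quoted string: opening quote, then any run of ordinary characters
# or backslash-escaped characters, then the matching closing quote.
my $string = qr/"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*'/s;

# Split into unquoted and quoted chunks; the capturing group makes
# split() return the quoted strings as well as the text between them.
sub string_chunks {
    my ($src) = @_;
    return grep { length($_) } split /($string)/, $src;
}

my $js = q{var b = "aaaa'bbbb" + 'cccc"dddd' + "eeee\"ffff" + "gggg\'hhhh";};
print "[$_]\n" for string_chunks($js);
```

Because a backslash always consumes the character after it, an escaped backslash just before a closing quote (as in "c:\\") is handled too, which a single previous-character flag can get wrong.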