Incognito has asked for the wisdom of the Perl Monks concerning the following question:


I've written a Perl script to go through a directory of JavaScript library files (plain text files containing pure JavaScript code, mainly functions with some global variables at the top) and remove comments and unnecessary whitespace. It uses some pretty complex (to me) regular expressions, which first remove comments, then hide all strings, compress the whitespace, put the strings back, etc...
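For illustration, here is a rough sketch of the comment-removal step. It is not the script described above: it swaps in a single-pass substitution that matches string literals and comments in one alternation, so comment-like text inside a string is left alone. The name strip_js_comments is made up for this example, and the sketch assumes the input has no regex literals containing quotes or comment-like sequences.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): removes block and line
# comments from JavaScript source while leaving string literals untouched.
# Assumption: no regex literals containing quotes or comment-like sequences.
sub strip_js_comments {
    my ($js) = @_;
    $js =~ s{
        ( "(?:[^"\\]|\\.)*"      # double-quoted string, captured and kept
        | '(?:[^'\\]|\\.)*' )    # single-quoted string, captured and kept
        | /\*.*?\*/              # block comment, dropped
        | //[^\n]*               # line comment, dropped up to the newline
    }{ defined($1) ? $1 : '' }gsex;
    return $js;
}
```

Because a string literal always matches before a comment can start inside it, something like "http://example" survives intact.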

The next (and final) step to this process is the obfuscation... I want to go through each function in the document and, for each parameter in the function header, replace the parameter with a single letter (or whatever) to make the function smaller (and I suppose unreadable - but that's secondary)...
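To make the idea concrete, here is a minimal sketch of the renaming step for one function whose complete text is already in hand. The name shorten_js_params is hypothetical. It assumes at most 26 parameters, that none of the single letters collides with an identifier already used in the body, and that string literals are still hidden so their contents cannot be rewritten by accident. An object literal key spelled the same as a parameter would also be renamed, which this sketch does not handle.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the text of one complete function and returns it
# with each parameter renamed to a, b, c, ...
# Assumptions: at most 26 parameters, no collision between the new letters and
# existing identifiers, and string literals already hidden.
sub shorten_js_params {
    my ($func) = @_;

    # Capture the parameter list from the function header.
    my ($list) = $func =~ m/\bfunction\b[^(]*\(([^)]*)\)/
        or return $func;
    my @params = grep { length } split /[\s,]+/, $list;
    return $func if !@params || @params > 26;

    # Map each parameter to a single letter.
    my %short;
    my $letter = 'a';
    $short{$_} = $letter++ for @params;

    # Replace whole identifiers only, skipping property accesses like obj.name.
    my $names = join '|',
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) } @params;
    $func =~ s/(?<![\w\$.])($names)(?![\w\$])/$short{$1}/g;
    return $func;
}
```

The substitution runs over the header as well as the body, so the parameter list and every use of each parameter change together.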

I want to do this because there are over 400K of JS library files, which, when transmitted over a 56K modem (what's a modem, you say?), make for a very large download on the initial hit to the web page. By obfuscating each function's parameter list, a significant reduction in file size can be achieved...

I'm really looking for a regular expression that can go through the file (stored as one big string) and return each function in an array... or one where I just pass the name of the function that I want and the regex returns the entire function to me as a string....

A typical Javascript function looks like this:

function myFunctionName (strName, intValue, strOtherString, blnResult, strAnotherString, strEtc) {
    var intMyLocalVariable = 0;

    var strAnotherVariable = "blah";

    if (blnResult) {
        strAnotherVariable = "yakk";
    } else {
        strAnotherString = "yikes";
    }

    print strName; return (intValue);
}

Anyway, the problem is the fact that I can't just go and look for the first non-escaped "}" character, since there could be if statements and other "{}" characters that are completely valid in a subroutine/function. What is this mystical regex that I'm looking for? It would take me days and days to figure one out, when I know someone out there is definitely smarter/more experienced than I at this sort of stuff...

Here's a regex that I wrote to grab just the function headers in a file...

sub GetJSFunctionHeaders {
    my ($strOutput) = @_;
    my (@subroutines) = ();

    # Match "function", a name, and a parenthesized parameter list
    while ($strOutput =~ m/(function\s*\S+\s*\(\s*(?:[^\\\)]|\\.)*\s*\))\s*/ig) {
        push (@subroutines, $1);
    }

    return (\@subroutines);
}

This works great.... but I want all the data IN the function, as that's what I'm going to go through and obfuscate the function parameters.... Does anyone out there know the magical REGEX? If so, that would be sooooooo cool. I would worship you every night for the next week if you so desire :)
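One possible way around the nested-brace problem is to drop the single-regex idea and count brace depth instead: find each header with a regex, then walk forward from its opening brace until the depth returns to zero. The sketch below is only that, a sketch. The name extract_js_functions is hypothetical, and it assumes comments are already stripped and that string and regex literals contain no unbalanced braces (for example because the strings are still hidden). Functions nested inside another function are returned as part of the outer one, not separately.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a reference to an array holding the full text
# of every top-level function found in $js.
# Assumptions: comments already stripped; no unbalanced braces inside string
# or regex literals.
sub extract_js_functions {
    my ($js) = @_;
    my @functions;

    # Find each header up to and including its opening brace.
    while ($js =~ m/\bfunction\b\s*[\w\$]*\s*\([^)]*\)\s*\{/g) {
        my $start = $-[0];      # offset where "function" begins
        my $pos   = pos($js);   # offset just past the opening brace
        my $depth = 1;

        # Walk forward, tracking nesting until the opening brace is closed.
        while ($depth > 0 && $pos < length $js) {
            my $ch = substr($js, $pos++, 1);
            if    ($ch eq '{') { $depth++ }
            elsif ($ch eq '}') { $depth-- }
        }
        last if $depth > 0;     # unbalanced input: stop rather than guess

        push @functions, substr($js, $start, $pos - $start);
        pos($js) = $pos;        # resume the search after this function
    }
    return \@functions;
}
```

Each element of the returned array is the whole function, header included, so it can be handed straight to whatever does the parameter obfuscation.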