If you want to see how to do it, look at the Perl comment stripping code. You have to perform some level of lexical analysis.

Once you get started on this, the end point is either a fairly horrid hack or a full-blown parser.
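To make the lexical analysis point concrete, here is a minimal sketch (not the comment stripping code referred to above) that treats quoted strings as tokens, so a // inside a string survives. The name strip_comments is made up for illustration, and the sketch assumes the JavaScript contains no regex literals with quotes or comment markers in them; handling those is where the horrid hack begins.

```perl
use strict;
use warnings;

# Hypothetical helper: remove // and /* */ comments but leave string
# literals alone by matching strings first and putting them back.
sub strip_comments {
    my ($js) = @_;
    $js =~ s{
        ( "(?:\\.|[^"\\])*"       # double-quoted string, captured
        | '(?:\\.|[^'\\])*' )     # single-quoted string, captured
      | //[^\n]*                  # single-line comment, dropped
      | /\*.*?\*/                 # multi-line comment, dropped
    }{ defined $1 ? $1 : '' }gsex;
    return $js;
}

my $js = qq{var u = "http://example.com"; // a comment\n}
       . qq{var v = 'a//b'; /* gone */\n};
print strip_comments($js);
```

Because the string alternatives are tried first, anything that looks like a comment marker inside quotes is consumed as part of the string and copied back unchanged.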

Here is some pseudo code (time for you to do the work!). It works, but you will need to integrate it. If you need to do it for match as well (I can't remember), then change the string replace in the split RE to (?:replace|match).

I told you this gets harder!

my @array = split /(?=\breplace\s*\()/,
    'chunk1 replace(123\) ) chunk3 + "Hello replace me" more stuff';
my @lotsa_chunks;
for my $bit (@array) {
    if ($bit =~ /^replace/) {
        # do careful stuff, on new chunk
        my $last = '';
        my $re   = '';
        for (split //, $bit) {
            unless ( $_ eq ')' and $last ne "\\" ) {
                $re .= $_;
                $last = $_;
                next;
            }
            $re .= $_;                          # add closing bracket
            push @lotsa_chunks, $re;            # push complete RE into a chunk
            $bit =~ s/\Q$re\E//;                # hack RE off
            push @lotsa_chunks, chunk($bit);    # chunk the remainder
            last;                               # stop at the first unescaped ')'
        }
    }
    else {
        push @lotsa_chunks, chunk($bit);
    }
}
print "$_\n" for @lotsa_chunks;

# this sub splits a function into quoted and unquoted chunks
sub chunk {
    my $func = shift;
    my @chunks;
    my $chunk = 0;
    my $found_quote = '';
    for (split //, $func) {
        # look for opening quote
        if (/'|"/ and ! $found_quote) {
            $found_quote = $_;
            $chunk++;
            $chunks[$chunk] = $_;
            next;
        }
        # look for corresponding closing quote
        if ( $found_quote and /$found_quote/ ) {
            $found_quote = '';
            $chunks[$chunk] .= $_;
            $chunk++;
            next;
        }
        # no quotes so just add to current chunk
        $chunks[$chunk] .= $_;
    }
    # drop slots that were skipped over and never filled
    @chunks = grep { defined } @chunks;
    # strip whitespace from unquoted chunks
    for (@chunks) {
        next if m/^(?:"|')/;    # leave quoted strings alone
        s/^[ \t]+|[ \t]+$//g;
    }
    return @chunks;
}
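For the match case mentioned earlier, the widened split might look something like this. It is only a sketch: the sample JavaScript line is invented, and whether match needs the same treatment in your code is for you to check.

```perl
use strict;
use warnings;

# Same lookahead split as in the post, widened so it also breaks before match(
my $js   = 'a = s.replace(/x/, "y"); b = s.match(/z/); c = "replace me";';
my @bits = split /(?=\b(?:replace|match)\s*\()/, $js;
print "[$_]\n" for @bits;
```

The lookahead keeps the keyword at the start of the following chunk, and the required opening bracket means the words "replace me" inside the string do not trigger a split.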

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Update

What the hell, here it is integrated. I have changed the name of chunk() to strings(). The new RE parsing sub becomes chunk().

#!/usr/bin/perl -w
use strict;

my ( $data, %funcs, %globals );
open my $js, '<', $ARGV[0]
    or die "Usage: $0 <file>\nCan't open '$ARGV[0]': $!\n";
local $/;
$data = <$js>;
close $js;
my $was = length $data;
$data =~ s|//.*\n|\n|g;      # strip single line comments
$data =~ s|/\*.*?\*/||gs;    # strip multi-line comments
$data =~ s|^\s*\n||gm;       # strip blank lines
$data =~ s|^\s+||gm;         # strip leading whitespace
$data =~ s|\s+$||gm;         # strip trailing whitespace

# do the magic and split into functions
my @functions = split /(?=\bfunction\s+\w+\s*\()/, $data;

# map function names to new names starting with 'a'
my $name = 'a';
for (@functions) {
    $funcs{$1} = $name++ if m/^function\s+(\w+)\s*\(/;
}

# modify all the function names (skipped if no functions were found,
# as an empty alternation would match at every word boundary)
if (%funcs) {
    my $funcs    = join '|', keys %funcs;
    my $func_sub = qr/\b($funcs)\b/;
    s|$func_sub|$funcs{$1}|g for @functions;
}

# modify all the global vars
# as we have split on function keyword these should
# be in the first element of our function array
unless ($functions[0] =~ /^function/) {
    my @globals = $functions[0] =~ m/var\s+(\w+)/g;
    if (@globals) {
        $globals{$_} = $name++ for @globals;
        my $globals    = join '|', keys %globals;
        my $global_sub = qr/\b($globals)\b/;
        for my $func (@functions) {
            my @chunks = chunk($func);
            for (@chunks) {
                next if m/^(?:"|')/;    # leave quoted strings alone
                s|$global_sub|$globals{$1}|g;
            }
            $func = join '', @chunks;
        }
    }
}

# modify all the scoped vars continuing var names on from func/global names
my $end_globals = $name;
for my $func (@functions) {
    next unless $func =~ m/^function/;
    my ( @locals, %locals );
    $name = $end_globals;    # each function can use the same local names
    @locals = $func =~ m/var\s+(\w+)/g;
    my ($local) = $func =~ m/function\s+\w+\s*\(([^\)]+)/;
    if ($local) {
        $local =~ s/\n|\s//g;
        push @locals, split ',', $local;
    }
    for my $var (@locals) {
        next unless $var;
        $locals{$var} = $name++;
    }
    next unless keys %locals;
    my $locals    = join '|', keys %locals;
    my $local_sub = qr/\b($locals)\b/;
    my @chunks = chunk($func);
    for (@chunks) {
        next if m/^(?:"|')/;    # leave quoted strings alone
        s|$local_sub|$locals{$1}|g;
    }
    $func = join '', @chunks;
}

# do some initial condensation around curlies
# (the opening brace is escaped as newer perls reject a bare { in a regex)
for (@functions) {
    s/\n\{/{/g;
    s/\n\}/}/g;
}

# now every exposed line ending should end in a ; { or } if we are to safely
# condense this down by removing newlines - we add the ; if it is missing
for my $func (@functions) {
    my @lines = split "\n", $func;
    for (@lines) {
        $_ .= ";" unless m/[{};]$/;
    }
    $func = join '', @lines;
    $func .= "\n";
}

# remove whitespace around all operators
my @operators = qw# + - * / = == != < > <= >= ( ) [ ] { } ? ; : #;
push @operators, ',';    # need to do it this way to avoid warnings
$_ = quotemeta $_ for @operators;
my $operator_sub = join '|', @operators;
$operator_sub = qr/($operator_sub)/;
for my $func (@functions) {
    my @chunks = chunk($func);
    for (@chunks) {
        next if m/^(?:"|')/;    # leave quoted strings alone
        s#[ \t]+$operator_sub#$1#g;
        s#$operator_sub[ \t]+#$1#g;
    }
    $func = join '', @chunks;
}

# ta da time to print out the results
# first display a list of modified function names.
# Note: any function called in the html will have to be modified accordingly!
print "New function names are:\nwas called\t=>\tis now called\n";
print "$_()\t=>\t$funcs{$_}()\n" for keys %funcs;
print "\n";

# print out the modified code
print @functions;

# a few stats just for the hell of it
print "\n\nLength change:\n";
my $is = length join '', @functions;
printf "Originally %d bytes now %d bytes or %2d%% of original size\n",
    $was, $is, ($is/$was)*100;
exit;

# chop a function up into RE and non RE bits so we can chunkify
# it into strings and non string sections
sub chunk {
    my $func = shift;
    my @lotsa_chunks;
    my @array = split /(?=\breplace\s*\()/, $func;
    for my $bit (@array) {
        if ($bit =~ /^replace/) {
            # do careful quote parse, on RE chunk
            my $last = '';
            my $re   = '';
            for (split //, $bit) {
                unless ( $_ eq ')' and $last ne "\\" ) {
                    $re .= $_;
                    $last = $_;
                    next;
                }
                $re .= $_;                            # add closing bracket
                push @lotsa_chunks, $re;              # push complete RE into a chunk
                $bit =~ s/\Q$re\E//;                  # hack RE off
                push @lotsa_chunks, strings($bit);    # chunk the remainder
                last;                                 # stop at the first unescaped ')'
            }
        }
        else {
            push @lotsa_chunks, strings($bit);
        }
    }
    return @lotsa_chunks;
}

# this sub splits a function into quoted and unquoted chunks
sub strings {
    my $func = shift;
    my @chunks;
    my $chunk = 0;
    my $found_quote = '';
    for (split //, $func) {
        # look for opening quote
        if (/'|"/ and ! $found_quote) {
            $found_quote = $_;
            $chunk++;
            $chunks[$chunk] = $_;
            next;
        }
        # look for corresponding closing quote
        if ( $found_quote and /$found_quote/ ) {
            $found_quote = '';
            $chunks[$chunk] .= $_;
            $chunk++;
            next;
        }
        # no quotes so just add to current chunk
        $chunks[$chunk] .= $_;
    }
    # drop slots that were skipped over and never filled
    @chunks = grep { defined } @chunks;
    # strip whitespace from unquoted chunks
    for (@chunks) {
        next if m/^(?:"|')/;    # leave quoted strings alone
        s/^[ \t]+|[ \t]+$//g;
    }
    return @chunks;
}

=head1 NAME

javastrip.pl - a Perl script to obfuscate and condense javascript code

=head1 SYNOPSIS

    javastrip.pl <file>

where file is the raw javascript only, not HTML

output is to STDOUT so send it wherever you want with a > redirect:

    javastrip.pl infile.js > outfile.js

=head1 DESCRIPTION

This script is primarily designed to munge .js files. It will process any
pure javascript but was not designed to process javascript embedded in HTML.

It processes a javascript in several stages. The first stage is to remove
all comments, blank lines and leading/trailing whitespace. This is a fairly
safe thing to do and should not break scripts.

The next stage is rather more dangerous. All the functions are renamed. The
first function found will be renamed a() the next b() and so on. All global
vars are similarly renamed to single (if not all used up) letter names that
follow in sequence from the function names. Finally all local function vars
are renamed starting with the letter immediately after the last global.

The net result is that all the functions and variables will now have 1-2
letter meaningless names. There is plenty of scope for disaster here but I
do not have enough javascript on hand to detect any. The algorithm works OK
on my style of javascripting.

The final stage is to condense the script down. Each function is written to
a single line. All excess whitespace around operators is stripped out. This
is a fairly safe stage too. Literal newlines in strings will be stripped if
you are using them. "\n" is just fine.

If it breaks a script you can comment out different sections to see which
process is to blame.

=head2 BUGS

Bound to be. This script was knocked up over a couple of hours and has had
minimal testing. The algorithm works OK on my style of javascripting but my
javascript looks a lot like Perl. Email me any scripts that break when you
javastrip them and I'll see if I can patch it.

=head1 AUTHOR

tachyon aka Dr James Freeman E<lt>jfreeman@tassie.net.auE<gt>

=head1 LICENSE

This package is free software; you can redistribute it and/or modify it
under the terms of the "GNU General Public License".

=head1 DISCLAIMER

This script is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE.

See the "GNU General Public License" for more details.

=cut
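The a(), b(), ... naming in the script leans on Perl's magic string increment. A small sketch of how the sequence behaves once the single letters are used up (the variable names here are only for illustration):

```perl
use strict;
use warnings;

# Perl's string increment walks a..z and then rolls over to aa, ab, ...
my $name  = 'a';
my @names = map { $name++ } 1 .. 28;
print "@names[0, 1, 25, 26, 27]\n";    # a b z aa ab
```

One thing to watch: once the sequence gets into two letters it will eventually reach names such as do, if and in, which are reserved words in JavaScript, so a large script may need a skip list.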

In reply to Re: Re: Re: Compressing/Obfuscating a Javascript file by tachyon
in thread Compressing/Obfuscating a Javascript file by Incognito
