Deobfuscation for fun and profit

by jmcnamara (Monsignor)
on May 27, 2002 at 12:14 UTC (#169553, perlmeditation)


The Obfuscated Code section contains some interesting code. The problem is that without working your way through the code it is difficult to decide if it is any good or not. This is probably why obfuscated code which is formatted into a recognisable shape often gets more votes than visually mundane but superior code.*

However, it is unlikely that anyone has the time to work through all of the code posted in Obfuscated Code in order to determine the quality. Therefore, it would be nice if people took it upon themselves to take an occasional submission and deobfuscate it.

Apart from the benefits to the community at large, deobfuscation is a good way to learn some of the more idiomatic or obscure features of Perl.

Perltidy or the indent-region function in emacs can be helpful in unearthing hidden code structures. And since perl is blind to obfuscation, B::Deparse can be useful for clarifying code segments.

However, if the obfuscation is good these tools will be of limited use. I once used Perltidy and emacs to massage Dominus's obfu into a readable shape. It didn't help. While the overall structure was visible the mechanism was still deeply hidden.

As an introduction to the subject of obfuscation and JAPHs have a look at Teodor Zlatanov's The elegance of JAPH or Abigail's JAPHs and other obscure signatures.

So learn a little bit more about Perl while gaining XP and the respect of your peers. Deobfuscate today.

--
John.

* This is not to say that some of the superior code isn't formatted as well.

Re: Deobfuscation for fun and profit
by vladb (Vicar) on May 27, 2002 at 13:54 UTC
    You are right, coming up with spoilers for most obfuscations is a tough job. Unfortunately, even with ~ 500 XP left to my sainthood, I still am not able to de-obfuscate much of the code submitted to the Obfuscated Code section. I guess this requires special skill and much practice, not to mention countless hours of meditation.

    I once read an excellent post by japhy titled Japhy's Obfuscation Review. I wish we had more similar posts by serious obfuscation freaks (aka gurus ;)! Although, I learnt quite a few neat tricks from japhy, I'm still far far away from being able to tackle or come up with my own obfuscations that would be hard to break. In fact, the furthest I came to writing an obfuscation is the code you see in my signature. And, of course, it still pales in comparison to some of the obfu's you've mentioned ;/. I also didn't get much ++ votes when I submitted this code to the Obfuscated Code section, titled System admin the obfuscated way. (*hint* *hint* hope you drop a few extra ++ in the bucket ;-))

    What I think would be useful, in addition to your idea, is to have more authors of original obfuscations to submit a link somewhere in their post to a spoiler page. Then, if anyone is interested in getting to the bottom of a complex obfu, he/she may simply hit the link and follow to the spoiler page.

    UPDATE: added last 'suggestion' paragraph ;)

    _____________________
    $"=q;grep;;$,=q"grep";for(`find . -name ".saves*~"`){s;$/;;;/(.*-(\d+)-.*)$/; $_=["ps -e -o pid | "," $2 | "," -v "," "];`@$_`?{print"+ $1"}:{print"- $1"}&&`rm $1`; print$\;}
Obfu spoiler: vladb's sig (was Re: Deobfuscation for fun and profit)
by belden (Friar) on May 27, 2002 at 23:49 UTC
    0. Motivation

    Three reasons to deobfuscate vladb's signature: fun and profit, as pointed out by jmcnamara; vladb monkself said "What I think would be useful, in addition to your idea, is to have more authors of original obfuscations to submit a link somewhere in their post to a spoiler page", but the only spoiler I could find relating to the sig was in the original post; finally, vladb said that he is still "...not able to de-obfuscate much of the code..." Me neither, and I need to start somewhere! :)

    ====

    1. The original

    $"=q;grep;;$,=q"grep";for(`find . -name ".saves*~"`){s;$/;;;/(.*-(\d+)-.*)$/; $_=["ps -e -o pid | "," $2 | "," -v "," "];`@$_`?{print"+ $1"}:{print"- $1"}&&`rm $1`; print$\;}
    ====

    2. Apply a bit of formatting. (Or, in retrospect, why I'm a lamo.)

    I'm unsure whether that grep;;$,=q"grep"; should really be one line or two... I'll assume one for now.

    $"=q;

    grep;;$,=q"grep";

    for(`find . -name ".saves*~"`){
      s;$/;;;
      /(.*-(\d+)-.*)$/;
      $_=["ps -e -o pid | "," $2 | "," -v "," "];
      `@$_` ? {print"+ $1"} : {print"- $1"} && `rm $1`;
      print $\;
    }
    ====

    3. Add in some line numbering.

     1: $"=q;
     2:
     3: grep;;$,=q"grep";
     4:
     5: for(`find . -name ".saves*~"`){
     6:   s;$/;;;
     7:   /(.*-(\d+)-.*)$/;
     8:   $_=["ps -e -o pid | "," $2 | "," -v "," "];
     9:   `@$_` ? {print"+ $1"} : {print"- $1"} && `rm $1`;
    10:   print $\;
    11: }
    ====

    At first glance, line 1 seems to set $", the quoted-array-separator, to the letter 'q'. The tinkering with $" makes me think that vladb is going to use an array somewhere later on; and thinking that $"=q; meant $"='q';, one might start thinking about the upcoming array...

    I'm kind of slow with coding, so I just ran a quick oneliner to see whether the first line really does what I thought:

    perl -e '@a = qw (a bc def); print "before: @a\n"; $"=q; print "after: @a\n";'

    Output:
    before: a bc def

    This is surprising; obviously two things are happening here:
    1. $"=q; is actually using the q operator to quote, umm, something...
    2. For some reason, when $" is set to that, err, something, the second print statement fails.

    Line 1 confused me! B::Deparse gets a lot of mention in the monastery; maybe it can help me here.

    perl -MO=Deparse -e '@a = qw (a bc def); print "before: @a\n"; $"=q; print "after: @a\n";'

    Output:
    @a = ('a', 'bc', 'def');         # ok, I agree
    print "before: @a\n";            # yep, still with ya
    $" = ' print "after: @a\\n"';    # whoa, this is unexpected
    -e syntax OK                     # good news, I guess
    ====

    Aha, after playing around above, reading perlop, and looking back at my first step, I see where vladb led me astray: I split up his code wrong!

    $"=q;grep;;$,=q"grep";

    should actually be broken up like this:

    $"=q;grep;;
    $,=q"grep";

    which is the equivalent of this:

    $" = 'grep';
    $, = 'grep';

    Cut down a tree with a herring? Sure, I'll try, but only if it's red...
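To make the red herring concrete, here is a small sketch (mine, not from vladb's post) of how the q operator treats whatever character follows it as its delimiter. The variable names are invented for illustration.

```perl
use strict;
use warnings;

# q is the single-quote operator; the character right after it is the delimiter.
my $with_semicolons = q;grep;;    # delimiter is a semicolon, value is grep
my $with_quotes     = q"grep";    # delimiter is a double quote, value is grep
my $plain           = 'grep';     # the everyday spelling

print "all three are the same string\n"
    if $with_semicolons eq $plain and $with_quotes eq $plain;
```

The trailing semicolon that ends each statement is what makes the first form look like two statements at a glance.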

    ====

    If I had been running the modified signature as I, um, modified it, I would've caught my mistake sooner. As it is, vladb's misdirection waylaid me for an hour (actually, I gave up, but while eating lunch figured out my mistake). But this time gave me a chance to read up on $" and $, in perlvar.

    I'm a big fan of $", actually; I do this in oneliners a lot:  $"=$/; print "@a\n"; which prints the elements of @a on their own line. ($/, incidentally, is the "input record separator"; the default character is \n.)

    But I don't use $, very much at all. This turns out to be useful as well: if one has code such as $, = '|'; print $foo, $bar, $baz, "\n" then one can generate nicely formatted (in this case, pipe-delimited) lines without having to muck around with the equivalent printf statement.
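A quick sketch of the difference between the two variables, assuming made-up data and an in-memory filehandle so the printed text can be inspected: $" is applied when an array is interpolated into a string, while $, is applied between the arguments of print.

```perl
use strict;
use warnings;

my @a = qw(a bc def);

# $" (list separator): used when an array is interpolated in double quotes.
my $interpolated;
{
    local $" = '-';
    $interpolated = "@a";
}

# $, (output field separator): inserted between the arguments of print.
my $printed = '';
{
    local $, = '|';
    open my $fh, '>', \$printed or die "cannot open in-memory handle: $!";
    print {$fh} 'foo', 'bar', 'baz', "\n";
    close $fh;
}

print $interpolated, "\n", $printed;
```

Note that $, also lands between the last field and the "\n" argument, so the printed line ends with a trailing pipe.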

    ====

    4. Code re-write, using discovery above

     1: $"= 'grep' ;
     2: $,= 'grep' ;
     3:
     4: for(`find . -name ".saves*~"`){
     5:   s;$/;;;
     6:   /(.*-(\d+)-.*)$/;
     7:   $_=["ps -e -o pid | "," $2 | "," -v "," "];
     8:   `@$_` ? {print"+ $1"} : {print"- $1"} && `rm $1`;
     9:   print $\;
    10: }
    ====

    Whew! At this point, we've only looked at the first two lines of code! Fortunately, lines 4-7 are fairly straightforward.

    Line 4: Setting $" and $, to 'grep' is a clue that vladb's signature is a Unix utility of some sort; the for ( `find ...`) clinches it. (Uhh, not to mention the original description!)

    Line 4 runs a shell command (the Unix command "find") and, for each line that is returned, processes it according to lines 5-9.

    This particular find command is going to search the current directory and, recursively, its subdirectories for files that match a particular naming convention. The regex for these filenames would be something like /^\.saves.*?~$/, if that helps you. Otherwise, here are a few examples:

    
       foo.saves_blah~   # no match
       .saves_foo        # no match
       .saves_foo~       # match!
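The same three examples can be checked mechanically. This is only a sketch using the approximate regex given above; find itself matches the -name pattern as a shell glob, not as a Perl regex.

```perl
use strict;
use warnings;

# The three example names from the list above.
my @names = ('foo.saves_blah~', '.saves_foo', '.saves_foo~');

# Keep only the names that fit the approximate regex.
my @matches = grep { /^\.saves.*?~$/ } @names;

print "$_\n" for @matches;
```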
    
    
    On Unix and Linux, a filename that starts with a dot (.) is a "hidden" file, which can only be seen if you use an extra flag on 'ls' (same function as DOS 'dir' command). So the find command is going to find a bunch of "hidden" files that start '.saves', continue with whatever text describes what file is saved, and end with a tilde (~). An example might be .saves_Big_Project_backup_27~

    Chances are you don't have any of these in your directory on your machine, so the find command would return nothing. And with no data to apply the for block to, perl just skips the block in toto.

    ====

    Well, that's pretty boring stuff. I wonder what happens when vladb uses this tool on his machine? Presumably the find command returns some data, so lines 5-9 get to kick in.

    Line 5: I didn't bother reformatting this; we can do so now. s;$/;;; In a substitution (s///), one can choose an alternate delimiting character. This is useful if you have a lot of '/' that you are processing, and find yourself escaping them all the time: '\/'. Consider if you wanted to remove all '//' from a line:

    s/\/\///g;

    versus

    s,//,,g;

    Notice how much cleaner the second form is.

    vladb is doing the same thing: using an alternate delimiter on his s///. He's using ';', though, because he figures that he might be able to catch overzealous deobfuscators out a second time (remember the "lamo"!). But we're on to his semicolon madness, and know immediately that line 5 is removing $/ from $_. Since $/ defaults to \n, and since vladb hasn't changed it, we know we're really removing a newline from $_. There is no /g modifier, so only the first match is removed, but find only returns one newline per line of output, so really line 5 is the same as chomp;
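A minimal sketch of that equivalence, using a made-up filename in place of real find output:

```perl
use strict;
use warnings;

my $line = "./.saves-19639-~\n";    # made-up sample, shaped like a line from find

(my $via_subst = $line) =~ s;$/;;;  # vladb's spelling: s/// with semicolon delimiters
chomp(my $via_chomp = $line);       # the conventional spelling

print "same result\n" if $via_subst eq $via_chomp;
```

The two only agree because each line carries exactly one trailing newline; on a string with embedded newlines, the substitution would remove the first one it finds, while chomp only ever touches the end.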

    ====

    Line 6 is a simple pattern match: perl actually lets you comment your regexes if you want, so let's try that out.

    /(.*-(\d+)-.*)$/; becomes

    /            # start of pattern match
      (          # begin storing into $1
        .*       # store any number of any character...
        -        # ...followed by a hyphen...
        (        # begin storing into $2
          \d     # ...any digit...
          +      # ...as many as we can grab...
        )        # stop storing into $2
        -        # ...followed by another hyphen
        .*       # ...followed by any number of any character...
      )          # stop storing into $1
      $          # end of the line, bub
    /x;          # / to terminate regex, x to allow comments

    Right away this tells me that I'd misguessed the naming convention that vladb is using: my previous example, .saves_Big_Project_backup_27~, wouldn't have succeeded at all: the regex says there must be a hyphen, some digits, and a hyphen; the example actually doesn't have any hyphens surrounding the digits. (Oh well, the example served its purpose: to get me thinking about the data.)
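To see the two captures in action, here is a small sketch. The helper name parse_save_name and the sample filenames are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (whole name, pid) on a match, or an empty list.
sub parse_save_name {
    my ($name) = @_;
    return $name =~ /(.*-(\d+)-.*)$/ ? ($1, $2) : ();
}

for my $name ('./.saves-19639-~', './.saves_Big_Project_backup_27~') {
    my @captures = parse_save_name($name);
    if (@captures) {
        print "match:    whole='$captures[0]' pid='$captures[1]'\n";
    }
    else {
        print "no match: $name\n";
    }
}
```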

    The naming convention is probably .saves-$$-~ where "$$" is the process id number of the program that created the save file. Putting the process id, or pid, into a temporary file's name is useful for two reasons: first, generally your OS doesn't cycle pids very quickly, so it's a lazy way of making sure your temp file names are unique; second, you can identify the owner of the temp file, and if the owner isn't running anymore, you can remove the old file.

    (Which, if you read vladb's description, is exactly what this utility does!)

    ====

    Line 7 made my eyes water. It looks like a shell command is being built, but to do what? Remember that line 6 stuffed a pid into $2. Line 7 is going to use that stored data and build a ps command that checks whether that pid is still around.

     $_=["ps -e -o pid | "," $2 | "," -v "," "];

    First off, we've got what I call "the anonymous array square brackets". (It ain't catchy but it sure helps me remember what they do.)

    If we de-obfuscate this line a bit, it becomes:

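    @command = ("ps -e -o pid | ", " $2 | ", " -v ", " ");
    $_ = \@command;

    Once that array is interpolated in line 8, the command that comes out is:

    ps -e -o pid | grep $2 | grep -v grep

    Where did those 'grep's come from? Remember back to line 1:
     $" = 'grep';

    $" is the separator that perl places between array elements when an array is interpolated into a double-quoted string (or into backticks). So where you see a comma in line 7, you can mentally think "grep" instead.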
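A hedged sketch of how the pieces are glued together: when an array is interpolated into a string, perl joins its elements with $", the list separator, so that is where each 'grep' is inserted. The pid below is a made-up stand-in for $2.

```perl
use strict;
use warnings;

my $pid   = 19639;    # made-up stand-in for the captured $2
my $parts = ["ps -e -o pid | ", " $pid | ", " -v ", " "];

my $command;
{
    local $" = 'grep';      # list separator, as set in vladb's first statement
    $command = "@$parts";   # interpolation joins the elements with $"
}

print "$command\n";
```

Backticks interpolate the same way double quotes do, which is why running `@$_` in the signature produces the full pipeline.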

    But what does the @command do? Let's look.

    ps -e -o pid  # use the 'ps' command to look at the process stack;
                  # the -e flag says to look at all running processes;
                  # the '-o pid' flag specifies to return their process ids.
    
    |             # take the output from the previous command and use it
                  # as input for this next command
    
    grep $2       # look for the pid that we found in line 6; this pid,
                  # remember, comes from the tempfile name, and tells us
                  # who the owner of $_ is.
    
    |             # take the output from the previous command and use it
                  # as input for this next command
    
    grep -v grep  # Right now there might be two lines in the process stack
                  # that have $2 in them: first is our grep line from earlier
                  # in this pipeline; second is the process whose pid really
                  # is $2. We want to ignore the grep lines; this way we avoid
                  # a situation where we see $2 in the process stack and think
                  # it's the process we're looking for when really it's just us!
    
    In line 8 vladb will actually run this command; for now if you only take one thing away from this, it should be this: the output of the command will be either 0 lines of data, in which case the process isn't running, or it will be 1 line of data, in which case the process still is running.
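One caveat worth a sketch, offered as my own aside rather than anything claimed in the thread: grep matches substrings, so a short pid such as 22 would also match a line containing 2201, and the pipeline can then report more than one line. The ps output below is made up, and pid_is_listed is a hypothetical helper showing an exact comparison instead.

```perl
use strict;
use warnings;

# Made-up sample of what `ps -e -o pid` might print (header plus pids).
my @ps_lines = ("  PID\n", "    1\n", " 2201\n", "19639\n");

# Hypothetical helper: count lines whose pid equals the one we want exactly.
sub pid_is_listed {
    my ($pid, @lines) = @_;
    return scalar grep { /^\s*(\d+)\s*$/ && $1 == $pid } @lines;
}

my $substring_hits = grep { /22/ } @ps_lines;        # what `grep 22` would count
my $exact_hits     = pid_is_listed(22, @ps_lines);   # exact numeric comparison

print "substring: $substring_hits, exact: $exact_hits\n";
```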

    Remember, though, that vladb didn't want to give away the whole bag at once, so instead of writing:

    $_ = "ps -e -o pid | grep $2 | grep -v grep";

    he instead wrote

    $_=["ps -e -o pid | "," $2 | "," -v "," "];

    And one of the consequences of this is that $_ isn't actually the full command that we want; it's a pointer to an anonymous array, and the anonymous array is what contains the real command!

    ====

    So in Line 8, when vladb actually wants to check the process stack for those running processes, he must first dereference the array.

    As TheDamian wrote in an oldish article archived at perl.com, ...A reference is like the traditional Zen idea of the "finger pointing at the moon". It's something that identifies a variable, and allows us to locate it. And that's the stumbling block most people need to get over: the finger (reference) isn't the moon (variable); it's merely a means of working out where the moon is.

    (N.B.: if you haven't searched perl.com for your favorite authors and personalities that hang out on perlmonks, why haven't you? Many have written articles that will improve your understanding and use of perl almost within seconds of reading!)

    The dereferencing is done in line 8 by simply tossing an at-sign, @, in front of $_.
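For anyone new to references, a tiny sketch of the finger and the moon. The variable names are invented for illustration.

```perl
use strict;
use warnings;

my $finger = ['ps', '-e'];     # reference to an anonymous array (the finger)
my @moon   = @$finger;         # an at-sign in front dereferences it (the moon)
my $first  = $finger->[0];     # arrow syntax reaches a single element

print ref($finger), " with ", scalar(@moon), " elements, first is $first\n";
```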

    Like line 4, line 8 uses backticks `` to run an external command and feed its output back into the program. We know from the discussion of line 7 what the command is - a search of running process IDs - and what the expected output is (either nothing or a process ID).

    Line 8 also uses a ternary conditional: this is a fancy way of writing an if-else statement in just one line.

    Consider:

    perl -e '$foo = 0; $foo==0 ? print "foo is zero" : print "foo is non-zero";'

    Output:
    foo is zero

    This is the same as:

    $foo = 0;
    if ( $foo == 0 ) {
        print "foo is zero";
    }
    else {
        print "foo is non-zero";
    }

    We can re-write line 8 a little:

    if ( `@$_` ) {
        print "+ $1";
    }
    else {
        print "- $1";
        `rm $1`;
    }

    And we can re-write it a little more:

    my $owner_is_still_running = `@$_`;    # search for a specific $pid
    if ( $owner_is_still_running ) {
        print "$owner_is_still_running, keeping $1";    # found $pid, keep tempfile
    }
    else {
        print "removing $1";    # didn't find $pid
        `rm $1`;                # remove $pid's tempfile
    }
    ====

    Line 9 prints $\, which, umm, defaults to nothing; here it looks like it's being treated as a newline, though, doesn't it? I've gotta admit: I'm not sure where $\ gets set to \n...

    ====

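    Another thing I'm not sure of is why $, was set to 'grep'; this seems like a bit of misdirection on vladb's behalf. It is $" that glues the 'grep's between the array elements when `@$_` is interpolated in line 8, whereas $, only matters when print is given more than one argument, and every print here gets exactly one. So as far as I can tell, $, never gets used.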

    ====

    And for my own fun, here's the de-obfuscated tool.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # find files that match the naming convention
    my @files = `find . -name ".saves*~"`;

    foreach ( @files ) {
        chomp;

        # hold onto filename, and extract creator's pid;
        # skip any name that doesn't fit the convention
        next unless
        /            # start of pattern match
          (          # begin storing into $1
            .*       # store any number of any character...
            -        # ...followed by a hyphen...
            (        # begin storing into $2
              \d     # ...any digit...
              +      # ...as many as we can grab...
            )        # stop storing into $2
            -        # ...followed by another hyphen
            .*       # ...followed by any number of any character...
          )          # stop storing into $1
          $          # end of the line, bub
        /x;          # / to terminate regex, x to allow comments

        my ( $filename, $creator_pid ) = ( $1, $2 );

        # Check process stack for the creator's pid, storing command result
        my $command = "ps -e -o pid | grep $creator_pid | grep -v grep";
        my $command_result = `$command`;

        # if the command result is positive, leave $filename alone...
        # ... otherwise, remove $filename
        if ( $command_result ) {
            print "+ $filename\n";
        }
        else {
            print "- $filename\n";
            `rm $filename`;
        }
    }
    ====

    Summary

    Hopefully this will be useful to some other monks as an example of how to start de-obfuscating. This is my first turn at writing a spoiler, and I gotta admit: it was pretty fun to figure this stuff out. Although (because?) I made a few wrong turns in my assumptions about the code, this exercise also helped me learn a little bit more about Perl. Thanks jmcnamara for the thread and vladb for the spoiler opportunity.

    blyman
    setenv EXINIT 'set noai ts=2'

      Line 9 prints $\, which, umm, defaults to nothing; here it looks like it's being treated as a newline, though, doesn't it? I've gotta admit: I'm not sure where $\ gets set to \n...

      It looks like it's being treated as a newline... but $\ isn't being set to \n in this code. So when we execute this code it prints out the empty string. Perhaps vladb has typoed this, because the resulting print out is a little confusing without the newlines.

      [me]$ ls -a .sav*
      .saves-19639-~ .saves-19896-~ .saves-19896333-~ .saves-22-~
      [me]$ perl tmp
      + ./.saves-22-~- ./.saves-19896333-~+ ./.saves-19639-~- ./.saves-19896-~[me]$

      You'd notice the uninitialised values if you ran the code with warnings turned on:

      [me]$ perl -w tmp
      Useless use of single ref constructor in void context at tmp line 2.
      Odd number of elements in hash assignment at tmp line 2.
      Use of uninitialized value in print at tmp line 3.
      + ./.saves-22-~Odd number of elements in hash assignment at tmp line 2.
      - ./.saves-19896-~Use of uninitialized value in print at tmp line 3.
      Odd number of elements in hash assignment at tmp line 2.
      Use of uninitialized value in print at tmp line 3.
      + ./.saves-19639-~Odd number of elements in hash assignment at tmp line 2.
      - ./.saves-19896333-~Use of uninitialized value in print at tmp line 3.

      If, of course, you can see them through the other mess of warnings. Why are we getting these hash assignment warnings? Well, we can isolate the code that's causing them, it's the ternary operator:
      `@$_`?{print"+ $1"}:{print"- $1"}&&`rm $1`;
      Perhaps Perl doesn't like blocks here? Or doesn't in my version of Perl (v5.6.1 built for i386-linux). We can't just remove them though, because if we do then we won't print out the file names that we delete.

      I can't explain the error messages but I can give an alternate line that doesn't create them:

      `@$_`?print"+ $1":print("- $1")&&`rm $1`;
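A possible explanation for both observations, offered as a sketch rather than a definitive diagnosis of the 5.6.1 behaviour: where perl expects a term, braces build an anonymous hash rather than a block, and print is a list operator, so without parentheses it takes the whole && expression as its argument. The names below are invented, and $rm_output is a hypothetical stand-in for the output of `rm $1`.

```perl
use strict;
use warnings;

# 1. Braces in the branches of ?: are anonymous hash constructors, not blocks.
my $running = 1;
my $picked  = $running ? { kept => 1 } : { removed => 1 };
print "ternary produced a ", ref($picked), " reference\n";

# 2. Capture what plain print emits so the precedence difference is visible.
my $output = '';
open my $fh, '>', \$output or die "cannot open in-memory handle: $!";
my $previous = select $fh;               # plain print now goes to the buffer

my $rm_output = 'RM';                    # hypothetical stand-in for `rm $1`
my @removed;

print "- file" && $rm_output;            # parsed as print("- file" && $rm_output)
print "\n";
print("- file") && push @removed, 'file';  # parentheses limit print's argument list

select $previous;
close $fh;
print $output, "\n";
```

The first print emits RM rather than the filename, which fits the remark that removing the braces loses the names of deleted files; the parenthesised form prints the name and then evaluates the right-hand side.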
      although of course your replacement is much neater. When we run our changed versions with warnings we get a slightly cleaner result:
      [me]$ perl -w tmp
      Use of uninitialized value in print at tmp line 3.
      + ./.saves-22-~- ./.saves-19896333-~Use of uninitialized value in print at tmp line 3.
      + ./.saves-19639-~- ./.saves-19896-~Use of uninitialized value in print at tmp line 3.

      And then any of the following lines can remove the remaining warning:

      print$\;    # cause of last warning

      #replacements
      print$/;
      print"\n";
      print qq'\n';
      #etc.

      of course we could put the line $\=$/; somewhere before the for loop and then not even need this print$\; line at all.
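A short sketch of that last idea, assuming made-up filenames and an in-memory filehandle so the result can be inspected: once $\ (the output record separator) is set, every print gets it appended automatically.

```perl
use strict;
use warnings;

my $output = '';
open my $fh, '>', \$output or die "cannot open in-memory handle: $!";
{
    local $\ = $/;    # output record separator takes the value of $/, a newline by default
    print {$fh} '+ ./.saves-22-~';
    print {$fh} '- ./.saves-19896-~';
}
close $fh;

print $output;
```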

      With it all cleaned up it works quite nicely

      [me]$ perl tmp
      + ./.saves-22-~
      - ./.saves-19896333-~
      + ./.saves-19639-~
      - ./.saves-19896-~
      [@me]$

      Anyway, thanks for a great obfu spoiler, it's fun.

      jarich

Re: Deobfuscation for fun and profit
by Abigail-II (Bishop) on May 28, 2002 at 12:12 UTC
