
Re^2: Bits & pieces

by bobf (Monsignor)
on Jul 18, 2005 at 07:58 UTC

in reply to Re: Bits & pieces
in thread Bits & pieces

Good job! Included below is an explanation of this JAPH which is a bit (ba-dum-bum) more verbose. Your analysis hit on the big points, but I wanted to fill in some of the more subtle reasoning.

You correctly identified the main point of this JAPH, which is the overlapping, 3-bit offsets for the low nybbles. While the subroutine "is essentially pure obfuscation" (as are most obfus!), the closure was designed deliberately and the calculations are far from random. The "fudge factor" at the end was necessary, but the value of it is meaningful, given the operation that uses it.

The point

The JAPH text is encoded in bitstrings, which is probably obvious by just glancing at the code. The fun part of writing this JAPH was constructing methods to store and extract the data that were less-than-obvious. The challenge of this JAPH was not meant to be in figuring out where the data comes from, but in how all the bits and pieces come together.

The Bitshifts

The bitshift operations were included because I thought it would be fun to restrict the digits used to only 0 and 1. Debuggers (or deobfuscators) will resolve them easily, but stepping through the code manually could be tricky unless you've got the table of operator precedence handy. Since they weren't the main point of this obfuscated JAPH, I removed them from the discussion below.

The Data Structure

The mess of hash and array references that was built and assigned to $_ contains the data hash as well as a bunch of padding so the whole thing fit into a nice block. The first hash, which contains keys 0111 and 1000, is never used. The second hash is the real data hash and it contains both bitstrings and bytestrings. The data hash (%_) is assigned by %_ = %{${${$_}[0]}[1]}, which simplifies to %{ $_->[0][1] }.
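To make the dereference chain easier to follow, here is a minimal sketch. The nested structure ($ref) and its contents are hypothetical stand-ins, not the data from the obfu; only the two dereference forms are the point.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the nested structure (not the real JAPH data).
my $ref = [ [ { unused => 1 }, { 1 => 'a', 8 => 'b' } ] ];

my %long  = %{ ${ ${$ref}[0] }[1] };   # fully spelled-out dereference
my %short = %{ $ref->[0][1] };         # arrow form of the same lookup

print join( ',', map { "$_=$long{$_}" } sort keys %long ),   "\n";   # 1=a,8=b
print join( ',', map { "$_=$short{$_}" } sort keys %short ), "\n";   # 1=a,8=b
```

Both forms reach the second hash inside the first array element, which is why the shorter one can replace the longer one.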

The keys to the data hash were meant to look like the low nybble of binary 1 (0001), 2 (0010), ... 6 (0110). I hoped that someone walking through this manually (without a debugger) would forget that the leading 0 would force the keys to be interpreted as octals rather than as binaries. Therefore, the actual key values (and the result of the bitshifts in the code) are 1, 8, 9, 64, 65, and 72, which might be confusing if you thought the hash contained keys of only (1..6).
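A quick way to see the octal trap is to print the literals both ways. This is only an illustration of how Perl reads the numbers, not code from the obfu.

```perl
use strict;
use warnings;

# Numeric literals with a leading zero are parsed as octal, not binary.
my @as_written = ( 0001, 0010, 0011, 0100, 0101, 0110 );
print "octal reading:  @as_written\n";    # 1 8 9 64 65 72

# What a reader expecting binary might have assumed instead.
my @as_binary = map { oct("0b$_") } qw( 0001 0010 0011 0100 0101 0110 );
print "binary reading: @as_binary\n";     # 1 2 3 4 5 6
```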

As described below, the bitstring in $_{9} is composed of the low 4 bits (0..3, the low nybble) of the bytes in 'just another perl hacker'. The bitstring in $_{65} is composed of bit 4 of every byte in the string. Note that bit 5 is always set for these characters, bit 6 is set for every non-space character, and bit 7 is not set for any character.
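The claims about bits 4 through 7 can be checked directly against the lowercase text. This is a minimal sketch; the variable names are mine, and the bit 4 string it prints is what this sketch derives, which I assume (but have not confirmed) matches the value stored in $_{65}.

```perl
use strict;
use warnings;

my $text = 'just another perl hacker';   # lowercase form; capitals come later

my ( $bit4, $bit5, $bit6, $bit7 ) = ( '', '', '', '' );
for my $ord ( unpack 'C*', $text ) {
    $bit4 .= ( $ord >> 4 ) & 1;
    $bit5 .= ( $ord >> 5 ) & 1;
    $bit6 .= ( $ord >> 6 ) & 1;
    $bit7 .= ( $ord >> 7 ) & 1;
}
print "bit 4: $bit4\n";   # varies per character, so it has to be stored
print "bit 5: $bit5\n";   # all 1
print "bit 6: $bit6\n";   # 0 only at the spaces
print "bit 7: $bit7\n";   # all 0
```

Only bit 4 carries information that cannot be derived from the other bits, which is why it gets a bitstring of its own.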

The Closure

The result of the first eval is actually a closure, which is assigned to $s. The closure (with the variables renamed to something more appropriate) looks like this:

$s = eval {
    $index     = 0;                             # $O
    $commabyte = 0;                             # $C
    $textstr   = 'This is my 100th PM post';    # $t
    sub{
        $index++;
        $commabyte = @_ ? $commabyte : eval
            pack( 'b*', vec( pack( 'b*', $_{64} ), $index-1, 1 ) ? $_{1} : $_{8} ) .
            $commabyte .
            pack( 'c', vec( pack( 'b*', $_{64} ), ($index-1)%3, 8 ) ) .
            pack( 'b*', vec( pack( 'b*', $_{64} ), $index-1, 1 ) ? $_{8} : $_{1} ) .
            vec( $textstr, $index-1, 8 );
    };
};

The index ($O) simply tracks the iteration number, and $commabyte is processed during each iteration and will eventually become the comma at the end of the JAPH. The text string is just something fun I stuck in to commemorate my 100th post here at PM, but the characters within that string are used in bit operations with $commabyte each time the closure is called.

The value of $_{64} ('011001000011111001111010') is a bitstring that packs to a bytestring of three bitwise operators ('&|^'). $_{1} and $_{8} represent '~' and '' (the empty string), respectively. Therefore, the 5 parts to the eval simplify to:

1: pack( 'b*', vec( '&|^', $index-1, 1 ) ? '~' : '' )
2: $commabyte
3: pack( 'c', vec( '&|^', ($index-1)%3, 8 ) )
4: pack( 'b*', vec( '&|^', $index-1, 1 ) ? '' : '~' )
5: vec( $textstr, $index-1, 8 )

Lines 2 and 5 are the two bytes used in the operation: the current status of $commabyte (retained between calls by the closure) and one character (at position $index-1) from $textstr.

Line 3 determines which of the 3 bit operators is used during the current iteration (using %3 to find the proper offset in the string).

In lines 1 and 4, vec( '&|^', $index-1, 1 ) grabs one bit (at position $index-1) from the bytestring represented by '&|^'. That bit is used to determine whether the bitwise negation operator ('~') or an empty string is used in the string to be eval'd. Since lines 1 and 4 have the two values reversed in the ternary conditional operator, the effect is to negate the first term if the bit is set and the second term if it is not.

The resulting string looks something like this:

[~](char) (operator) [~](char)

where exactly one of the two characters is negated in each call to the closure, depending on the bit read from the bitstring.
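The closure's arithmetic can be replayed outside the obfu. This is a minimal sketch under stated assumptions: it uses plain lexicals instead of %_ and the closure, $n stands in for $index-1, and the '~' and '' pieces are written as literals rather than unpacked from $_{1} and $_{8}. The operator bitstring and the text string are the ones quoted in this post.

```perl
use strict;
use warnings;

my $ops     = pack 'b*', '011001000011111001111010';   # yields '&|^'
my $textstr = 'This is my 100th PM post';

my $commabyte = 0;
my @history;
for my $n ( 0 .. length($textstr) - 1 ) {            # $n plays the role of $index-1
    my $negate_first = vec( $ops, $n, 1 );           # one bit taken from '&|^'
    my $operator     = chr( vec( $ops, $n % 3, 8 ) );   # '&', '|' or '^'
    my $char         = vec( $textstr, $n, 8 );       # ordinal of the text character
    my $expr = ( $negate_first ? '~' : '' ) . $commabyte . $operator
             . ( $negate_first ? '' : '~' ) . $char;
    $commabyte = eval $expr;                         # e.g. "0&~84" on the first pass
    push @history, $commabyte;
}
print "value after 24 calls: $commabyte\n";          # 116, the ordinal of 't'
```

The intermediate values with all high bits set depend on the integer width of the perl in use, but the final value does not.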

The Compressed Low Nybble Bitstring

$_{9} represents a bitstring that was constructed from the low nybble of each character (except for the trailing comma) in the JAPH. The low nybbles aren't simply concatenated, however. Instead, they overlap each other by one bit, so the "reading frame" of each nybble is defined by an offset that is a multiple of 3 bits. For example, here is the result for the first 4 characters (we'll worry about capitalization later):

bit:  01234567   low nybbles
j  =  01010110   0101
u  =  10101110   1010
s  =  11001110   1100
t  =  00101110   0010

The resulting bitstring is 0101010100010, but notice that the overlapping bits for 'u' and 's' do not match. As it turns out this inversion occurs for every third pair with the exception of the 12th and the 24th, and also for the 22nd pair. There is an adjustment in the loop to handle this.
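The overlap scheme can be reproduced with a few lines. This is a minimal sketch, assuming the shared bit always keeps the value from the earlier character (which is consistent with the 13-bit example above); the variable names are mine.

```perl
use strict;
use warnings;

my $text = 'just another perl hacker';

# Low nybble of each character, written low bit first (as pack 'b' does).
my @nybbles = map { unpack( 'b4', $_ ) } split //, $text;

my $bitstr = $nybbles[0];
my @mismatch;                        # 1-based character positions needing a fix
for my $k ( 1 .. $#nybbles ) {
    # Bit 0 of this nybble shares a slot with bit 3 of the previous one.
    push @mismatch, $k + 1
        if substr( $nybbles[$k], 0, 1 ) ne substr( $nybbles[ $k - 1 ], 3, 1 );
    $bitstr .= substr( $nybbles[$k], 1 );   # only bits 1..3 are new
}
print "first 13 bits: ", substr( $bitstr, 0, 13 ), "\n";   # 0101010100010
print "needs flipping at characters: @mismatch\n";         # 3 6 9 15 18 21 22
```

The list of mismatches is every third character except the 12th and 24th, plus the 22nd, which is the pattern the loop adjustment tests for.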

The compressed low nybble bitstring is assigned to $o just before the main loop.

The Main Loop

The for loop runs for 24 cycles (once for each character in the JAPH, with the exception of the trailing comma). The upper limit for the iterator is calculated using the '%b*' format for unpack, which is an efficient method for counting the number of set bits. It was convenient that the compressed low nybble bitstring has 25 set bits, so subtracting 2 gives the proper number of iterations.
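For anyone who hasn't met the checksum form of unpack, here is a minimal sketch of the bit-counting idiom. It uses the 32-bit variant ('%32b*') and only the first 13 bits of the compressed string, so the count shown is for that fragment, not for the whole of $_{9}.

```perl
use strict;
use warnings;

my $bits   = '0101010100010';        # first 13 bits of the compressed string
my $packed = pack 'b*', $bits;

# A '%<number>' prefix makes unpack return a checksum of the items instead
# of the items themselves; for single bits that is the number of 1s.
my $set = unpack '%32b*', $packed;
print "$set set bits\n";             # 5
```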

The low nybble for each character is extracted from the bitstring in $_{9}, but since the offsets are multiples of 3 bits and vec can only address elements whose width is a power of 2 (so its offsets always fall on multiples of that width), the process is a bit more complex. Specifically, the extraction must accommodate nybbles that cross frame boundaries. This is not possible with a single vec function (unless the frame width (BITS in vec) is set to a value at least as large as the length of the whole bitstring), so two 4-bit extractions are used.

The offset of the first 4-bit frame ($c) is calculated and then vec is used to extract 4 bits at that and the following offset. The second nybble is left-shifted 4 to make it a high nybble, and the two are added to get a byte. Finally, since the desired nybble is located within that byte at an offset of 0..3 bits, the byte is shifted by that many bits to eliminate the extra bits.

In the code below, the loop iterator ($_) is designated by $offset and the compressed bitstring ($_{9}) is represented by $bitstr.

$c = $offset - ( ($offset - ($offset % 4))/4 + ($offset % 4 ? 1 : 0) ); # first frame
$i = ( (vec( $bitstr, $c, 4 ) +                           # low nybble
        (vec( $bitstr, $c + 1, 4 )<<4))                   # high nybble
       >> 4 - ($offset % 4) - 4 * ( ($offset % 4) ? 0 : 1 ) # shift off extra bits
     );

At this point $i is a byte with the proper bits in the low nybble, but the low bit of the low nybble might need to be flipped (see the explanation of the compressed low nybble bitstring, above). This is tested and accomplished with the line:

$i += $i%2 ? -1 : 1 if( ( (($offset + 1) % 12) && !(($offset + 1) % 3) ) || $offset + 1 == 22 );
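The extraction and the low-bit fix can be tried end to end with a small round trip. This is a sketch under assumptions: the compressed string is rebuilt here from the text (with the shared bit keeping the earlier character's value), low_nybble is a hypothetical helper, and the frame and shift are computed as int(3*offset/4) and (3*offset)%4, which give the same numbers as the longer expressions above.

```perl
use strict;
use warnings;

my $text = 'just another perl hacker';

# Rebuild a compressed string: 4 bits for the first character, then 3 new
# bits per character (assumption: the shared bit keeps the earlier value).
my $bits = unpack( 'b4', substr( $text, 0, 1 ) );
$bits .= substr( unpack( 'b4', $_ ), 1 ) for split //, substr( $text, 1 );
my $bitstr = pack 'b*', $bits;

sub low_nybble {
    my ($offset) = @_;
    my $frame = int( 3 * $offset / 4 );    # 4-bit frame holding the first bit
    my $shift = ( 3 * $offset ) % 4;       # where that bit sits inside the frame
    my $byte  = vec( $bitstr, $frame, 4 ) + ( vec( $bitstr, $frame + 1, 4 ) << 4 );
    my $i     = ( $byte >> $shift ) & 0x0F;
    # Flip the low bit where the overlap did not match (same test as above).
    $i ^= 1
        if ( ( $offset + 1 ) % 12 && !( ( $offset + 1 ) % 3 ) ) || $offset + 1 == 22;
    return $i;
}

my @got  = map { low_nybble($_) } 0 .. length($text) - 1;
my @want = map { ord($_) & 0x0F } split //, $text;
my $ok   = "@got" eq "@want";
print $ok ? "all 24 low nybbles recovered\n" : "mismatch\n";
```

Reading two adjacent 4-bit frames always covers the nybble, because a 4-bit window starting at a shift of 0..3 can never extend past the second frame.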

Bit 4 (in the high nybble) is set by extracting the appropriate bit from the high bit string in $_{65}, and bit 5 is turned on for all characters. These are added to the low nybble (from $i) to give $T.

$T = vec( pack( 'C', $i ), 0, 4 ) + vec( pack( 'b*', $_{65} ), $offset, 1 ) * (1<<4) + (1<<5);

Bit 6 is set based on the value of $T. If the character should be a space, $T == 32 (only bit 5 is set) and no change is made, otherwise bit 6 is set. Finally, the 'j' and 'p' characters (the 0th and 13th characters in the JAPH) are capitalized by turning off bit 5, and the value is printed.

print pack( 'c', ($T += $T == (1<<5) ? 0 : (1<<6)) -= $offset % 13 == 0 ? (1<<5) : 0 );
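The same bit assembly, unrolled into separate statements, may be easier to follow. This is a minimal sketch: build_char is a hypothetical helper, and the low nybble and bit 4 are taken straight from the lowercase text here rather than from the stored bitstrings.

```perl
use strict;
use warnings;

# Hypothetical helper: rebuild one character from its stored pieces.
sub build_char {
    my ( $low, $bit4, $offset ) = @_;
    my $T = $low + $bit4 * ( 1 << 4 ) + ( 1 << 5 );   # bits 0..3, bit 4, bit 5
    $T += ( 1 << 6 ) unless $T == ( 1 << 5 );         # set bit 6 unless it is a space
    $T -= ( 1 << 5 ) if $offset % 13 == 0;            # capitalise offsets 0 and 13
    return chr($T);
}

my $text = 'just another perl hacker';
my $out  = '';
for my $offset ( 0 .. length($text) - 1 ) {
    my $ord = ord( substr( $text, $offset, 1 ) );
    $out .= build_char( $ord & 0x0F, ( $ord >> 4 ) & 1, $offset );
}
print "$out\n";   # Just another Perl hacker
```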

The last step in the main loop is to call the closure (see above).

The comma

The comma at the end of the JAPH is constructed in the closure, but before it is printed it must be XOR'd with 'X' (which I thought was interesting). The final value of the comma byte is retrieved from the closure by calling it with a parameter, and 'X' is stored as a bitstring in $_{72}.
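The last step is small enough to check by hand. This sketch assumes the closure finishes with the value 116 (the ordinal of 't'), as reported in the reply below, and shows 'X' in the same low-bit-first bitstring form used elsewhere in the obfu; I have not checked that string against the actual contents of $_{72}.

```perl
use strict;
use warnings;

my $commabyte = 116;                        # final closure value, ord('t')
my $x_bits    = unpack 'b*', 'X';           # 'X' as a bitstring, low bit first
my $comma     = chr( $commabyte ^ ord( pack( 'b*', $x_bits ) ) );

print "X as bits: $x_bits\n";               # 00011010
print "result: '$comma'\n";                 # ','
```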

Update: corrected explanation of the bitwise negation in the closure (thanks jdalbec). My original version of this obfu used the value of $O (the index) %2 to determine which term gets negated, and I forgot that I changed it to rely on the bitstring instead. I guess that's what I get for writing a post at 3am. :)

Re^3: Bits & pieces
by jdalbec (Deacon) on Jul 19, 2005 at 03:01 UTC
    the two characters are negated in alternate calls to the closure.
    For that to be true, $_{64} would have to be '010101010101010101010101'. But I see that the output of the closure is more predictable than I had thought at first. The final result depends only on the last three (to some extent, six) characters of the string (in fact it's the same as the last character of the string).
    $O = 19, $C = ~$C& 0x4D; # bits 7+ all 0
    $O = 20, $C = ~$C| 0x20; # bits 7+ all 1
    $O = 21, $C = ~$C^ 0x70; # bits 7+ all 0  bits 6543210
    $O = 22, $C =  $C&~0x6F; # bits 7+ all 0       00?0000
    $O = 23, $C = ~$C| 0x73; # bits 7+ all 1       1111111
    $O = 24, $C =  $C^~0x74; # bits 7+ all 0       1110100
    I see also that I was bitten by contexts again. The correct list of values for &$s in scalar context is
    # 0 4294967295 105 8 4294967263 73 8 4294967263 77 4 4294967291 53
    # 0 4294967295 116 20 4294967263 112 13 4294967282 125 16 4294967295 116
