PerlMonks
Re^2: Bits & pieces
by bobf (Monsignor) on Jul 18, 2005 at 07:58 UTC ( [id://475665]=note )
Good job! Included below is an explanation of this JAPH which is a bit (ba-dum-bum) more verbose. Your analysis hit on the big points, but I wanted to fill in some of the more subtle reasoning. You correctly identified the main point of this JAPH, which is the overlapping, 3-bit offsets for the low nybbles. While the subroutine "is essentially pure obfuscation" (as are most obfus!), the closure was designed deliberately and the calculations are far from random. The "fudge factor" at the end was necessary, but the value of it is meaningful, given the operation that uses it.
The point
The Bitshifts
The Data Structure
The keys to the data hash were meant to look like the low nybble of binary 1 (0001), 2 (0010), ... 6 (0110). I hoped that someone walking through this manually (without a debugger) would forget that the leading 0 forces the keys to be interpreted as octal literals rather than as binary ones. Therefore, the actual key values (and the result of the bitshifts in the code) are 1, 8, 9, 64, 65, and 72, which might be confusing if you thought the hash contained keys of only (1..6). As described below, the bitstring in $_{9} is composed of the low 4 bits (0..3, the low nybble) of the bytes in 'just another perl hacker'. The bitstring in $_{65} is composed of all of the 4th bits in the string. Note that bit 5 is always set for these characters, bit 6 is set for every non-space character, and bit 7 is not set for any character.
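For anyone who wants to check the octal trap and the bit 5/6/7 observations, here is a minimal sketch. It is not code from the JAPH; the helper name bit_is_set is made up for illustration.

```perl
use strict;
use warnings;

# A leading 0 makes these octal literals, not binary ones.
my @keys = (0001, 0010, 0011, 0100, 0101, 0110);
print "@keys\n";    # 1 8 9 64 65 72

# For comparison, a real binary literal needs the 0b prefix.
print 0b0011, "\n"; # 3

# Hypothetical helper (not in the original): is bit $bit set in $char?
sub bit_is_set {
    my ($char, $bit) = @_;
    return (ord($char) >> $bit) & 1;
}

# Report bits 4..7 for each character of the plain text.
my $text = 'just another perl hacker';
for my $char (split //, $text) {
    my @bits = map { bit_is_set($char, $_) } 4 .. 7;
    printf "'%s' bits 4..7: %s\n", $char, join('', @bits);
}
```

The same thing happens inside a hash subscript: $_{0011} is a numeric literal, not a bareword, so it refers to $_{9}.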
The Closure
The index ($O) simply tracks the iteration number, and $commabyte is processed during each iteration and will eventually become the comma at the end of the JAPH. The text string is just something fun I stuck in to commemorate my 100th post here at PM, but the characters within that string are used in bit operations with $commabyte each time the closure is called. The value of $_{64} ('011001000011111001111010') is a bytestring representing three bitwise operations ('&|^'). $_{1} and $_{8} represent '~' and '' (the empty string), respectively. Therefore, the 5 parts to the eval simplify to:
Lines 2 and 5 are the two bytes used in the operation: the current status of $commabyte (retained between calls by the closure) and one character (at position $index-1) from $textstr. Line 3 determines which of the 3 bit operators is used during the current iteration (using %3 to find the proper offset in the string).
In lines 1 and 4, vec( '&|^', $index-1, 1 ) grabs one bit (at position $index-1) from the bytestring represented by '&|^'. That bit is used to determine whether the bitwise negation operator ('~') or an empty string is used in the string to be eval'd. Since lines 1 and 4 have the characters reversed in the ternary conditional operator, the effect is to negate either the first term or the second. The resulting string looks something like this:
where one of the two characters is negated.
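The closure described in this section can be sketched roughly as follows. This is an illustration, not the original code: the name make_mixer, the starting byte, the sample text, the choice of which operand is negated when the bit is 1, and the mask to 8 bits are all assumptions. It works on numeric byte values to keep the eval'd expression easy to read.

```perl
use strict;
use warnings;

# The bit string stored in $_{64} is '&|^' written low bit first.
my $ops = pack 'b*', '011001000011111001111010';
print "$ops\n";    # &|^

# Hypothetical closure factory (all names here are assumptions).
sub make_mixer {
    my ($byte, $textstr) = @_;
    my $index = 0;                       # iteration counter kept by the closure
    return sub {
        $index++;
        my $other = ord substr($textstr, $index - 1, 1);
        my $op    = substr($ops, ($index - 1) % 3, 1);   # cycle through & | ^
        my $bit   = vec($ops, $index - 1, 1);            # one bit picks the '~' side
        # Reversed ternaries: exactly one of the two operands is negated.
        my $expr  = ($bit ? '~' : '') . $byte . $op . ($bit ? '' : '~') . $other;
        $byte = eval($expr) & 0xFF;      # keep a single byte (assumption)
        return $expr;
    };
}

my $mix = make_mixer(85, 'abc');         # arbitrary start byte and text
print $mix->(), "\n" for 1 .. 3;
```

Each call returns the string that was eval'd, so the retained byte can be followed from one call to the next.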
The Compressed Low Nybble Bitstring
The resulting bitstring is 0101010100010, but notice that the overlapping bits for 'u' and 's' do not match. As it turns out, this mismatch occurs for every third pair, with the exception of the 12th and the 24th, and also for the 22nd pair. There is an adjustment in the loop to handle this. The compressed low nybble bitstring is assigned to $o just before the main loop.
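The overlap can be reproduced with a short sketch. This is not the code used to build the JAPH; compress_low_nybbles is a hypothetical helper that writes each low nybble low bit first at a 3-bit offset and records where the shared bit disagrees.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original JAPH): pack the low nybbles of
# $text at 3-bit offsets, so each nybble's low bit shares a slot with the
# previous nybble's high bit.
sub compress_low_nybbles {
    my ($text) = @_;
    my @bits;       # one entry per bit, lowest offset first
    my @flipped;    # 0-based character positions whose low bit disagrees
    my @chars = split //, $text;
    for my $i (0 .. $#chars) {
        my $nybble = ord($chars[$i]) & 0x0F;
        for my $k (0 .. 3) {
            my $bit = ($nybble >> $k) & 1;
            my $pos = 3 * $i + $k;
            if ($k == 0 && $i > 0) {
                # The slot already holds the previous nybble's high bit; keep it
                # and remember the position if the two values differ.
                push @flipped, $i if $bits[$pos] != $bit;
                next;
            }
            $bits[$pos] = $bit;
        }
    }
    return (join('', @bits), \@flipped);
}

my ($bitstr, $flipped) = compress_low_nybbles('just');
print "$bitstr\n";                 # 0101010100010
print "mismatch at: @$flipped\n";  # mismatch at: 2
```

Run on the full 'just another perl hacker', the helper reports mismatches at 0-based positions 2, 5, 8, 14, 17, 20 and 21, which is the every-third pattern with the exceptions described above.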
The Main Loop
The low nybble for each character is extracted from the bitstring in $_{9}, but since the offsets are multiples of 3 and vec can only address elements aligned to multiples of its BITS width (which must itself be a power of 2), the process is a bit more complex. Specifically, the extraction must accommodate nybbles that cross frame boundaries. This is not possible with a single vec function (unless the frame width (BITS in vec) is set to a value at least as large as the length of the whole bitstring), so two 4-bit extractions are used. The offset of the first 4-bit frame ($c) is calculated and then vec is used to extract 4 bits at that and the following offset. The second nybble is left-shifted 4 to make it a high nybble, and the two are added to get a byte. Finally, since the desired nybble is located within that byte at an offset of 0..3 bits, the byte is shifted by that many bits to eliminate the extra bits. In the code below, the loop iterator ($_) is designated by $offset and the compressed bitstring ($_{9}) is represented by $bitstr.
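A rough sketch of that two-frame extraction follows. It is a reconstruction from the description, not the original line: the sub name extract_nybble is made up, and the final mask to 4 bits is an addition so the result can be compared directly.

```perl
use strict;
use warnings;

# Hypothetical reconstruction of the two-frame extraction described above.
sub extract_nybble {
    my ($bitstr, $offset) = @_;    # packed bytes, character index
    my $bitpos = 3 * $offset;      # nybbles start every 3 bits
    my $c      = int($bitpos / 4); # first 4-bit frame touching the nybble
    # Two adjacent 4-bit frames, the second promoted to a high nybble.
    my $byte = vec($bitstr, $c, 4) + (vec($bitstr, $c + 1, 4) << 4);
    # Drop the 0..3 extra low bits, then keep 4 bits (the mask is an addition).
    return ($byte >> ($bitpos % 4)) & 0x0F;
}

# The 13-bit example string for 'just', packed low bit first like vec expects.
my $bitstr = pack 'b*', '0101010100010';
print join(' ', map { extract_nybble($bitstr, $_) } 0 .. 3), "\n";    # 10 5 2 4
```

The third value is 2 rather than 3 (the low nybble of 's') because its low bit is the overlapping one that does not match.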
At this point $i is a byte with the proper bits in the low nybble, but the low bit of the low nybble might need to be flipped (see the explanation of the compressed low nybble bitstring, above). This is tested and accomplished with the line:
Bit 4 (in the high nybble) is set by extracting the appropriate bit from the high bit string in $_{65}, and bit 5 is turned on for all characters. These are added to the low nybble (from $i) to give $T.
Bit 6 is set based on the value of $T. If the character should be a space, $T == 32 (only bit 5 is set) and no change is made, otherwise bit 6 is set. Finally, the 'j' and 'p' characters (the 0th and 13th characters in the JAPH) are capitalized by turning off bit 5, and the value is printed.
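Putting the last three steps together, a character can be rebuilt along these lines. This is only a sketch of the logic as described: assemble_char and its argument list are hypothetical, and the flip flag stands in for whatever test the original line performs.

```perl
use strict;
use warnings;

# Hypothetical helper: rebuild one character from its pieces.
#   $low   - low nybble as extracted from the compressed bitstring
#   $flip  - true if the low bit needs to be flipped (overlap mismatch)
#   $bit4  - the character's bit 4, taken from the high bit string
#   $index - 0-based position in the JAPH
sub assemble_char {
    my ($low, $flip, $bit4, $index) = @_;
    $low ^= 1 if $flip;                        # repair the overlapping bit
    my $T = $low + ($bit4 << 4) + 32;          # bit 5 is on for every character
    $T |= 64 unless $T == 32;                  # bit 6 for everything but a space
    $T &= ~32 if $index == 0 || $index == 13;  # capitalize 'j' and 'p'
    return chr $T;
}

print assemble_char(10, 0, 0, 0), "\n";   # J
print assemble_char(2, 1, 1, 2), "\n";    # s
print assemble_char(0, 0, 1, 13), "\n";   # P
```

Feeding it the low nybble and bit 4 of each character in 'just another perl hacker' (with no flips needed, since the nybbles are exact) gives back 'Just another Perl hacker'.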
The last step in the main loop is to call the closure (see above).
The comma
Update: corrected explanation of the bitwise negation in the closure (thanks jdalbec). My original version of this obfu used the value of $O (the index) %2 to determine which term gets negated, and I forgot that I changed it to rely on the bitstring instead. I guess that's what I get for writing a post at 3am. :)