There's more than one way to do things PerlMonks

### Bits & pieces

by bobf (Monsignor)
 on Jul 12, 2005 at 07:08 UTC Need Help??

This started as a test script while I was learning about bit operations, and it slowly evolved into an obfuscated JAPH. I know it exceeds the traditional "4 line rule", but I thought that it would be fun to limit the digits used to only 0 and 1, given the theme. It was not golfed in any way.

Enjoy!

```
\$_=[[[[{},[]],{0111,'01110111',1000,'11100101'},[[]]],{0001,'01111110'
,0110,'00011010',0100,'011001000011111001111010',0010,'00000100',0011,
'01010101000100000001111110100010101000000000101000110000010001001010'
.'10100',0101,'011100001001010100000001'},[[]]]];%_=%{\${\${\$_}[0]}[1]};
\$s=eval{\$O=0,\$C=0,\$t='This is my 100th PM post';sub{\$O++,\$C=@_?\$C:eval
pack('b*',vec(pack('b*',\$_{1<<(1<<0<<1)*((1<<0+1<<0)+1<<0)}),\$O-1,1<<0
)?\$_{1<<0}:\$_{1<<(1<<1<<1)-1}).\$C.pack('c',vec(pack('b*',\$_{1<<((1<<0)
+(1<<1*1<<1)+1<<0)}),(\$O-1)%((1<<0*1<<1)+1<<0),1<<(1<<0+1<<0)+1)).pack
('b*',vec(pack('b*',\$_{1<<((1<<0+1<<0)+(1<<1<<1))}),\$O-1,1)?\$_{1<<((1+
1<<0)+1)}:\$_{1<<0}).vec(\$t,\$O-1,1<<(1<<1<<1)-(1<<0))};};;\$o=pack('b*',
\$_{((1<<1)**(1<<1)-1)**(1<<1)});for(0..unpack('%b*',\$o)-(1<<1)){\$c=\$_-
((\$_-\$_%(1<<1<<1))/(1<<1<<0*1<<1)+(\$_%(1<<1+1<<0)?1:0));;\$i=(vec(\$o,\$c
,(1<<1<<0+1<<0))+(vec(\$o,\$c+1,(1<<1<<1))<<(1<<1<<0+1))>>((1<<1<<1<<1>>
1)-\$_%(1<<1<<1<<1>>1)-(1<<1<<0+1)*(\$_%(1<<1<<1>>1<<1)?0:1)));\$i+=\$i%(1
<<1)?-1:1 if(((\$_+1)%((1<<1+1<<1)+(1<<1)+(1<<1)))&&!((\$_+1)%((1<<1)**(
1<<1)-1))||\$_+1==(1<<(1<<1+1<<1>>1))+(((1<<1)+1)<<1));\$T=vec(pack('C',
\$i),0,(1<<1<<1>>1<<1))+vec(pack('b*',\$_{(1+((1<<1)**(1<<1))**((1<<1)**
(1<<1)-1))}),\$_,1)*(1<<(1<<(1<<1)))+(1<<((1<<1**1<<1)+1));;print pack(
'c',(\$T+=\$T==(1<<((1<<1+1<<1>>1)+1))?0:(1<<(((1<<1+1<<1)>>1)+(1<<0*1<<
1))))-=\$_%(1+(((1<<1+1<<1)>>1)+(1<<0*1<<1)<<1))==0?(1<<(((1<<1*1<<1<<1
)+(1<<1<<1>>1))>>1)):0);&\$s}print pack('c',\$s->(1<<0)^unpack('c',(pack
('b*',\$_{(1<<1*1<<1*1<<1)*((1<<1)**(1<<0<<1)-1)**(1<<1<<1>>1<<0)}))));

Hint: the bit shifts are fun, but they aren't the main point of the obfuscation.

Tested on

• perl 5.005_03 built for i386-linux (little endian)
• perl 5.6.0 built for sun4-solaris (big endian)
• perl 5.6.1 built for MSWin32-x86-multi-thread (little endian)
• perl 5.8.0 built for i386-linux-thread-multi (little endian)
• perl 5.8.3 built for i686-linux (little endian)
• perl 5.8.4 built for i386-linux-thread-multi (little endian)

Replies are listed 'Best First'.
Re: Bits & pieces
by jdalbec (Deacon) on Jul 16, 2005 at 03:18 UTC
B::Deparse gets rid of the bit shifts, but it breaks the code by omitting the parentheses after not below. After reinstating the parentheses and running the code through Perl::Tidy it's still not very readable. I've inserted some comments inside and outside the code to keep track of what's going on. Comments generally refer to the statements preceding them.
```\$_ = [
[
[ [ {}, [] ], { 73, '01110111', 1000, '11100101' }, [ [] ] ],
{
1,
'01111110',
72,
'00011010',
64,
'011001000011111001111010',
8,
'00000100',
9,
'010101010001000000011111101000101010000000001010001100000100010010101
+0100',
65,
'011100001001010100000001'
},
[ [] ]
]
];
(%_) = %{ \${ \${ \$_; }[0]; }[1]; };
Most of the structure of \$_ is obfuscation. Let's start over with %_ since \$_ is never used again except for the localized \$_ in the for loop.
```undef \$_;
%_ = (
1 => '01111110', # '~'
72 => '00011010', # 'X'
64 => '011001000011111001111010', # '&|^'
#                  012345678901234567890123
8 => '00000100', # ' '
9 =>
'010101010001000000011111101000101010000000001010001100000100010010101
+0100',
#012345678901234567890123456789012345678901234567890123456789012345678
+9012
#          1         2         3         4         5         6
+ 7
65 => '011100001001010100000001'
#                  012345678901234567890123
);
Some of these (little-endian) bitstrings are really bytestrings and I've commented them as such. I've also added column numbers below some of the strings.
```\$s = eval {
do {
\$O = 0, \$C = 0, \$t = 'This is my 100th PM post';
#                             012345678901234567890123
sub {
\$O++, \$C = @_ ? \$C : eval
# { do { my \$k =
pack( 'b*',
vec( pack( 'b*', \$_{64} ), \$O - 1, 1 ) ? \$_{1} : \$_{8}
+ )
. \$C
. pack( 'c', vec( pack( 'b*', \$_{64} ), ( \$O - 1 ) % 3,
+8 ) )
. pack( 'b*',
vec( pack( 'b*', \$_{64} ), \$O - 1, 1 ) ? \$_{8} : \$_{1}
+ )
. vec( \$t, \$O - 1, 8 );
# print " \$k"; eval \$k; } }
}

}
};
This subroutine is essentially pure obfuscation. It does some fairly random calculations using the characters of \$t and a fudge factor is added at the end to produce the desired output.
```\$o = pack( 'b*', \$_{9} );
foreach \$_ ( 0 .. unpack( '%b*', \$o ) - 2 ) {
# 16 bit checksum = 25, - 2 = 23
Now we come to the heart of it. It's rather convenient that a 16-bit checksum of \$o comes out so close to the length.
```    \$c = \$_ - ( ( \$_ - \$_ % 4 ) / 4 + ( \$_ % 4 ? 1 : 0 ) );
# print \$c;
# 0 0 1 2 3 3 4 5 6 6 7 8 9 9 10 11 12 12 13 14 15 15 16 17
\$i =
vec( \$o, \$c, 4 ) + ( vec( \$o, \$c + 1, 4 ) << 4 ) >> 4 - \$_ % 4 -
+ 4 *
( \$_ % 4 ? 0 : 1 );
# >> 0 3 2 1 0 3 2 1 0 3 2 1 0 3 2 1 0 3 2 1 0 3 2 1
# print \$i;
# 170 21 34 4 128 16 62 47 69 8 21 2 0 0 20 98 12 1 8 17 82 10 21 2
# print \$i % 16; # == vec( pack( 'C', \$i ), 0, 4 )
# 10 5  2  4  0  0  14 15 5  8  5  2  0  0  4  2  12 1  8  1  2  10 5
+ 2
# 0101  0100  0000  0111  1010  1010  0000  0010  0011  0001  0100  10
+10

#    1010  0010  0000  1111  0001  0100  0000  0100  1000  1000  0101
+ 0100
# J  u  r  t        n  o  u  h  e  r     P  d  r  l  a  h  a  b  j  e
+ r
#       *        *        *                 *        *        *  *
Heavy comments here. These two statements extract the low nybbles of the JAPH phrase from \$o. The trick is that the nybbles are offset by only 3 bits so the 8's bit of each character is the 1's bit of the next character. Naturally, not all the 1's bits are correct, but surprisingly many are. Also, some of the higher-order bits in \$i are set, but they get stripped off later.
```    \$i += \$i % 2 ? -1 : 1 # \$i ^= 1
if ( \$_ + 1 ) % 12 and not(( \$_ + 1 ) % 3)
or \$_ + 1 == 22;
# print \$i;
# 170 21 35 4 128 17 62 47 68 8 21 2 0 0 21 98 12 0 8 17 83 11 21 2
# print \$i % 16;
# 10 5 3 4 0 1 14 15 4 8 5 2 0 0 5 2 12 0 8 1 3 11 5 2
# J  u s t   a n  o  t h e r   P e r l    h a c k  e r
Here we fix up the erroneous 1's bits.
```    \$T =
vec( pack( 'C', \$i ), 0, 4 ) + vec( pack( 'b*', \$_{65} ), \$_, 1
+) * 16 +
32;
# print \$T;
# 42 53 51 52 32 33 46 47 52 40 37 50 32 48 37 50 44 32 40 33 35 43 37
+ 50
# 0  1  1  1  0  0  0  0  1  0  0  1  0  1  0  1  0  0  0  0  0  0  0
+ 1
# J  u  s  t     a  n  o  t  h  e  r     P  e  r  l     h  a  c  k  e
+ r
# *  1  2  3  4  5  6  7  8  9  10 11 12 *  1  2  3  4  5  6  7  8  9
+ 10
Now we extract the low nybbles from \$i and the 16's bits from \$_{65}, and set the 32's bits unconditionally.
```    print pack( 'c', ( \$T += \$T == 32 ? 0 : 64 ) -= \$_ % 13 == 0 ? 32
+: 0 );
Now we set the 64's bit unless the character is a space and reset the 32's bit for the two capital letters and print the result.
```#    &\$s; # independent of rest of loop, unrolled
}
# print ' ',
&\$s for 0..23;
# 00   14294967295  2105  38   44294967263  573   68   74294967263  87
+7
# 94   104294967291 1153  120  134294967295 14116 1520 164294967263 17
+112
# 1813 194294967282 20125 2116 224294967295 23116
As I mentioned above the calls to &\$s are essentially pure obfuscation. In the last statement below a fudge factor \$_{72} is xored with the final result to get a comma.
```print pack( 'c', &\$s(1) ^ unpack( 'c', pack( 'b*', \$_{72} ) ) );

Good job! Included below is an explanation of this JAPH which is a bit (ba-dum-bum) more verbose. Your analysis hit on the big points, but I wanted to fill in some of the more subtle reasoning.

You correctly identified the main point of this JAPH, which is the overlapping, 3-bit offsets for the low nybbles. While the subroutine "is essentially pure obfuscation" (as are most obfus!), the closure was designed deliberately and the calculations are far from random. The "fudge factor" at the end was necessary, but the value of it is meaningful, given the operation that uses it.

The point

The JAPH text is encoded in bitstrings, which is probably obvious by just glancing at the code. The fun part of writing this JAPH was constructing methods to store and extract the data that were less-than-obvious. The challenge of this JAPH was not meant to be in figuring out where the data comes from, but in how all the bits and pieces come together.

The Bitshifts

The bitshift operations were included because I thought it would be fun to restrict the digits used to only 0 and 1. Debuggers (or deobfuscators) will resolve them easily, but stepping through the code manually could be tricky unless you've got the table of operator precedence handy. Since they weren't the main point of this obfuscated JAPH, I removed them from the discussion below.

The Data Structure

The mess of hash and array references that were built and assigned to \$_ contain the data hash as well as a bunch of padding so the whole thing fit into a nice block. The first hash, which contains keys 0111 and 1000, is never used. The second hash is the real data hash and it contains both bitstrings and bytestrings. The data hash (%_) is assigned by %_ = %{\${\${\$_}[0]}[1]}, which simplifies to \$_->[0][1].

The keys to the data hash were meant to look like the low nybble of binary 1 (0001), 2 (0010), ... 6 (0110). I hoped that someone walking through this manually (without a debugger) would forget that the leading 0 would force the keys to be interpreted as octals rather than as binaries. Therefore, the actual key values (and the result of the bitshifts in the code) are 1, 8, 9, 64, 65, and 72, which might be confusing if you thought the hash contained keys of only (1..6).

As described below, the bitstring in \$_{9} is composed of the low 4 bits (0..3, the low nybble) of the bytes in 'just another perl hacker'. The bitstring in \$_{65} is composed of all of the 4th bits in the string. Note bit 5 is always set for these characters, bit 6 is set for every non-space character, and bit 7 is not set for any character.

The Closure

The result of the first eval is actually a closure, which is assigned to \$s. The closure (with the variables renamed to something more appropriate) looks like this:

```\$s = eval
{
\$index = 0; # \$O
\$commabyte = 0; # \$C
\$textstr = 'This is my 100th PM post'; # \$t

sub{
\$index++;
\$commabyte = @_ ? \$commabyte :

eval pack( 'b*', vec( pack( 'b*', \$_{64} ), \$index-1,
+1 ) ? \$_{1} : \$_{8} ) .
\$commabyte .
pack( 'c', vec( pack( 'b*', \$_{64} ), (\$index-1)%3
+, 8 ) ) .
pack( 'b*', vec( pack( 'b*', \$_{64} ), \$index-1, 1
+ ) ? \$_{8} : \$_{1} ) .
vec( \$textstr, \$index-1, 8 );
};
};

The index (\$O) simply tracks the iteration number, and \$commabyte is processed during each iteration and will eventually become the comma at the end of the JAPH. The text string is just something fun I stuck in to commemorate my 100th post here at PM, but the characters within that string are used in bit operations with \$commabyte each time the closure is called.

The value of \$_{64} ('011001000011111001111010') is a bytestring representing three bitwise operations ('&|^'). \$_{1} and \$_{8} represent '~' and '' (the empty string), respectively. Therefore, the 5 parts to the eval simplify to:

```1: pack( 'b*', vec( '&|^', \$index-1, 1 ) ? '~' : '' )
2: \$commabyte
3: pack( 'c', vec( '&|^', (\$index-1)%3, 8 ) )
4: pack( 'b*', vec( '&|^', \$index-1, 1 ) ? '' : '~' )
5: vec( \$textstr, \$index-1, 8 ) )

Lines 2 and 5 are the two bytes used in the operation: the current status of \$commabyte (retained between calls by the closure) and one character (at position \$index-1) from \$textstr.

Line 3 determines which of the 3 bit operators is used during the current iteration (using %3 to find the proper offset in the string).

In lines 1 and 4, vec( '&|^', \$index-1, 1 ) grabs one bit (at position \$index-1) from the bytestring represented by '&|^'. That bit is used to determine whether the bitwise negation operator ('~') or an empty character is used in the string to be eval'd. Since lines 1 and 4 have the characters reversed in trinary conditional operator, the effect is to negate the first term in one iteration and the second term in the next. if the bit is set and the second term if it is not.

The resulting string looks something like this:

```[~](char) (operator) [~](char)

where one of the two characters are negated in alternate calls to the closure..

The Compressed Low Nybble Bitstring

\$_{9} represents a bitstring that was constructed from the low nybble of each character (except for the trailing comma) in the JAPH. The low nybbles aren't simply concatenated, however. Instead, they overlap each other by one bit, so the "reading frame" of each nybble is defined by an offset that is a multiple of 3 characters. For example, here is the result for the first 4 characters (we'll worry about capitalization later):

```bit: 01234567  low nybbles
j =  01010110   0101
u =  10101110      1010
s =  11001110         1100
t =  00101110            0010

The resulting bitstring is 0101010100010, but notice that the overlapping bits for 'u' and 's' do not match. As it turns out this inversion occurs for every third pair with the exception of the 12th and the 24th, and also for the 22nd pair. There is an adjustment in the loop to handle this.

The compressed low nybble bitstring is assigned to \$o just before the main loop.

The Main Loop

The for loop runs for 24 cycles (once for each character in the JAPH, with the exception of the trailing comma). The upper limit for the iterator is calculated using the '%b*' format for unpack, which is an efficient method for counting the number of set bits. It was convenient that the compressed low nybble bitstring has 25 set bits, so subtracting 2 gives the proper number of iterations.

The low nybble for each character is extracted from the bitstring in \$_{9}, but since the offsets are multiples of 3 and vec can only use offsets that are powers of 2, the process is a bit more complex. Specifically, the extraction must accomodate nybbles that cross frame boundaries. This is not possible with a single vec function (unless the frame width (BITS in vec) is set to a value at least as large as the length of the whole bitstring), so two 4-bit extractions are used.

The offset of the first 4-bit frame (\$c) is calculated and then vec is used to extract 4 bits at that and the following offset. The second nybble is left-shifted 4 to make it a high nybble, and the two are added to get a byte. Finally, since the desired nybble is located within that byte at an offset of 0..3 bits, the byte is shifted by that many bits to eliminate the extra bits.

In the code below, the loop iterator (\$_) is designated by \$offset and the compressed bitstring (\$_{9}) is represented by \$bitstr.

```\$c = \$offset - ( (\$offset - (\$offset % 4))/4 + (\$offset % 4 ? 1 : 0) )
+; # first frame

\$i = (
(vec( \$bitstr, \$c, 4 ) +        # low nybble
(vec( \$bitstr, \$c + 1, 4 )<<4)) # high nybble
>> 4 - (\$offset % 4) - 4 * ( (\$offset % 4) ? 0 : 1 ) # shift o
+ff extra bits
);

At this point \$i is a byte with the proper bits in the low nybble, but the low bit of the low nybble might need to be flipped (see the explanation of the compressed low nybble bitstring, above). This is tested and accomplished with the line:

```\$i += \$i%2 ? -1 : 1
if(
(  ((\$offset + 1) % 12) &&
!((\$offset + 1) % 3)     ) ||
\$offset + 1 == 22
);

Bit 4 (in the high nybble) is set by extracting the appropriate bit from the high bit string in \$_{65}, and bit 5 is turned on for all characters. These are added to the low nybble (from \$i) to give \$T.

```\$T = vec( pack( 'C', \$i ), 0, 4 ) +
vec( pack( 'b*', \$_{65} ), \$offset, 1 ) * (1<<4) +
(1<<5);

Bit 6 is set based on the value of \$T. If the character should be a space, \$T == 32 (only bit 5 is set) and no change is made, otherwise bit 6 is set. Finally, the 'j' and 'p' characters (the 0th and 13th characters in the JAPH) are capitalized by turning off bit 5, and the value is printed.

```print pack( 'c',
(\$T += \$T == (1<<5) ? 0 : (1<<6))
-= \$offset % 13 == 0 ? (1<<5) : 0 );

The last step in the main loop is to call the closure (see above).

The comma

The comma at the end of the JAPH is constructed in the closure, but before it is printed it must be XOR'd with 'X' (which I thought was interesting). The final value of the comma byte is retrieved from the closure by calling it with a parameter, and 'X' is stored as a bitstring in \$_{72}.

Update: corrected explanation of the bitwise negation in the closure (thanks jdalbec). My original version of this obfu used the value of \$O (the index) %2 to determine which term gets negated, and I forgot that I changed it to rely on the bitstring instead. I guess that's what I get for writing a post at 3am. :)

the two characters are negated in alternate calls to the closure.
For that to be true, \$_{64} would have to be '010101010101010101010101'. But I see that the output of the closure is more predictable than I had thought at first. The final result depends only on the last three (to some extent, six) characters of the string (in fact it's the same as the last character of the string).
```\$O = 19, \$C = ~\$C& 0x4D; # bits 7+ all 0
\$O = 20, \$C = ~\$C| 0x20; # bits 7+ all 1
\$O = 21, \$C = ~\$C^ 0x70; # bits 7+ all 0 bits 6543210
\$O = 22, \$C =  \$C&~0x6F; # bits 7+ all 0      00?0000
\$O = 23, \$C = ~\$C| 0x73; # bits 7+ all 1      1111111
\$O = 24, \$C =  \$C^~0x74; # bits 7+ all 0      1110100
I see also that I was bitten by contexts again. The correct list of values for &\$s in scalar context is
```# 0 4294967295 105  8 4294967263  73  8 4294967263  77  4 4294967291
+53
# 0 4294967295 116 20 4294967263 112 13 4294967282 125 16 4294967295 1
+16

Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: obfuscated [id://474186]
Approved by marto
Front-paged by planetscape
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (4)
As of 2022-12-08 10:34 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?

No recent polls found

Notices?