Beefy Boxes and Bandwidth Generously Provided by pair Networks RobOMonk
Perl Monk, Perl Meditation

Huffman encoder

by BooK (Curate)
on Feb 20, 2001 at 21:37 UTC ( #59718=obfuscated: print w/ replies, xml ) Need Help??

Falkkin is working on a Java Huffman encoder...

Moved only by the will to serve, I propose him an example of what he could do if he did that in Perl... :-)

$"=$/=$_;map{$0{$_}++}split//,$_=<>;@0=map{[$_,$0{$_},$_]}keys%0; $/=1+1;until(@0==1){@0=sort{$$b[1]<=>$$a[1]}@0;$t=pop@0;$p=pop@0; push@0,[[$$t[0],$$p[0]],$$t[1]+$$p[1],$$t[$/].$$p[$/]];map{map{$t {$_}=$;%$/.$t{$_}}split//;$;++}$$t[$/],$$p[$/]}s/(.)/$t{$1}/gs;sub _{my$t=pop;if(ref$t){my@t=map{if(!ref$$t[$_]){1,unpack"b8",$$t[$_]} else{0,""}}0,1;"@t[0,$/,1]"._($$t[0])._($$t[1])."$t[-1]"}}print pack"sB*",length,_($0[0][0]).$_

(This was one of my "helpers" for OPC5)
Corresponding Huffman decoder to be posted soon.

Comment on Huffman encoder
Download Code
Huffman encoder SOLUTION
by BooK (Curate) on Feb 20, 2001 at 21:48 UTC

    Cet assombrissement est soumis au nom de canal assombri.

    This program perform a Huffman compression to its STDIN or to the files given on the command-line. One drawback is that I was too lazy to write the code to output the compressed data of different files to different output files...


    First $/ and $" are undefined, so that the first use of the diamond operator slurps the whole file, and that the double-quoting of array(slice)s don't result in unwanted spaces. map is used instead of for because I like returning lists in a void context. Furthermore, it saves a few ()'s.

    Stéphane Payrard (of OPC3 fame) taught me that @0 is a perfectly valid array. That's a good enough reason to use it. The same goes for %0.

    Thanks to O::Deparse, I learned a few tricks like writing $a->[1] as $$a[1]... (Take this as a proof that one can learn something through obfuscation.) Naturally, I doubt that O::Deparse helped you much, since here it spits out:

    Can't call method "sibling" on an undefined value at /usr/lib/perl5/5.6.0/i386-linux/B/ line 257.
    CHECK failed--call queue aborted.

    I must admit I don't know at all how it is possible. But, hey that's one more hurdle on your way to understanding this program.

    So, %0 holds the count for each character (the character is the key, the count is the value). Given the character, we create a array containing one array by character, containg: one sub-tree of the Huffman tree (at first, it's only a leaf (i.e. the character itself)), its weight, and a string containing all the sub-tree leaves.


    The Huffman algorithm pairs the sub-trees by weight. As long as there are more than one sub-tree, pairs the lightest two into one sub-tree. I chose to have the lightest branches on the left (bit 0).

                49           For a file containing: 27 A
                /\                                  15 B
              22  A                                  3 C
              /\                                     1 D
             7  B                                    1 E
            /\                                       1 F
           C  4                                      1 G
            2  2             the resulting tree is (with the weight of each
           /\  /\            sub-tree noted at its root) drawn on the left.
          D  EF  G

    This tree is encoded at the beginning of the file in the following manner:

                01           Two bits are used by node (in the previous example,
                /\           there are 6 nodes and 7 leaves), one for each branch.
              01  A
              /\             0 means non-terminal node and 1 means terminal node
            10  B            (i.e. there is a leaf hanging there).
           C 00              Our tree looks like the one on the left.
           11  11            The recursive procedure that builds the tree
           /\  /\            represents the data in the following manner:
          D  EF  G           [left-leaf][sub-tree][right-leaf]

    Applying this procedure (named _) recursively, we obtain: 010110C0011DE11FGBA (every character being replaced with its bit representation).


    While we build the tree (line 2 and 3), we also create a HASH %t which keys are the characters in the file to be compressed and which values are the bit string (the path in the tree). This path is constructed as we creare the tree itself. $; and $/ are used to create the needed 0's and 1's... $/ equals 2 (whose modulus helps), and $; is first undefined and is an even number each time we need it again in our map.

    Finally, we perform a substitution on the data, replacing each character by the bit string that represents the path in the tree. The compressed file is the concatenation of the length of the data (since pack will pad the file with null bits, we need this information for the decompressor), the dictionary (which is self-explaining: no need for its size), and the data itself.


    This program was designed to have as little distinct characters as possible, and also so as some character appear really a lot. It was too difficult for me to write it so that the respective probability of each character would be a negative power of 2 (1/2**$k if you like), which is when Huffman encoding performs at its best. My goal was to write a compressor / decompressor all in one obfuscation. Alas, Huffman doesn't perform well enough (it's that lousy dictionary's fault!). OK, next time I'll try with Lempel-Ziv...


    crazyinsomniac asked me to add this link to a site with explanations and a Java animation.

Re: Huffman encoder
by japhy (Canon) on Feb 20, 2001 at 22:29 UTC
    You can chop characters out like mad if you embrace the map EXPR, LIST syntax, instead of map BLOCK LIST. See, then:
    map{$0{$_}++}split//,$_=<>; @0=map{[$_,$0{$_},$_]}keys%0;
    become two PRECIOUS characters shorter:
    map$0{$_}++,split//,$_=<>; @0=map[$_,$0{$_},$_],keys%0;

    japhy -- Perl and Regex Hacker
Re: Huffman encoder
by kh (Initiate) on Mar 03, 2001 at 19:30 UTC
    congrats BooK; although i have not tested the code i presume it is fully functional, and whatever it is quite a tour-de-force of minimalistic perl programming... was wondering if you or any others know of other compression miniatures/ other attempts at compression in perl. lempel ziv would be particularly interessant... oh, and pure perl would be a *big* plus...

      Well, uh, I also did a Lempel-Ziv encoder/decoder pair, but I discovered that it only worked with some processors only (it is dependent on the big endian/little endian stuff). Lempel-Ziv and its derivatives are not that difficult to code. Once you get the principle (which is explained in many web sites), it's quite easy.

      And please, don't assume code is functionnal: test it, read it, try to understand how it works. Don't just believe what those evil obfuscators say. :-)

New Huffman Obfus
by demerphq (Chancellor) on Sep 16, 2001 at 05:11 UTC
    Inspired by BooK's quite excellent obfuscated Huffman Encoder (434 chars on a w32 box), and many hours of my time playing with the algorithm I decided I would make an attempt at my own, and try to shave a characters off of the solution. After quite a bit of effort (this obfus stuff is hard.. ++ to those of you who make it look so easy) getting bitten by binmode, discovering which global variables could be used in exotic ways I wish to present my solutions: (All of the solutions expect input and ouput over STDIN and STDOUT)

    My first solution is the equivelent to BooK's original encoder, ie it only encodes the message, without a header containing the information necessary to decode the message. It is 316 characters long (on a w32 box).

    map{binmode$_}\*STDIN,\*STDOUT;sub r{my($n,$s,$c)=@_;$c=$$n[0];return$ +1{$c}=$s if!ref($c);map r($$c[$_],$s.$_),0,1;}sub h{$/=\1;$0{$_}++,$,.=$_ while +<>;@_=map [$_,$0{$_}],keys%0;while(@_>1){@_=sort{$$b[1]<=>$$a[1]}@_;$l=pop;$r=po +p;push@_ ,[[$l,$r],$$l[1]+$$r[1]];}r pop;$,=~s/./$1{$&}/gs;pack"SB*",$.,$,;}pri +nt h;
    My next solution is the same thing but this time with the header information. I call it, it is 337 bytes long.
    $"=$_;map{binmode$_}\*STDIN,\*STDOUT;sub r{my($n,$s,$t,$c)=@_;$c=$$n[0 +];return $,++,$_{$c}=$s,push@0,pack"aCB16",$c,$t,$s if!ref($c);map r($$c[$_],$s +.$_,$t+1 ),0,1}sub h{$/=\1;$0{$_}++,$;.=$_ while<>;@_=map[$_,$0{$_}],keys%0;whi +le(@_>1) {@_=sort{$$b[1]<=>$$a[1]}@_;$l=pop;$r=pop;push@_,[[$l,$r],$$l[1]+$$r[1 +]]}r pop ;$;=~s/./$_{$&}/gs;chr($,)."@0".pack("SB*",$.,$;)}print h
    The next is the corresponding decoder, as might be guessed I call it, and it is 259 bytes long. The output of the encoder can be piped into this script and the result will be the original file. such as by
    perl < | perl
    (usage on *nix machines might be little more graceful..)
    $/=$_;map{binmode$_}\*STDIN,\*STDOUT;while($"=<>){map{$/=substr$",0,4, +'';($,, $;)=unpack"aC",$/;$_{(unpack"aCB$;",$/)[2]}=$,}1..ord substr$",0,1,''; +($,,$") =unpack"SB*",$";map{$;=1;$;++ until exists$_{substr$",0,$;};$\.=$_{sub +str$",0 ,$;,""}}1..$,;print""}
    And for my last trick, I humbly present the combination of the two. This version I call and is 622 bytes long. Like the others it uses STDIN and STDOUT but also takes an argument to determine if it decodes or encodes (no arguments at all).
    perl < | perl -d
    $"=$/=$_;map{binmode$_}\*STDIN,\*STDOUT;sub r{my($n,$s,$t,$c)=@_;$c=$$ +n[0]; return$,++,$_{$c}=$s,push@0,pack"aCB16",$c,$t,$s if!ref($c);map r($$c[ +$_], $s.$_,$t+1),0,1}sub h{$/=\1;$0{$_}++,$;.=$_ while<>;@_=map[$_,$0{$_}], +keys %0;while(@_>1){@_=sort{$$b[1]<=>$$a[1]}@_;$l=pop;$r=pop;push@_,[[$l,$r +], $$l[1]+$$r[1]]}r pop;$;=~s/./$_{$&}/gs;chr($,)."@0".pack("SB*",$.,$;)} +if( pop@ARGV){while($"=<>){map{$/=substr$",0,4,'';($,,$;)=unpack"aC",$/;$_ +{( unpack"aCB$;",$/)[2]}=$,}1..ord substr$",0,1,'';($,,$")=unpack"SB*",$" +;map{ $;=1;$;++ until exists$_{substr$",0,$;};$\.=$_{substr$",0,$;,""}}1..$, +;print ""}}else{print h}
    Many thanks to the Obfu crew and especially BooK for the inspiration and the walk through of his implementation, especially in regard to the use of funky global variables.

    BTW, the code is set up for 80 columns, which means if you dont want it split you need to see your wrap code length to be 81 (go figure) in the User Settings to see all of them properly.

    Oh and apologies to any who think this is a waste of time as its already been done, for me it was a _great_ exercise (I think that there are imprtant lessons to be learned by doing an obfu), and I'm damn proud of it. (never even looked at BooK's Huffman Decoder!)


    You are not ready to use symrefs unless you already know why they are bad. -- tadmc (CLPM)
    UPDATE:Minor text changes

      I want to implement the perl version for huffman encoder from, thanks a lot and i hope it works with problem

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: obfuscated [id://59718]
Approved by root
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others examining the Monastery: (10)
As of 2014-04-17 07:21 GMT
Find Nodes?
    Voting Booth?

    April first is:

    Results (440 votes), past polls