http://www.perlmonks.org?node_id=109803

Well, I dont know if this counts as a golf question, but considering BooK's obfus Huffman encoder i thought maybe people would find this interesting.

Ok to steal a useful diagram from BooKs explaination of Huffman encoding we have the following tree:

The Huffman algorithm pairs the sub-trees by weight. As long as there +are more than one sub-tree, pairs the lightest two into one sub-tree. + I chose to have the lightest branches on the left (bit 0). 49 For a file containing: 27 A /\ 15 B 22 A 3 C /\ 1 D 7 B 1 E /\ 1 F C 4 1 G /\ 2 2 the resulting tree is (with the weight of eac +h /\ /\ sub-tree noted at its root) drawn on the left +. D EF G
If we assume a left branch represents a 0 then for D we have 00100. What we want to do is to end up with a tree that has the same properties as a huffman tree, but with its branches organised so that D would be 00000.

More specifically we want to produce a tree so that a depth first left iteration over the leafs gives us the elements ordered from least frequent to most frequent.

This type of ordering would conver the huffmann tree into the following:

49 For alphabet and: 27 A /\ and weights as so: 15 B 22 A 3 C /\ 1 D 7 B 1 E /\ 1 F 4 C 1 G /\ 2 2 Ordered so that the most frequent /\ /\ have the least 0's in their path D EF G A=1 C=001 E=00001 G=00011 B=01 D=00000 F=00010
In this case the change is trivial, a swap of one node. But for larger alphabets and more evenly distributed weights it is not so.

The challenge then is to create a program that fixes a huffmann tree. The input should be a list of symbols weights and paths from a huffman encoder and the ouptut a similer list but adjusted as stated ealier.
Input can be matched with the regex /^\s*(\w+)\s+(\d+)\s+([01]+)\s*$/ where $1 is the symbol, $2 is the weight of the symbol and $3 is the path (in 1's and 0's) in the huffman tree.

a 27 1 b 15 01 c 3 000 d 1 00100 e 1 00101 f 1 00110 g 1 00111
Would produce
a 27 1 b 15 01 c 3 001 d 1 00000 e 1 00001 f 1 00010 g 1 00011

I have implemented a solution to this but I will hold out on posting it till I see some replies. Bonus points if you can figure out a way to build the tree properly *without* using the path information provided, merely the weights.

UPDATE
A little tip: Sometimes you miss the forest for the trees.

Yves
--
You are not ready to use symrefs unless you already know why they are bad. -- tadmc (CLPM)

Replies are listed 'Best First'.
Re (tilly) 1: A Golf question maybe? Huffman with a twist
by tilly (Archbishop) on Sep 04, 2001 at 20:30 UTC
    I don't see how your specification is solvable.

    What output would you want on an input file of:

    10 a 10 b 11 c 11 d 18 e 19 f
    A valid Huffman output has to look something like:
    a 10 000 b 10 001 e 18 01 c 11 100 d 11 101 f 19 11
    There is simply no way to write a valid Huffman encoding of this example with the letters appearing sorted by weight. (In this representation e appears before c and d.)
      Hmm. Well, im *pretty* sure that this is solvable :-), either that or the solution I came up with is bogus, and I tested it thoroughly.

      I think the following does the trick....

      b 10 000 a 10 001 d 11 010 c 11 011 e 18 10 f 19 11
      Displayed in depthfirst left format.
      So i think youll have to revisit the problem, or should i take a bit of time to clarify it? Perhaps I was not sufficiently clear?? /msg me and let me know tilly.

      Cheers,
      Yves
      --
      You are not ready to use symrefs unless you already know why they are bad. -- tadmc (CLPM)

        Oops, I got my weights wrong. Try 10, 10, 12, 12, 21, 22.

        The spec you give for a Huffman encoder means that you have to generate the tree by combining the lightest two subtrees at each step. Therefore your structure must evolve as follows (using parens and weights because my ascii art sucks):

        (a:10 b:10 c:12 d:12 e:21 f:22) ((a b):20 c:12 d:12 e:21 f:22) ((a b):20 (c d):24 e:21 f:22) (((a b) e):41 (c d):24 f:22) (((a b) e):41 ((c d) f):46)
        That grouping is forced from your description of the Huffman spec, and conflicts with arranging the elements in order of ascending weight.