Do you know where your variables are? PerlMonks

### Re^3: Challenge: Optimal Animals/Pangolins Strategy

by roboticus (Chancellor)
 on May 02, 2013 at 23:58 UTC ( #1031814=note: print w/replies, xml ) Need Help??

Here's kind of what I was thinking. The Huffman tree keeps the most common animals (fish, dog, cat) at the top of the tree with few questions, and the unusual critters (unicorn, pegasus, axolotl) at the bottom of the tree.

Is this something along the lines of what you're after?

```#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Read list of animals/occurrences into @T
my @T;
my \$ttl=0;
while (<DATA>) {
next if /^\s*\$/;
my (\$animal, \$occurrences) = split /\s+/, \$_;
\$ttl += \$occurrences;
my \$t = bless [ \$animal, \$occurrences ], "Animal";
insert( \$t );
}
print "Total occurrences: \$ttl\n";

# Turn it into a huffman tree
while (@T > 1) {
my \$L = shift @T;
my \$R = shift @T;
my \$t = bless [ [\$L, \$R], \$L->[1]+\$R->[1] ], "Question";
insert( \$t );
}
#print Dumper(\$T[0]);

my \$avg = compute_avg_num_guesses(\$T[0]) / \$ttl;
print "Avg: \$avg\n";

sub compute_avg_num_guesses {
my \$t = shift;
my \$question_depth = shift // 0;
if (ref \$t eq "Animal") {
my \$num_questions = \$t->[1] * \$question_depth;
print "Animal: \$t->[0], depth \$question_depth * occurrences \$t
+->[1] ==> \$num_questions\n";
return \$num_questions;
}
else {
my \$sum_questions = compute_avg_num_guesses(\$t->[0][0], \$quest
+ion_depth+1);
\$sum_questions += compute_avg_num_guesses(\$t->[0][1], \$questio
+n_depth+1);
return \$sum_questions;
}
}

sub insert {
# Inserting in order simplifies the conversion into a huffman tree
+...
my \$t = shift;
push @T, \$t;
my \$idx = \$#T;
while (\$idx) {
(\$T[\$idx],\$T[\$idx-1]) = (\$T[\$idx-1],\$T[\$idx]) if \$T[\$idx][1] <
+ \$T[\$idx-1][1];
--\$idx;
}
}

__DATA__
fish 150
horse 62
dog 85
cat 91
frog 28
axolotl 1
gerbil 3
hamster 5
ocelot 5
platypus 3
seal 18
walrus 15
wolf 55
wolverine 22
sheep 60
cow 75
unicorn 1
pegasus 1

Running it gives me:

```\$ perl huffman.pl
Total occurrences: 697
Animal: walrus, depth 5 * occurrences 15 ==> 75
Animal: badger, depth 5 * occurrences 17 ==> 85
Animal: seal, depth 5 * occurrences 18 ==> 90
Animal: pegasus, depth 8 * occurrences 1 ==> 8
Animal: axolotl, depth 9 * occurrences 1 ==> 9
Animal: unicorn, depth 9 * occurrences 1 ==> 9
Animal: hamster, depth 7 * occurrences 5 ==> 35
Animal: ocelot, depth 7 * occurrences 5 ==> 35
Animal: gerbil, depth 8 * occurrences 3 ==> 24
Animal: platypus, depth 8 * occurrences 3 ==> 24
Animal: cow, depth 3 * occurrences 75 ==> 225
Animal: fish, depth 2 * occurrences 150 ==> 300
Animal: dog, depth 3 * occurrences 85 ==> 255
Animal: cat, depth 3 * occurrences 91 ==> 273
Animal: wolverine, depth 5 * occurrences 22 ==> 110
Animal: frog, depth 5 * occurrences 28 ==> 140
Animal: wolf, depth 4 * occurrences 55 ==> 220
Animal: sheep, depth 4 * occurrences 60 ==> 240
Animal: horse, depth 4 * occurrences 62 ==> 248
Avg: 3.45050215208034

So if I haven't screwed up, it looks like given the stated distributions and guesses, that you'd need 3.45 guesses on average.

...roboticus

When your only tool is a hammer, all problems look like your thumb.

Replies are listed 'Best First'.
Re^4: Challenge: Optimal Animals/Pangolins Strategy
by Limbic~Region (Chancellor) on May 03, 2013 at 13:38 UTC
roboticus,
I don't think you screwed anything up - this is the standard 1 queue version of Huffman coding. It meets the stated goals of my objective.

Let me try and give some more insight into this fuzzy idea I have. Huffman coding gives ideal compression when the symbol set it is coding for and the frequencies those symbols appear is known up front. Typically, those symbols are single characters and for my purposes that will be true. If the bit sequence '01101' is used but '01100' is empty/unused, I see that as an opportunity to fill in the tree with some unknown symbols. Since all the single character symbols are known and populated, I would plug these "holes" with symbols representing character tuples.

I then realized that I should reduce the frequency of the single character symbols that were part of the newly inserted tuple because not all of them would be encoded. This could then generate a different looking tree with different holes to fill. Looks like an infinite recursion problem.

I then thought - if I can come up with a strategy to populate a tree given a desired outcome, I could just jump to the answer. In my sleep deprived state, I thought the Animals problem I posted got me there but it really just got me back to the beginning.

Cheers - L~R

Create A New User
Node Status?
node history
Node Type: note [id://1031814]
help
Chatterbox?
 [stevieb]: Nick... can you include a comma-separated list to the operation, or a wildcard?

How do I use this? | Other CB clients
Other Users?
As of 2017-07-22 19:13 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
I came, I saw, I ...

Results (340 votes). Check out past polls.