Limbic~Region has asked for the wisdom of the Perl Monks concerning the following question:
All,
I have this really weird idea in my head that refuses to fully form so I can't explain it. As I was falling asleep last night, I realized that I could ask my question without explaining my idea by relating it to something I do understand.
Background
Two months after I started learning perl, I wrote AI Animals. The idea is simple. You have a tree where nodes can either be yes/no questions or guesses. If a guess is determined to be incorrect, the node is replaced with a yes/no question that tells the difference between the guessed animal and the correct animal. If you need more background/explanation, please go visit that node.
Challenge
You have been invited to particpate in a contest to devise the optimal starting tree for the game. Each participant is given a file that contains a million previous played games from an unknown number of different starting trees. You have 1 week to analyze the data and build your initial tree. The day of the contest 50,000 previous games will be randomly selected and the winner will be the one that asks the fewest number of questions to correctly guess all the animals.
You immediately recognize a few flaws that you can exploit to guarantee yourself the win. First, no unknown animals will be introduced. Second, while the file contains a million previous games, there is only a small number of unique animals ever chosen (let's say 100 for our purposes). Finally, you realize that you can tell the frequency that each animal will be chosen by how frequently it was previously chosen. All you need to do is structure the tree in a way that the number of questions to guess a given animal is inversly proportional to how frequently it has been chosen. Once you know where the animals must appear in the tree, you can come up with the questions by hand.
Setting aside the "questions" portion since my problem really has nothing to do with the game: How can I figure out the optimal tree assuming I have a small fixed number of items to identify and the frequency that I will need to identify them?
Re: Challenge: Optimal Animals/Pangolins Strategy
by roboticus (Chancellor) on May 02, 2013 at 16:03 UTC
|
| [reply] |
|
| [reply] |
|
Limbic~Region:
Here's kind of what I was thinking. The Huffman tree keeps the most common animals (fish, dog, cat) at the top of the tree with few questions, and the unusual critters (unicorn, pegasus, axolotl) at the bottom of the tree.
Is this something along the lines of what you're after?
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
# Read list of animals/occurrences into @T
my @T;
my $ttl=0;
while (<DATA>) {
next if /^\s*$/;
my ($animal, $occurrences) = split /\s+/, $_;
$ttl += $occurrences;
my $t = bless [ $animal, $occurrences ], "Animal";
insert( $t );
}
print "Total occurrences: $ttl\n";
# Turn it into a huffman tree
while (@T > 1) {
my $L = shift @T;
my $R = shift @T;
my $t = bless [ [$L, $R], $L->[1]+$R->[1] ], "Question";
insert( $t );
}
#print Dumper($T[0]);
my $avg = compute_avg_num_guesses($T[0]) / $ttl;
print "Avg: $avg\n";
sub compute_avg_num_guesses {
my $t = shift;
my $question_depth = shift // 0;
if (ref $t eq "Animal") {
my $num_questions = $t->[1] * $question_depth;
print "Animal: $t->[0], depth $question_depth * occurrences $t
+->[1] ==> $num_questions\n";
return $num_questions;
}
else {
my $sum_questions = compute_avg_num_guesses($t->[0][0], $quest
+ion_depth+1);
$sum_questions += compute_avg_num_guesses($t->[0][1], $questio
+n_depth+1);
return $sum_questions;
}
}
sub insert {
# Inserting in order simplifies the conversion into a huffman tree
+...
my $t = shift;
push @T, $t;
my $idx = $#T;
while ($idx) {
($T[$idx],$T[$idx-1]) = ($T[$idx-1],$T[$idx]) if $T[$idx][1] <
+ $T[$idx-1][1];
--$idx;
}
}
__DATA__
fish 150
horse 62
dog 85
cat 91
frog 28
axolotl 1
gerbil 3
hamster 5
ocelot 5
platypus 3
seal 18
badger 17
walrus 15
wolf 55
wolverine 22
sheep 60
cow 75
unicorn 1
pegasus 1
Running it gives me:
$ perl huffman.pl
Total occurrences: 697
Animal: walrus, depth 5 * occurrences 15 ==> 75
Animal: badger, depth 5 * occurrences 17 ==> 85
Animal: seal, depth 5 * occurrences 18 ==> 90
Animal: pegasus, depth 8 * occurrences 1 ==> 8
Animal: axolotl, depth 9 * occurrences 1 ==> 9
Animal: unicorn, depth 9 * occurrences 1 ==> 9
Animal: hamster, depth 7 * occurrences 5 ==> 35
Animal: ocelot, depth 7 * occurrences 5 ==> 35
Animal: gerbil, depth 8 * occurrences 3 ==> 24
Animal: platypus, depth 8 * occurrences 3 ==> 24
Animal: cow, depth 3 * occurrences 75 ==> 225
Animal: fish, depth 2 * occurrences 150 ==> 300
Animal: dog, depth 3 * occurrences 85 ==> 255
Animal: cat, depth 3 * occurrences 91 ==> 273
Animal: wolverine, depth 5 * occurrences 22 ==> 110
Animal: frog, depth 5 * occurrences 28 ==> 140
Animal: wolf, depth 4 * occurrences 55 ==> 220
Animal: sheep, depth 4 * occurrences 60 ==> 240
Animal: horse, depth 4 * occurrences 62 ==> 248
Avg: 3.45050215208034
So if I haven't screwed up, it looks like given the stated distributions and guesses, that you'd need 3.45 guesses on average.
...roboticus
When your only tool is a hammer, all problems look like your thumb. | [reply] [d/l] [select] |
|
Re: Challenge: Optimal Animals/Pangolins Strategy
by BrowserUk (Patriarch) on May 02, 2013 at 16:34 UTC
|
#! perl -slw
use strict;
use Data::Dump qw[ pp ];
use List::Util qw[ reduce ];
our $N //= 1;
my %byFreq = map{
int( rand 1000 ) => $_
} 'A'x$N .. 'Z'x$N;
pp \%byFreq;
my @tree;
reduce{
$a->[0] = $byFreq{ $b };
$a->[1] = [];
} \@tree, sort{ $a <=> $b } keys %byFreq;
pp \@tree;
Output: __END__
C:\test>1031775.pl
{
14 => "E",
49 => "I",
75 => "N",
81 => "R",
128 => "W",
178 => "Q",
180 => "X",
189 => "V",
199 => "S",
354 => "O",
359 => "J",
403 => "G",
485 => "T",
492 => "Z",
554 => "B",
603 => "M",
604 => "K",
608 => "Y",
632 => "C",
749 => "D",
779 => "P",
795 => "L",
846 => "U",
864 => "F",
921 => "A",
948 => "H",
}
[
"E",
[
"I",
[
"N",
[
"R",
[
"W",
[
"Q",
[
"X",
[
"V",
[
"S",
[
"O",
[
"J",
[
"G",
[
"T",
[
"Z",
[
"B",
[
"M",
[
"K",
[
"Y",
["C", ["D", ["P", ["L", ["U", ["F"
+, ["A", ["H", []]]]]]]]],
],
],
],
],
],
],
],
],
],
],
],
],
],
],
],
],
],
]
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] [select] |
|
| [reply] |
|
Well, you mentioned proportional which I interpreted to mean that higher frequencies should take longer to reach than lower frequencies, which barring the possibility of equal frequencies, a lop-sided tree achieves.
(Albeit you said inversely proportional which would mean reversing the order of the sort from what I posted.)
The only other sense I can get from the information provided -- brought on by the mention of Huffman -- is that you are perhaps looking to minimise the depth of the tree. This does that by building a heap and then converting it to a tree rather clumbsily. Though that could be fixed if the idea is right:
#! perl -slw
use strict;
use Data::Dump qw[ pp ];
use List::Util qw[ reduce ];
use enum qw[ NAME FREQ LEFT RIGHT ];
our $N //= 1;
my$n = 0;
my @heap = map{
$_->[LEFT] = ++$n;
$_->[RIGHT] = ++$n;
$_;
} sort {
$a->[FREQ] <=> $b->[FREQ]
} map[
$_ , int( rand 1000 )
], 'A'x$N .. 'Z'x$N;;
my @tree = map {
$_->[LEFT] = $heap[ $_->[LEFT] ], $_->[RIGHT] = $heap[ $_->[RIGHT] ]
} @heap;
pp \@tree;
Output: C:\test>1031775.pl
do {
my $a = [
[
"Y",
1,
[
"G",
4,
[
"D",
166,
["X", 245, ["I", 516, undef, undef], ["O", 563, undef, undef
+]],
["K", 315, ["M", 628, undef, undef], ["R", 710, undef, undef
+]],
],
[
"A",
218,
["P", 324, ["T", 731, undef, undef], ["Q", 732, undef, undef
+]],
["J", 374, ["V", 735, undef, undef], ["C", 835, undef, undef
+]],
],
],
[
"E",
33,
[
"U",
220,
["L", 393, ["S", 845, undef, undef], ["F", 930, undef, undef
+]],
["W", 471, ["Z", 944, undef, undef], undef],
],
["H", 228, ["B", 507, undef, undef], ["N", 515, undef, undef]]
+,
],
],
'fix',
'fix',
'fix',
'fix',
'fix',
'fix',
'fix',
'fix',
'fix',
'fix',
'fix',
'fix',
'fix',
'fix',
'fix',
'fix',
'fix',
'fix',
'fix',
'fix',
'fix',
'fix',
'fix',
'fix',
'fix',
];
$a->[1] = $a->[0][2];
$a->[2] = $a->[0][3];
$a->[3] = $a->[0][2][2];
$a->[4] = $a->[0][2][3];
$a->[5] = $a->[0][3][2];
$a->[6] = $a->[0][3][3];
$a->[7] = $a->[0][2][2][2];
$a->[8] = $a->[0][2][2][3];
$a->[9] = $a->[0][2][3][2];
$a->[10] = $a->[0][2][3][3];
$a->[11] = $a->[0][3][2][2];
$a->[12] = $a->[0][3][2][3];
$a->[13] = $a->[0][3][3][2];
$a->[14] = $a->[0][3][3][3];
$a->[15] = $a->[0][2][2][2][2];
$a->[16] = $a->[0][2][2][2][3];
$a->[17] = $a->[0][2][2][3][2];
$a->[18] = $a->[0][2][2][3][3];
$a->[19] = $a->[0][2][3][2][2];
$a->[20] = $a->[0][2][3][2][3];
$a->[21] = $a->[0][2][3][3][2];
$a->[22] = $a->[0][2][3][3][3];
$a->[23] = $a->[0][3][2][2][2];
$a->[24] = $a->[0][3][2][2][3];
$a->[25] = $a->[0][3][2][3][2];
$a;
}
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
/div | [reply] [d/l] [select] |
|
|
|
|
| [reply] |
|
Re: Challenge: Optimal Animals/Pangolins Strategy
by QM (Parson) on May 02, 2013 at 18:04 UTC
|
I think the previous commenters overlooked the questions!
While the previous games will certainly have named animals (and thus frequencies), where do the yes/no questions get involved?
All of the previous games would also record the questions. It is possible that the questions are not optimal (or even relevant), so a better approach might be a list of attributes for each possible animal, independent of its history in previous games.
For practical purposes, besides yes/no, these games usually allow "doesn't matter" and "I don't know".
-QM
--
Quantum Mechanics: The dreams stuff is made of
| [reply] |
|
| [reply] |
|
| [reply] [d/l] |
|
|
Yep, my bad.
-QM
--
Quantum Mechanics: The dreams stuff is made of
| [reply] |
|
| [reply] |
|
I didn't have a solution in mind. There's the simplistic AI that asks you for a distinguishing question and records the answer every time it guesses wrong. To get beyond that you would need a matrix of questions, and answers for each animal; or otherwise generate a semantic database and minimize the categories and question tree. I think it ends up being something like generating a minimal cover set (but I'm fuzzy on this).
-QM
--
Quantum Mechanics: The dreams stuff is made of
| [reply] |
|
Re: Challenge: Optimal Animals/Pangolins Strategy
by Anonymous Monk on May 03, 2013 at 12:40 UTC
|
This is the Huffman problem, but of a taxonomy tree. "Simple Huffman" will tell you what is the shortest possible tree, but taxonomy rules demand that Orangutans and Goldfish can't be in the same branch. The problem is to develop a taxonomy which will produce the shortest possible Huffman tree that takes those taxonomy rules into account. The exact nature of the allowed questions – a yes-or-no question is binary, others are not – is also essential to know. | [reply] |
|
|