Limbic~Region has asked for the wisdom of the Perl Monks concerning the following question:
I have this really weird idea in my head that refuses to fully form so I can't explain it. As I was falling asleep last night, I realized that I could ask my question without explaining my idea by relating it to something I do understand.
Background
Two months after I started learning perl, I wrote AI Animals. The idea is simple. You have a tree where nodes can either be yes/no questions or guesses. If a guess is determined to be incorrect, the node is replaced with a yes/no question that tells the difference between the guessed animal and the correct animal. If you need more background/explanation, please go visit that node.
Challenge
You have been invited to particpate in a contest to devise the optimal starting tree for the game. Each participant is given a file that contains a million previous played games from an unknown number of different starting trees. You have 1 week to analyze the data and build your initial tree. The day of the contest 50,000 previous games will be randomly selected and the winner will be the one that asks the fewest number of questions to correctly guess all the animals.
You immediately recognize a few flaws that you can exploit to guarantee yourself the win. First, no unknown animals will be introduced. Second, while the file contains a million previous games, there is only a small number of unique animals ever chosen (let's say 100 for our purposes). Finally, you realize that you can tell the frequency that each animal will be chosen by how frequently it was previously chosen. All you need to do is structure the tree in a way that the number of questions to guess a given animal is inversly proportional to how frequently it has been chosen. Once you know where the animals must appear in the tree, you can come up with the questions by hand.
Setting aside the "questions" portion since my problem really has nothing to do with the game: How can I figure out the optimal tree assuming I have a small fixed number of items to identify and the frequency that I will need to identify them?
Cheers - L~R
|
---|