http://www.perlmonks.org?node_id=1031811


in reply to Re^2: Challenge: Optimal Animals/Pangolins Strategy
in thread Challenge: Optimal Animals/Pangolins Strategy

Well, you mentioned proportional, which I interpreted to mean that higher frequencies should take longer to reach than lower frequencies. Barring the possibility of equal frequencies, a lop-sided tree achieves that.

(Albeit you said inversely proportional which would mean reversing the order of the sort from what I posted.)

The only other sense I can get from the information provided -- brought on by the mention of Huffman -- is that you are perhaps looking to minimise the depth of the tree. This does that by building a heap and then converting it to a tree, rather clumsily, though that could be fixed if the idea is right:

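#! perl -slw
use strict;
use Data::Dump qw[ pp ];
use List::Util qw[ reduce ];
use enum qw[ NAME FREQ LEFT RIGHT ];

our $N //= 1;    # can be set with -N=n on the command line (-s switch)

my $n = 0;

# Sort ascending by frequency, then give each node the heap indices of its children.
my @heap = map {
    $_->[LEFT]  = ++$n;
    $_->[RIGHT] = ++$n;
    $_;
} sort {
    $a->[FREQ] <=> $b->[FREQ]
} map [ $_, int( rand 1000 ) ], 'A' x $N .. 'Z' x $N;

# Replace the child indices with references to the nodes themselves.
my @tree = map {
    $_->[LEFT] = $heap[ $_->[LEFT] ], $_->[RIGHT] = $heap[ $_->[RIGHT] ]
} @heap;

pp \@tree;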

Output:

C:\test>1031775.pl
do {
  my $a = [
    [
      "Y",
      1,
      [
        "G",
        4,
        [
          "D",
          166,
          ["X", 245, ["I", 516, undef, undef], ["O", 563, undef, undef]],
          ["K", 315, ["M", 628, undef, undef], ["R", 710, undef, undef]],
        ],
        [
          "A",
          218,
          ["P", 324, ["T", 731, undef, undef], ["Q", 732, undef, undef]],
          ["J", 374, ["V", 735, undef, undef], ["C", 835, undef, undef]],
        ],
      ],
      [
        "E",
        33,
        [
          "U",
          220,
          ["L", 393, ["S", 845, undef, undef], ["F", 930, undef, undef]],
          ["W", 471, ["Z", 944, undef, undef], undef],
        ],
        ["H", 228, ["B", 507, undef, undef], ["N", 515, undef, undef]],
      ],
    ],
    'fix', 'fix', 'fix', 'fix', 'fix',
    'fix', 'fix', 'fix', 'fix', 'fix',
    'fix', 'fix', 'fix', 'fix', 'fix',
    'fix', 'fix', 'fix', 'fix', 'fix',
    'fix', 'fix', 'fix', 'fix', 'fix',
  ];
  $a->[1] = $a->[0][2];
  $a->[2] = $a->[0][3];
  $a->[3] = $a->[0][2][2];
  $a->[4] = $a->[0][2][3];
  $a->[5] = $a->[0][3][2];
  $a->[6] = $a->[0][3][3];
  $a->[7] = $a->[0][2][2][2];
  $a->[8] = $a->[0][2][2][3];
  $a->[9] = $a->[0][2][3][2];
  $a->[10] = $a->[0][2][3][3];
  $a->[11] = $a->[0][3][2][2];
  $a->[12] = $a->[0][3][2][3];
  $a->[13] = $a->[0][3][3][2];
  $a->[14] = $a->[0][3][3][3];
  $a->[15] = $a->[0][2][2][2][2];
  $a->[16] = $a->[0][2][2][2][3];
  $a->[17] = $a->[0][2][2][3][2];
  $a->[18] = $a->[0][2][2][3][3];
  $a->[19] = $a->[0][2][3][2][2];
  $a->[20] = $a->[0][2][3][2][3];
  $a->[21] = $a->[0][2][3][3][2];
  $a->[22] = $a->[0][2][3][3][3];
  $a->[23] = $a->[0][3][2][2][2];
  $a->[24] = $a->[0][3][2][2][3];
  $a->[25] = $a->[0][3][2][3][2];
  $a;
}
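The heap-to-tree conversion could probably be done in a single pass. What follows is only a minimal sketch, assuming the same NAME/FREQ/LEFT/RIGHT record layout as the script above; the five sample frequencies and the helper `max_depth` are made up for illustration and are not part of the original post.

```perl
use strict;
use warnings;

# Field indices, mirroring the NAME/FREQ/LEFT/RIGHT layout used above.
use constant { NAME => 0, FREQ => 1, LEFT => 2, RIGHT => 3 };

# Hypothetical sample data: [ name, frequency ] pairs.
my @items = ( [ A => 5 ], [ B => 1 ], [ C => 9 ], [ D => 3 ], [ E => 7 ] );

# An array sorted in ascending order already satisfies the min-heap property.
my @heap = sort { $a->[FREQ] <=> $b->[FREQ] } @items;

# Link each node straight to its heap children at indices 2i+1 and 2i+2.
# Reading past the end of @heap yields undef, which marks a missing child.
for my $i ( 0 .. $#heap ) {
    $heap[$i][LEFT]  = $heap[ 2 * $i + 1 ];
    $heap[$i][RIGHT] = $heap[ 2 * $i + 2 ];
}

my $root = $heap[0];

# Hypothetical helper: number of edges on the longest root-to-leaf path.
sub max_depth {
    my ($node) = @_;
    return -1 unless defined $node;
    my ( $l, $r ) = ( max_depth( $node->[LEFT] ), max_depth( $node->[RIGHT] ) );
    return 1 + ( $l > $r ? $l : $r );
}

printf "root=%s depth=%d\n", $root->[NAME], max_depth($root);
```

With this sample data the lowest frequency (B) ends up at the root, as in the output above, and the five nodes fit in a tree of depth 2.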

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^4: Challenge: Optimal Animals/Pangolins Strategy
by Limbic~Region (Chancellor) on May 03, 2013 at 13:45 UTC
    BrowserUk,
    By inversely proportional, I meant that the number of questions asked to identify an animal (Q) multiplied by how many times the animal is chosen (C) should be constant. If a goat appears a hundred times more often than a unicorn then it should take a hundred times more questions to identify the unicorn than the goat.

    I apologize for not seeing it beforehand, but I am pretty sure the optimal strategy for my problem is in fact Huffman Coding. To get a better idea of the fuzzy problem I am dealing with in my head, see Re^4: Challenge: Optimal Animals/Pangolins Strategy

    Cheers - L~R

      By inversely proportional, I meant that the number of questions asked to identify an animal (Q) multiplied by how many times the animal is chosen (C) should be constant. If a goat appears a hundred times more often than a unicorn then it should take a hundred times more questions to identify the unicorn than the goat.

      In roboticus' example, which you seem to be endorsing, this is the tree produced:

                                      ___________Q___________
                                     /                       \
                                ____Q                   ______Q_______
                               /     \                 /              \
                      ________Q       fish            Q           _____Q_____
                     /         \                     / \         /           \
               _____Q______     cow               dog   cat     Q             Q
              /            \                                   / \           / \
             Q              Q                                 Q   wolf  sheep   horse
            / \            / \                               / \
      walrus   badger  seal   Q_______________      wolverine   frog
                             /                \
                            Q                  Q
                           / \                / \
                          Q   hampster  ocolot   Q
                         / \                    / \
                  pegasus   Q             gerbil   platypus
                           / \
                    axolotl   unicorn

      The fish, with a frequency of 150, requires 2 questions; the unicorn, with a frequency of 1, requires 9. And it puts walrus(15), badger(17), seal(18), wolverine(22) & frog(28) at the same level.

      So the inverse proportionality is relative rather than mathematically absolute. Achieving the math you describe would require inserting 291 additional questions above the unicorn (150 x 2 = 300 questions for a frequency of 1, less the 9 already there), and in the process would throw away the "compressive" attribute that defines Huffman.
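The 291 figure follows from the two data points quoted above. A quick check, using only the frequencies and question counts stated in this post (the variable names are mine):

```perl
use strict;
use warnings;

# Figures quoted above from roboticus' tree.
my ( $fish_freq,    $fish_questions )    = ( 150, 2 );
my ( $unicorn_freq, $unicorn_questions ) = ( 1,   9 );

# Strict inverse proportionality: questions * frequency is the same for every animal.
my $constant = $fish_freq * $fish_questions;

# Questions the unicorn would need in order to give the same product.
my $unicorn_needed = $constant / $unicorn_freq;

# Questions that would have to be inserted above the unicorn's current position.
my $extra = $unicorn_needed - $unicorn_questions;

print "constant=$constant needed=$unicorn_needed extra=$extra\n";
```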

      If non-compressive, relative inverse proportionality is sufficient, then my original reading of your question would be more accurate:

          Q
         / \
      fish  Q
           / \
         cat  Q
             / \
           dog  Q
               / \
             cow  Q
                 / \
             horse  Q
                   / \
               sheep  Q
                     / \
                  wolf  Q
                       / \
                    frog  Q
                         / \
                 wolverine  Q
                           / \
                        seal  Q
                             / \
                        badger  Q
                               / \
                          walrus  Q
                                 / \
                            ocelot  Q
                                   / \
                             hamster  Q
                                     / \
                                gerbil  Q
                                       / \
                                platypus  Q
                                         / \
                                   unicorn  Q
                                           / \
                                     pegasus  axolotl

      Which brings me back to the idea that what roboticus' use of Huffman does is minimise the depth of the tree.

      But if that were the goal, then its maximum depth of 9 is 3 more than is required:

            ____________Q_____________
           /                          \
       fish    ________________________Q________________________
              /                                                 \
             Q                     ______________________________Q_______________________________
            / \                   /                                                              \
          cat dog     ___________Q____________                                    ________________Q________________
                     /                        \                                  /                                 \
               _____Q_____               ______Q_______                   ______Q_______                     _______Q_______
              /           \             /              \                 /              \                   /               \
             Q             Q           Q                Q               Q                Q                 Q                 Q
            / \           / \         / \              / \             / \              / \               / \               / \
          cow horse   sheep wolf   frog wolverine   seal badger   walrus ocelot   hamster gerbil   platypus unicorn   pegasus axolotl

      All of which, I guess, means that I have no idea what you set out to achieve :(


        BrowserUk,
        I feel like this should be an episode of Doctor Who with incongruent time lines.

        At the time I responded to the clarification of what I meant by inversely proportional, I had already moved on from thinking it was an appropriate solution. I only clarified for the sake of completeness. What I should have said was something along the lines of:

        Not that it matters, since I now realize it does not solve my problem, but there is a difference between having an inverse relationship and being inversely proportional. The idea that the more popular an animal is, the fewer questions it should take to identify, is an inverse relationship. When I said inversely proportional, I meant that the product of popularity and questions should be a constant, defining exactly how many questions should be asked. In the end, I was wrong.

        As for what I am trying to achieve - I am attempting to build on top of Huffman coding. Let's say you have a file that you have done single-byte frequency analysis on and generated a Huffman code tree. You notice that a few of the branches only have 1 leaf entry instead of 2. You decide you want to fill in those "holes" with the highest-frequency 2-byte pairs in the file. You fill in the first hole, but before moving on to the next one, you realize a problem. The frequency analysis of the single bytes requires recalculating, which means rebuilding the tree, which means different holes.
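For readers who want something concrete to poke at, here is a minimal sketch of the starting point described above: building a Huffman tree from a frequency table and then looking for branches with a single leaf. The four symbol frequencies are made up, the sorted array stands in for a real priority queue, and counting "internal nodes with exactly one leaf child" is only one possible reading of the "holes" mentioned above, not necessarily what was meant.

```perl
use strict;
use warnings;

# Hypothetical symbol frequencies, made up for illustration.
my %freq = ( a => 5, b => 2, c => 1, d => 1 );

# A node is [ weight, symbol, left, right ]; leaves carry a symbol and no children.
my @queue = map { [ $freq{$_}, $_ ] } sort keys %freq;

# Classic Huffman construction: repeatedly merge the two lightest nodes.
while ( @queue > 1 ) {
    @queue = sort { $a->[0] <=> $b->[0] } @queue;
    my ( $x, $y ) = splice @queue, 0, 2;
    push @queue, [ $x->[0] + $y->[0], undef, $x, $y ];
}
my $root = $queue[0];

# Walk the tree, recording each symbol's depth (its code length) and counting
# internal nodes that have exactly one leaf child (the assumed "holes").
my %length;
my $one_leaf = 0;

sub walk {
    my ( $node, $depth ) = @_;
    if ( defined $node->[1] ) {    # leaf
        $length{ $node->[1] } = $depth;
        return;
    }
    my $leaves = grep { defined $_->[1] } @{$node}[ 2, 3 ];
    $one_leaf++ if $leaves == 1;
    walk( $_, $depth + 1 ) for @{$node}[ 2, 3 ];
}
walk( $root, 0 );

print join( ' ', map { "$_=$length{$_}" } sort keys %length ), "\n";
print "internal nodes with exactly one leaf child: $one_leaf\n";
```

Changing any entry in `%freq` (for example, after folding a byte pair into a new symbol) means running the merge loop again from scratch, which is the recalculation problem described above.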

        I am not sure if that makes any more sense. I took the weekend off from thinking about it in hopes that I would have clarity today but it is still a jumbled pile of mud in my mind.

        Cheers - L~R