Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked
 
PerlMonks  

Comment on

( #3333=superdoc: print w/ replies, xml ) Need Help??
demerphq,
Nice speaking with you again as well. I completley understand how confusing this must be to everyone else since I am having a hard time keeping it straight myself. I have multiple people on my team and they are each coming up with different ideas and I have my own. They are overlapping.

The reference to the Huffman Coding and "plugging holes" has nothing to do with this post - it deals with Challenge: Optimal Animals/Pangolins Strategy. You are absolutely correct about there not being any holes. The holes come in when you don't "properly construct" the tree. Somewhere in one of these threads, I mentioned that he is populating the tree top down instead of bottom up. I read somewhere that Huffman's professors had the same problem - it isn't perfect. I am sure this has confused some people because I have continued to refer to holes and Huffman Coding together rather than saying Huffman Coding-ish. I originally didn't mention Huffman Coding at all because that wasn't the road I was trying to go down - it just seemed like they were equivalent solutions - they are not.

The reason this thread doesn't look like it has anything to do with Huffman coding is because it isn't. This guy's idea was to map all 96 symbols to 7 bits and use the 32 remaining 7 bit sequences to encode some popular multi-character tuples.

You asked in reference to THEN and HE:

It is not clear why this is so. Is this because you aren't allowed overlapping sequences? Or some other reason. Can you explain more?

It is because HE is a complete substring of THEN. Once HE no longer exists anywhere in the file, there will no longer be any THEN to replace. Imagine the following dictionary:

0000000 = T 0000001 = H 0000010 = E 0000011 = N 0000100 = HE 0000101 = THEN 0000110 = <space>
Now imagine the file we are compressing contains
THE HEN THEN
We note that HE appears 3 times and will give us the largest compression (I have used # to indicate the data has already been compressed).
T# #N T#N

There is no longer a THEN to consider.

Anyway, it sounds like this problem is NP hard

It certainly does which is why I think a heuristic solution would work. I just need to think about BrowserUk's concerns.

Cheers - L~R


In reply to Re^2: Dynamically Updating Frequency Analysis by Limbic~Region
in thread Dynamically Updating Frequency Analysis by Limbic~Region

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • Outside of code tags, you may need to use entities for some characters:
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others having an uproarious good time at the Monastery: (19)
    As of 2014-07-23 19:10 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      My favorite superfluous repetitious redundant duplicative phrase is:









      Results (151 votes), past polls