Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Re^2: Data compression by 50% + : is it possible?

by LanX (Archbishop)
on May 12, 2019 at 10:10 UTC ( #1233638=note: print w/replies, xml ) Need Help??


in reply to Re: Data compression by 50% + : is it possible?
in thread Data compression by 50% + : is it possible?

Hello Roboticus,

I just realized that we had the same ideas (me just later ;).

One difference is that you want to encode each of the 9 groups individually with 6 bits each => 6*9=54 bits per line while I'm encoding a whole line as a polynom

Sum($g($i) * 50**$i) with $i=0 .. 8

resulting in the need of 51 bits per line.

But I don't understand why you say

>  That's not quite enough to crunch out half the space

The old encoding needs per line in average >14 bytes plus newline.

That's > 15 * 8 = 120 bits

So your 54 bits need already less than half the space!

What am I missing?

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

Replies are listed 'Best First'.
Re^3: Data compression by 50% + : is it possible?
by roboticus (Chancellor) on May 12, 2019 at 12:09 UTC

    LanX:

    You're not missing anything that I know of. What I was basing my "not quite" phrasing on is the idea of using a single character to encode each group (@c) into a character, so it would use 9 characters (72) bits. Had I thought of just packing the required 51 bit records together, it would be more than sufficient to get 50% compression, as the file would take 635 bytes to encode 100 records (sans newlines).

    (The 51 bits came from: 50 different possibilities for each group of 10 in the inner loop (log2(50) == 5.64.. bits/group) * (9 groups) == 50.79 bits.)

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      >  using a single character to encode each group (@c) into a character, so it would use 9 characters (72) bits

      Oh I see, but I hope you are aware that your approach can easily be packed into 9*6=54 bits and is easier to code than mine.

      With a per line ratio of 7 bytes = 56 bits you'll already have a 50%+ compression.

      My approach would require modulo 50 calculations on 51 bit integers, not sure how tricky this is on a 32 bit machine.

      So I'd rather "waste" 5 bits/line for a pragmatic solution.

      Btw: I'm reluctant trying to implement Huffman here, because the OP could probably change the parameters in his next post.

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1233638]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others imbibing at the Monastery: (7)
As of 2019-06-20 20:10 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Is there a future for codeless software?



    Results (91 votes). Check out past polls.

    Notices?