Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris

Re: Data compression by 50% + : is it possible?

by harangzsolt33 (Pilgrim)
on May 14, 2019 at 17:17 UTC ( #1233776=note: print w/replies, xml ) Need Help??

in reply to Data compression by 50% + : is it possible?

If we calculate the number of possible ways one could arrange 90 numbers, that is a very large number. And if my calculations were correct, it would require 463 bits to store that number! So, using this method, it's not possible to compress by 50%, because 90 bytes is 720 bits. OR if you store each int in 7 bits, then that's 630 bits total. So, you have a list of 630 bits. And mathematically, it is possible to store this list of ints in 463 bits BECAUSE you said that each int from 1 to 90 only occurs once!

But as I said, the compression ratio depends on the algorithm you're using! Using the mathematic approach I explained above, it is possible to store 630 bits in 463 bits ALWAYS regardless of how the numbers are shuffled! That means the output will be EXACTLY 463 bits regardless of whether we started with a list that is perfectly sorted or totally random.

  • Comment on Re: Data compression by 50% + : is it possible?

Replies are listed 'Best First'.
Re^2: Data compression by 50% + : is it possible?
by LanX (Archbishop) on May 14, 2019 at 17:37 UTC
    Many people in this thread miss crucial information already given in the OP's text!

    (apart from reading the explicit example code given)

    > - order needs not to be preserved

    > - occur only once in a given line.

    > - They cannot be consecutive (meaning there is no sequence in a dataset).

    I.e. tuples like (3,1, ...), (1,1,...) or (1,2, ...) are impossible. (see OPs if condition)

    But the OP's format is obviously highly redundant, he's not only

    • allowing such tuples
    • but also unsorted input
    • and wasting a full byte to encode an 0..9 increment in 9 intervals

    Alone the last point leaves sufficient room for compression far beyond near 50%.

    Roboticus and I already elaborated this explicitly by demonstrating all possible independent tuples and pointing to their near optimal compression using Huffman coding.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1233776]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others cooling their heels in the Monastery: (8)
As of 2019-06-20 19:17 GMT
Find Nodes?
    Voting Booth?
    Is there a future for codeless software?

    Results (91 votes). Check out past polls.