Re^2: Data compression by 50% + : is it possible?

by LanX (Archbishop)
on May 14, 2019 at 17:37 UTC

in reply to Re: Data compression by 50% + : is it possible?
in thread Data compression by 50% + : is it possible?

Many people in this thread are missing crucial information already given in the OP's text!

(not to mention the explicit example code provided)

> - order needs not to be preserved

> - occur only once in a given line.

> - They cannot be consecutive (meaning there is no sequence in a dataset).

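I.e. tuples like (3,1,...), (1,1,...) or (1,2,...) never need to be encoded: the first is just a reordering of (1,3,...), the second repeats a value, and the third contains consecutive values (see the OP's if condition).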
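A minimal Python sketch of these rules, assuming a line is just a non-empty collection of integers. The function names `canonical`, `is_valid`, `to_gaps` and `from_gaps` are hypothetical and not taken from the OP's code. Under these rules, (3,1) has the same canonical form as (1,3), while (1,1) and (1,2) are rejected:

```python
# Sketch of the OP's three constraints; all names here are hypothetical.

def canonical(values):
    """Order doesn't matter, so a line's canonical form is its sorted tuple."""
    return tuple(sorted(values))

def is_valid(values):
    """No value repeats and no two values are consecutive.

    After sorting, a repeat shows up as a gap of 0 and a consecutive
    pair as a gap of 1, so every gap must be at least 2.
    """
    s = canonical(values)
    return all(b - a >= 2 for a, b in zip(s, s[1:]))

def to_gaps(values):
    """Encode a valid, non-empty line as its smallest value plus (gap - 2) offsets."""
    s = canonical(values)
    return [s[0]] + [b - a - 2 for a, b in zip(s, s[1:])]

def from_gaps(gaps):
    """Invert to_gaps, returning the sorted line."""
    out = [gaps[0]]
    for g in gaps[1:]:
        out.append(out[-1] + g + 2)
    return tuple(out)

for line in [(3, 1), (1, 1), (1, 2), (9, 1, 4)]:
    print(line, canonical(line), is_valid(line))

print(to_gaps((9, 1, 4)))  # [1, 1, 3]
```

Sorting removes the order redundancy, and because every gap is at least 2, subtracting 2 makes the offsets start at 0, so they tend to be smaller numbers than the raw values.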

But the OP's format is obviously highly redundant; he's not only

  • allowing such tuples
  • but also unsorted input
  • and wasting a full byte to encode a 0..9 increment in 9 intervals

The last point alone leaves enough room for compression far beyond 50%.
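To put a rough number on that point, assume (as an illustration only) that each stored value is one of the 10 symbols 0..9 and occupies its own 8-bit byte. The information content of such a value is log2(10) bits:

```python
import math

BITS_PER_BYTE = 8
SYMBOLS = 10  # assumption: each stored value is one of 0..9

bits_needed = math.log2(SYMBOLS)            # about 3.32 bits per value
saving = 1 - bits_needed / BITS_PER_BYTE    # fraction of each byte that is wasted
print(f"{bits_needed:.2f} bits per value, {saving:.0%} of each byte wasted")
# prints "3.32 bits per value, 58% of each byte wasted"
```

Plain 4-bit packing (two values per byte) already gives exactly 50%. The remaining distance to about 3.32 bits, together with the constraints above, is where the further gains come from.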

Roboticus and I already elaborated on this explicitly by enumerating all possible independent tuples and pointing out that Huffman coding compresses them near-optimally.
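A rough sketch of that idea in Python. The range 1..N and tuple length K are hypothetical placeholders, not the OP's values, and uniform weights stand in for real tuple frequencies. With the OP's actual frequencies, the same Huffman function would also exploit any skew in the data. The sketch enumerates the sorted, non-consecutive tuples, builds a Huffman code over them, and compares the average code length with the log2 bound and with one byte per value:

```python
import heapq
import itertools
import math

def valid_tuples(n, k):
    """All sorted k-tuples from 1..n with no repeated or consecutive values."""
    return [c for c in itertools.combinations(range(1, n + 1), k)
            if all(b - a >= 2 for a, b in zip(c, c[1:]))]

def huffman_lengths(weights):
    """Return {symbol: code length in bits} for a Huffman code over weights."""
    if len(weights) == 1:
        return {s: 1 for s in weights}
    # Heap entries: (total weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(weights.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

N, K = 20, 3  # hypothetical range and tuple length
tuples = valid_tuples(N, K)
lengths = huffman_lengths({t: 1 for t in tuples})  # placeholder: uniform weights
avg = sum(lengths.values()) / len(tuples)
print(f"{len(tuples)} valid tuples")
print(f"Huffman: {avg:.2f} bits/tuple, bound: {math.log2(len(tuples)):.2f}, "
      f"one byte per value: {8 * K} bits/tuple")
```

Since every excluded tuple (reordered, repeated or consecutive) never gets a code word, the code space only has to cover the valid tuples. The number of valid tuples is C(N-K+1, K), which is far smaller than the 256^K patterns that K raw bytes can hold.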

Cheers Rolf
(addicted to the Perl Programming Language :)
