http://www.perlmonks.org?node_id=1233635


in reply to Data compression by 50% + : is it possible?

The 50% compression rate is possible. More is possible. Depends on the arrangement of input data and the algorithm you're using!

Here's an idea: Take the first bit of each number and create a list of numbers from that. See if you can compress that list at a better rate. Take the second bit of each input number and create another list, and so on... If your numbers are all even-odd-even-odd or all odd or all even numbers, then this method will help.

If the input numbers are totally random, I would still not give up just yet! I'd generate a list of "random" numbers and XOR the input numbers with randoms to get a new list that has a better chance of being compressed. Try that!

Most programs generate random numbers this way:

FOR LOOP:
   S = (S * A + B) % C
   print "Random number: ", S
END FOR

S is the initial seed for the random number generator. Programs usually set this to the number of milliseconds since 1970. A, B, and C are constants that can be any random value. In many programming languages the builtin random() function usually returns a number between 0 and 1. And in order to get that, C must be 1. If you repeat this calculation over and over again, you get a list of numbers that seems quite random.

By modifying the values of S, A, B, or C even slightly, you get a totally different series of numbers! If, let's say, A is 13.4849927107, and you just change one digit, you will get a totally different list of numbers that does not resemble the previous set at all. So, you could initialize these constants and then get a random list. Take two random lists and either ADD the values or XOR them or whatever. The resulting list MIGHT HAVE more order than your input data set! And this can help you compress the list further.

I've done this with ZIP files... You know, when you compress a ZIP file and you compress it again and again, you reach a limit after which the size starts growing instead of shrinking! But if, at some point, you encode the ZIP file using a list of random numbers, you can sometimes ZIP it again further and get an even smaller file! ;-)