Re: Data compression by 50%+ : is it possible?
by roboticus (Chancellor) on May 11, 2019 at 23:09 UTC
That depends on how much redundant information there is in your data. The more redundancy you have, the easier it is to squeeze it.
So the first thing you ought to do is find out how random your data appears to be. I ran your script a few times, and it seems to generate about 15 characters per line.
I did a little hacking on your inner loop and discovered that, due to the if statements in the encoding of your array @c, there are only 50 different possibilities for it, so each one fits comfortably in a single character (six bits). Since you've got 9 iterations of that loop, each line can be encoded as 9 characters. That's not quite enough to crunch out half the space, but it should get you a good start.
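To illustrate the idea, here's a minimal sketch (in Python rather than Perl, and with placeholder outcome names, since the real 50 possibilities would come from enumerating the if-branches in your script): build a lookup table mapping each possible outcome to one printable character, then emit 9 characters per line.

```python
# Sketch: encode each inner-loop iteration's outcome as one printable character.
# The "case0".."case49" names are hypothetical stand-ins for the 50 real
# possibilities produced by the if statements in the original script.
outcomes = [f"case{i}" for i in range(50)]

# Forward and reverse lookup tables; chr(33) = '!' starts the printable range.
encode_map = {o: chr(33 + i) for i, o in enumerate(outcomes)}
decode_map = {c: o for o, c in encode_map.items()}

def encode_line(iteration_results):
    """Turn the 9 per-iteration outcomes into a 9-character record."""
    return "".join(encode_map[r] for r in iteration_results)

def decode_line(record):
    """Recover the 9 outcomes from a 9-character record."""
    return [decode_map[c] for c in record]
```

Since every record is exactly 9 characters, the decoder can read fixed-length chunks without any separator.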
The 50 different possibilities have a fairly non-uniform distribution, where the most common 4 should appear 25% of the time, and the most common 9 appear over 50% of the time, so there's definitely some room to squeeze out even more space.
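You can quantify that room with the Shannon entropy of the distribution: if the entropy comes out well under 6 bits per symbol, an entropy coder can beat the fixed 6-bit encoding. The frequency table below is illustrative only (a skewed shape like the one described above), not your actual counts:

```python
import math

def entropy_bits(freqs):
    """Shannon entropy in bits per symbol for a list of frequency counts."""
    total = sum(freqs)
    return -sum(f / total * math.log2(f / total) for f in freqs if f)

# Illustrative skewed distribution over 50 outcomes (NOT the real counts):
# a few common outcomes plus a long tail of rare ones.
freqs = [84, 70, 60, 59] + [40] * 5 + [10] * 41
h = entropy_bits(freqs)
print(f"{h:.2f} bits/symbol vs. 6 bits for a fixed-width code")
```

The gap between the entropy and 6 bits/symbol is roughly the extra compression an entropy coder could recover.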
When I ran it, it showed:
Reading the table: the most common result is that the array @c yields chr(33 + 7 + $x), which it does 8.4% of the time. The fourth row shows that @c prints nothing at all 5.92% of the time, and the ninth row shows that it prints chr(33 + 6 + $x), chr(33 + 8 + $x) 3.96% of the time.
By using the table and encoding the values in 9 characters, you could also omit the "\n" and read the data as fixed-length records, if you don't want to hunt for a way to squeeze out a bit more. I'd suggest reading up on Huffman coding and/or arithmetic coding for other ideas.
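For a taste of the Huffman approach, here's a small sketch using Python's heapq: it builds a prefix-free code from a frequency table, giving common symbols short bitstrings. The counts are hypothetical stand-ins for your real table:

```python
import heapq

def huffman_codes(freqs):
    """Build a Huffman code (symbol -> bitstring) from {symbol: count}."""
    # Heap entries: (weight, tiebreak, {symbol: code-so-far}); the integer
    # tiebreak keeps the tuples comparable when weights are equal.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)   # two lightest subtrees...
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))  # ...merged into one
        tiebreak += 1
    return heap[0][2]

# Hypothetical skewed counts standing in for the real distribution table:
codes = huffman_codes({"A": 50, "B": 25, "C": 15, "D": 10})
```

With counts this skewed, the most common symbol ends up with a 1-bit code and the average drops below 2 bits per symbol, versus 2 bits for a fixed-width code over 4 symbols.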
Edit: tweaked a phrase, fixed the links to wikipedia articles.
When your only tool is a hammer, all problems look like your thumb.