PerlMonks  

Re: Data compression by 50% + : is it possible?

by LanX (Archbishop)
on May 13, 2019 at 00:27 UTC (#1233676)


in reply to Data compression by 50% + : is it possible?

> Just to settle "is the code right" issue. The code is right.

The question is rather whether the distribution of values in your real data matches the one simulated by your random numbers.

The other question is whether you need the output in the "readable" character range or whether a binary file is OK.

Please look at the probability output from the script I posted here and tell us whether it's accurate or, even better, calculate the frequencies of groups from your real data.
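Counting the frequencies from real data could look roughly like the following. This is a minimal Python sketch, not the script referred to above; it assumes one row per line and treats each whitespace-separated token as a "group", which may not match how the groups are actually defined. The function name `group_frequencies` and the sample rows are made up for illustration.

```python
from collections import Counter

def group_frequencies(lines):
    """Count how often each whitespace-separated group occurs.

    Assumption: a "group" is one whitespace-separated token per row.
    """
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    total = sum(counts.values())
    # Relative frequency (a probability estimate) for every group.
    probs = {group: n / total for group, n in counts.items()} if total else {}
    return counts, probs

# Made-up sample rows, only for illustration.
sample = ["3 7 12 40", "3 12 55 90", "7 40 88"]
counts, probs = group_frequencies(sample)
for group, n in counts.most_common():
    print(f"{group:>3} {n:3d} {probs[group]:.3f}")
```

Comparing this table against the simulated one would show whether the random numbers are a fair model of the real data.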

I expect a lossless 50% reduction to be easy, because of the unused gaps in your data.

Better compression will need Huffman coding, but for that to work you need the frequency table anyway.
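To show why the frequency table is the starting point, here is a minimal Huffman sketch in Python. It is a generic textbook construction, not the code of any CPAN module; the function names `huffman_code`, `encode` and `decode` and the sample table are assumptions for illustration only.

```python
import heapq
from itertools import count

def huffman_code(freq):
    """Return {symbol: bitstring} for a {symbol: weight} table."""
    if not freq:
        return {}
    if len(freq) == 1:
        # A single symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique counter so the heap never compares dicts
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]

def encode(symbols, code):
    """Concatenate the bitstrings of all symbols."""
    return "".join(code[s] for s in symbols)

def decode(bits, code):
    """Walk the bit string; the code is prefix-free, so matches are unambiguous."""
    inverse = {b: s for s, b in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out

# Made-up frequency table, for illustration only.
table = {"a": 5, "b": 2, "c": 1, "d": 1}
code = huffman_code(table)
message = list("aababcad")
assert decode(encode(message, code), code) == message
```

The more skewed the table, the shorter the codes for the common symbols, which is why the real frequencies matter more than the algorithm itself.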

FWIW there are two Huffman modules on CPAN and one script here in the archives.

update

And please post proper replies; I only found your update by accident.

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

Re^2: Data compression by 50% + : is it possible?
by baxy77bax (Deacon) on May 13, 2019 at 15:12 UTC
    sorry for the off-the-thread updates :)

    the distribution of values is exactly like the one simulated. so every time the automaton spits out a number, it is larger than the previous one, never consecutive (larger by +1) and never larger than 90.

    the readable output is what it spits out, but binary is ok. i just need to store it.
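The constraints described here can be checked mechanically, and they also suggest storing gaps instead of absolute values. The Python sketch below is only an illustration of that idea and is not something proposed in the thread; the function names and the sample row are hypothetical, and the first value of a row is stored unchanged because its lower bound is not stated.

```python
def check_row(row, max_value=90):
    """True if every step is at least +2 and no value exceeds max_value."""
    steps_ok = all(b - a >= 2 for a, b in zip(row, row[1:]))
    return steps_ok and all(v <= max_value for v in row)

def to_gaps(row):
    """Keep the first value, then store (difference - 2) for each later value.

    Differences are at least 2, so the stored gaps start at 0.
    """
    return row[:1] + [b - a - 2 for a, b in zip(row, row[1:])]

def from_gaps(gaps):
    """Inverse of to_gaps."""
    row = gaps[:1]
    for g in gaps[1:]:
        row.append(row[-1] + g + 2)
    return row

# Made-up row, for illustration only.
row = [3, 7, 12, 40, 90]
assert check_row(row)
assert from_gaps(to_gaps(row)) == row
```

Small gaps tend to cluster around a few values, which is the kind of skewed distribution an entropy coder can exploit.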

    Frequency table looks ok but i'll check it one more time and post if i find irregularities. However, i understand what you and roboticus did. Thank you for the input !! :)

    PS: yes the order i was talking about was on rows

      Great! :)

      FWIW Compress::Huffman looks promising.

      It seems to take a frequency table as input (actually probabilities, so divide the counts in my table by 10000), to store it together with the encoded bit string, and to decode it again.

      On a side note:

      You might want to experiment with changed weights, that is, multiply each frequency by the length of its key (the entry 246, for example, gets a factor of 3) and recalculate the proportions.

      This could give you better compression relative to the old format, because there this entry takes 3 characters.

      (I'm not sure here, it's mind-boggling)
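The reweighting experiment could be tried along these lines. This is a minimal Python sketch that assumes the keys are stored as strings and that "length of the key" means its number of characters; the function name `reweight` and the counts are made up, and whether the result actually compresses better is left open, as above.

```python
def reweight(freq):
    """Weight each count by the character length of its key, then normalise."""
    weighted = {key: n * len(str(key)) for key, n in freq.items()}
    total = sum(weighted.values())
    if total == 0:
        return {}
    return {key: w / total for key, w in weighted.items()}

# Made-up counts, for illustration only.
freq = {"7": 300, "42": 200, "246": 100}
probs = reweight(freq)
for key, p in probs.items():
    print(key, round(p, 3))
```

The resulting proportions can then be used as the input table in place of the plain frequencies, and the two encoded sizes compared.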

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery FootballPerl is like chess, only without the dice
