PerlMonks

Re: Data compression by 50% + : is it possible?

by betmatt (Acolyte)
on May 14, 2019 at 02:52 UTC (#1233743)


in reply to Data compression by 50% + : is it possible?

Hi,

I have not read through all the replies. To be honest, you have had quite a few.

My advice would be that there is always the potential to compress. Even in random sequences you will get repeated patterns. The difficulty is finding those patterns. You want a pattern that is easy to find first.

My advice would be to sort the text of interest and then count the occurrences of each character type. That might be best done in a database. You might also want to split the text up and do the counting piece by piece. With this information you will be better able to find opportunities for compression.
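As a rough illustration of the counting step, here is a minimal Python sketch. It assumes the text fits in memory and uses `collections.Counter` rather than a database; the sample string and the helper names (`char_counts`, `entropy_bits_per_char`, `chunked`) are made up for this example. Sorting is not actually needed for counting. The entropy figure is only a zeroth-order estimate: it bounds what a code that treats each character independently can achieve, and says nothing about longer patterns.

```python
import math
from collections import Counter


def char_counts(text):
    """Count how often each character occurs in text."""
    return Counter(text)


def entropy_bits_per_char(text):
    """Zeroth-order Shannon entropy estimate, in bits per character."""
    total = len(text)
    if total == 0:
        return 0.0
    counts = char_counts(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def chunked(text, size):
    """Split text into consecutive pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


sample = "abracadabra"  # hypothetical input, not from the thread
print(char_counts(sample).most_common())
print(round(entropy_bits_per_char(sample), 3), "bits per character")

# The same measurement done piece by piece.
for piece in chunked(sample, 4):
    print(repr(piece), round(entropy_bits_per_char(piece), 3))
```

If the estimate is close to 8 bits per character for byte data, a character-by-character code has very little room to work with.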


Update


With infinite computing power LanX would be right.

For very large file sizes it would be difficult if not impossible to find the best compression solution. In that situation I would be right.

I know that the question states 50%. But really, if you think about it, you could also compress the stored algorithms that do the transformation. It just goes on and on. Do people understand what I am saying?

Sorting may be a good place to start, because the sorting algorithm (code) can be modified from that point to preserve the information needed to recreate the original file. I have a book here that covers a range of sorting algorithms. If anyone wants me to post any of them, I will.
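One well-known technique that matches this description is the Burrows-Wheeler transform, which sorts all rotations of the input and keeps just enough information to undo the sort. Whether this is what the post has in mind is an assumption; the sketch below is a naive, quadratic-memory Python version for illustration only, and the `end` marker is a convention chosen here. Note that plain sorting on its own is not reversible: "banana" and "nabana" both sort to "aaabnn".

```python
def bwt_encode(text, end="\0"):
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    if end in text:
        raise ValueError("input must not contain the end marker")
    s = text + end
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def bwt_decode(last_column, end="\0"):
    """Invert bwt_encode by rebuilding the sorted rotation table column by column."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        # Prepend the last column to every row, then sort again.
        table = sorted(c + row for c, row in zip(last_column, table))
    original = next(row for row in table if row.endswith(end))
    return original[:-1]


encoded = bwt_encode("banana")
print(repr(encoded))
print(bwt_decode(encoded))
```

The transform does not shrink anything by itself. It tends to group equal characters together, which can help a later stage such as run-length or entropy coding.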

Replies are listed 'Best First'.
Re^2: Data compression by 50% + : is it possible?
by LanX (Archbishop) on May 14, 2019 at 03:09 UTC
    > My advice would be that there is always the potential to compress. Even in random sequences you will get repeated patterns. The difficulty is finding those patterns. You want a pattern that is easy to find first.

    That's drivel. You can only compress independent information without loss as far as its density allows.

    A bit will never encode 3 states.
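The counting argument behind this can be checked directly. The Python sketch below is an illustration added here, not part of the reply: it counts how many bit strings are strictly shorter than n bits, then runs the standard-library `zlib` compressor on random bytes from `os.urandom`. The block size of 100,000 bytes is an arbitrary choice.

```python
import os
import zlib


def count_bitstrings_shorter_than(n):
    """Number of distinct bit strings with length 0 .. n-1 (equals 2**n - 1)."""
    return sum(2 ** k for k in range(n))


n = 8
inputs = 2 ** n                                    # all n-bit inputs
shorter_outputs = count_bitstrings_shorter_than(n)
# There are fewer shorter strings than inputs, so no lossless scheme
# can map every n-bit input to a shorter output.
print(inputs, shorter_outputs)  # 256 255

# Random bytes give a general-purpose compressor nothing to exploit.
random_block = os.urandom(100_000)
packed = zlib.compress(random_block, 9)
print(len(random_block), len(packed))
```

On random input the compressed size typically comes out slightly larger than the original, because of container overhead.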

    update

    > Do people understand what I am saying?

    no

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery
    FootballPerl is like chess, only without the dice
