Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

Re^3: Data compression by 50% + : is it possible?

by LanX (Archbishop)
on May 12, 2019 at 02:26 UTC ( #1233629=note: print w/replies, xml ) Need Help??


in reply to Re^2: Data compression by 50% + : is it possible?
in thread Data compression by 50% + : is it possible?

Instead of calculating I wrote a little script counting the probabilities of group combinations.

Roboticus was right, you need about 14,05 characters per line plus "\n"

Only 1 + 8 + 21 + 20 = 50 combinations are possible per group, resulting in a conservative compression of 50.79 bits = 6.34 bytes per line, which already means a compression to 42% = 58% win.

But if you look into the likelihood of those combinations you clearly see that a Huffman encoding would result in an even better ratio.

I think that's then near the theoretical optimum. (further reading Huffman_coding_with_unequal_letter_costs )

All this supposing your input was real... ;-)

use strict; use warnings; use Data::Dump qw/pp/; my %count; for my $c0 (0..9){ for my $c1 (0..9){ for my $c2 (0..9){ for my $c3 (0..9) { my @c = sort {$a <=>$b} ($c0,$c1,$c2,$c3); #print "@c\t:\t"; my @allowed; for my $i (1..3) { if ( $c[$i] != $c[$i-1] && $c[$i] != $c[$i-1]+1 ) +{ push @allowed, $c[$i] } } #print "@allowed\n"; $count{join "",@allowed}++ } } } } my @length; my $average; for my $k (keys %count){ my $len = length $k; $length[$len]++; $average+= $len* $count{$k}/10000; } warn '@length: ', pp \@length; warn 'average #characters: group/line',pp [$average,$average*9]; my $combies =0; $combies+= $length[$_] for 0..3; #$combies=93; warn "# possible combinations: ", $combies; my $upper_bound= log($combies)/log(2)*9; warn 'Upper bound bits, bytes', pp [$upper_bound, $upper_bound/8]; #warn "ranking", pp [ sort {$b <=>$a} values %count ]; warn 'probabilities: ',pp \%count;

@length: [1, 8, 21, 20] at /tmp/compress.pl line 36. average #characters: group/line[1.5624, 14.0616] at /tmp/compress.pl l +ine 37. # possible combinations: 50 at /tmp/compress.pl line 44. Upper bound bits, bytes[50.7947057079725, 6.34933821349656] at /tmp/co +mpress.pl line 48. probabilities: { "" => 592, "2" => 74, "24" => 60, "246" => 24, "247" => 24, "248" => 24, "249" => 24, "25" => 84, "257" => 24, "258" => 24, "259" => 24, "26" => 84, "268" => 24, "269" => 24, "27" => 84, "279" => 24, "28" => 84, "29" => 60, "3" => 208, "35" => 144, "357" => 48, "358" => 48, "359" => 48, "36" => 192, "368" => 48, "369" => 48, "37" => 192, "379" => 48, "38" => 192, "39" => 144, "4" => 366, "46" => 228, "468" => 72, "469" => 72, "47" => 300, "479" => 72, "48" => 300, "49" => 228, "5" => 524, "57" => 312, "579" => 96, "58" => 408, "59" => 312, "6" => 682, "68" => 396, "69" => 396, "7" => 840, "79" => 336, "8" => 830, "9" => 508, } at /tmp/compress.pl line 52.

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

update

I just realized that Roboticus already had the same basic ideas here: Re: Data compression by 50% + : is it possible?

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1233629]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others examining the Monastery: (8)
As of 2019-06-26 16:00 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Is there a future for codeless software?



    Results (110 votes). Check out past polls.

    Notices?