PerlMonks  

Re: Specialized data compression

by danaj (Scribe)
on Sep 17, 2012 at 17:45 UTC (#994065)


in reply to Specialized data compression

The Data::BitStream and Data::BitStream::XS modules may be of some help (shameless plug since I'm the author). They're made for supporting the sort of compression you're looking at.

Using Adaptive Rice coding yields about 6.8:1, and Exponential Golomb with the best parameters about 6.9:1. Running xz/bzip2/gzip over either result doesn't help much, reaching only about 7:1. So sadly not "significant" vs. 6.5:1.
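For anyone who hasn't met Rice codes before, here is a minimal pure-Perl sketch of the basic idea, building bit strings rather than a real bit stream. The helper names `rice_encode` and `rice_decode` are made up for illustration, and the unary convention (zeros terminated by a 1) is an assumption, not necessarily the exact bit layout the modules write. The adaptive variant additionally adjusts k as values go by, which is not shown here.

```perl
use strict;
use warnings;

# Rice code with parameter $k (hypothetical helper, illustration only):
# the quotient ($n >> $k) in unary (that many 0 bits, then a 1),
# followed by the low $k bits of $n in binary.
sub rice_encode {
    my ($k, $n) = @_;
    my $q    = $n >> $k;
    my $bits = ('0' x $q) . '1';
    $bits .= sprintf('%0*b', $k, $n & ((1 << $k) - 1)) if $k > 0;
    return $bits;
}

# Inverse of rice_encode for a single code word.
sub rice_decode {
    my ($k, $bits) = @_;
    $bits =~ /^(0*)1/ or die "truncated code";
    my $q = length $1;
    my $r = $k > 0 ? oct('0b' . substr($bits, $q + 1, $k)) : 0;
    return ($q << $k) | $r;
}

# Small values cost about k+1 bits; large ones grow by one bit per 2**k.
for my $n (0, 3, 18, 100) {
    my $code = rice_encode(4, $n);
    printf "%3d -> %s (%d bits)\n", $n, $code, length $code;
}
```

With k = 4 the value 18 comes out as 010010: one 0 for the quotient 1, the terminating 1, then 0010 for the remainder 2. That is why small deltas compress well once k roughly matches their typical size.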

One advantage of this is that, like your packing, it writes the values compressed, so no second stage of running xz is needed.

There are lots of ways you could tweak the variable length output. Adaptive Rice works pretty well without a lot of thought about the parameters (the initial values used don't matter much as they'll adjust quickly). You could get complicated with Comma / Taboo, StartStop / StartStepStop, etc. codes if you wanted. It's also easy to do lossy coding by shifting the deltas before encoding and back again after decoding (taking care to keep symmetry in compressor/decompressor).
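To make the lossy idea concrete, here is a rough pure-Perl sketch. It is one possible reading of "keeping symmetry": the encoder computes each delta against the value the decoder will reconstruct, so the rounding error does not accumulate. The names `lossy_encode`, `lossy_decode` and the shift of 2 bits are hypothetical choices for illustration only.

```perl
use strict;
use warnings;

my $shift = 2;   # hypothetical: drop the 2 low bits of every delta

# Encoder: delta is taken against the decoder's reconstruction, not the
# true previous value, so both sides stay in step.
sub lossy_encode {
    my @values = @_;
    my ($recon, @q) = (0);
    for my $v (@values) {
        my $d = $v - $recon;
        # Shift the magnitude: Perl's >> treats negative numbers as
        # unsigned unless 'use integer' is in effect.
        my $qd = $d >= 0 ? ($d >> $shift) : -((-$d) >> $shift);
        push @q, $qd;
        $recon += $qd * (1 << $shift);   # exactly what the decoder does
    }
    return @q;
}

# Decoder: shift each quantised delta back and accumulate.
sub lossy_decode {
    my @q = @_;
    my ($recon, @out) = (0);
    for my $qd (@q) {
        $recon += $qd * (1 << $shift);
        push @out, $recon;
    }
    return @out;
}

my @in  = (100, 103, 98, 250, 251);
my @out = lossy_decode(lossy_encode(@in));
print "in:  @in\nout: @out\n";
```

Each reconstructed value stays within 2**shift - 1 of the original. The quantised deltas would then go through the signed-to-unsigned mapping and the variable length code as usual.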

Quick example using your CSV:

#!/usr/bin/perl
use warnings;
use strict;
use Data::BitStream::XS;
use Text::CSV;
use autodie;

my $use_arice = 1;   # 1 = Adaptive Rice, 0 = Exponential Golomb
my $csv = Text::CSV->new({ sep_char => ',' });

# Compress accel.csv into a bit stream of delta-coded fields
{
  my $file = 'accel.csv';
  open(my $data, '<', $file);
  my @prev = (0, 0, 0, 0);
  my @rice = (4, 4, 4, 4);   # initial Adaptive Rice parameters, one per field
  my @egol = (8, 6, 6, 5);   # Exponential Golomb parameters, one per field
  my $stream = Data::BitStream::XS->new( file => 'accel.gamma', mode => 'w' );
  while (my $line = <$data>) {
    next if $line =~ /^;/;   # skip lines starting with ';'
    $line =~ s/[\r\n]//g;
    die "Bad CSV: $line" unless $csv->parse($line);
    my @fields = $csv->fields();
    $fields[0] = int($fields[0] * 10000 + 0.5);  # Drop last two digits
    foreach my $n (0..3) {
      my $p = $prev[$n];
      my $v = $fields[$n];
      my $delta = $v - $p;
      # Field 0 is written unmapped; the others are mapped signed -> unsigned
      my $udelta = ($n == 0) ? $delta : ($delta >= 0) ? 2*$delta : -2*$delta-1;
      $use_arice ? $stream->put_arice($rice[$n], $udelta)
                 : $stream->put_expgolomb($egol[$n], $udelta);
    }
    @prev = @fields;
  }
  close($data);
  $stream->write_close;
}

# Verify we can read the stream back and write a CSV file
{
  my @prev = (0, 0, 0, 0);
  my @rice = (4, 4, 4, 4);
  my @egol = (8, 6, 6, 5);
  my $stream = Data::BitStream::XS->new( file => 'accel.gamma', mode => 'ro' );
  my $file = 'new.csv';
  open(my $data, '>', $file);
  # defined() so a time delta of 0 doesn't end the loop early
  # (assumes the get methods return undef at end of stream)
  while (defined(my $v = $use_arice ? $stream->get_arice($rice[0])
                                    : $stream->get_expgolomb($egol[0]) )) {
    my $time = $prev[0] + $v;
    $prev[0] = $time;
    printf $data "%.4f", ($time/10000);
    foreach my $n (1..3) {
      my $udelta = $use_arice ? $stream->get_arice($rice[$n])
                              : $stream->get_expgolomb($egol[$n]);
      # Undo the signed -> unsigned mapping
      my $delta = (($udelta & 1) == 0) ? $udelta >> 1 : -(($udelta+1) >> 1);
      my $v = $prev[$n] + $delta;
      print $data ',', $v;
      $prev[$n] = $v;
    }
    print $data "\n";
  }
  close($data);
}
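The signed-to-unsigned step in the script is easy to get subtly wrong, so here it is pulled out on its own as a small sketch that can be checked without the modules or the CSV. The sub names `to_unsigned` and `from_unsigned` are made up for this illustration; the arithmetic is the same as in the script.

```perl
use strict;
use warnings;

# Interleave signed deltas onto non-negative integers:
# 0, -1, 1, -2, 2, ... become 0, 1, 2, 3, 4, ...
sub to_unsigned {
    my $d = shift;
    return $d >= 0 ? 2 * $d : -2 * $d - 1;
}

# Inverse mapping: even codes are non-negative, odd codes are negative.
sub from_unsigned {
    my $u = shift;
    return ($u & 1) == 0 ? $u >> 1 : -(($u + 1) >> 1);
}

printf "%2d -> %d\n", $_, to_unsigned($_) for -3 .. 3;

# Round-trip check over a range of deltas
for my $d (-1000 .. 1000) {
    die "mismatch at $d" unless from_unsigned(to_unsigned($d)) == $d;
}
```

Small deltas of either sign end up as small unsigned numbers, which is what the variable length codes want.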

