http://www.perlmonks.org?node_id=861961

daverave has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I'm using the following package, which was adapted from http://stackoverflow.com/questions/3790166/, to store a large list of ranges and allow time-efficient "how-many-ranges-cover-this-given-range" queries.

use strict; use warnings; package RangeMap; # Credit: Aristotle Pagaltzis! sub new($$$) { my $class = shift; my $max_length = shift; my $ranges_a = shift; my @lookup; for (@{$ranges_a}) { my ( $start, $end ) = @$_; my @idx = $end >= $start ? $start .. $end : ( $start .. $max_length, 1 .. $end ); for my $i (@idx) { $lookup[$i] .= pack 'L', $end } } bless \@lookup, $class; } sub num_ranges_containing($$$) { my $self = shift; my ( $start, $end ) = @_; return 0 unless (defined $self->[$start]); return 0 + grep { $end <= $_ } unpack 'L*', $self->[$start]; } 1;
After creating an object I store it using nstore for future use. This usually results in a binary file of size 500MB-2GB.

So, I wonder if I could somehow store it more compactly to save on disk space, without slowing load time too much.

I know it's a give and take - and I'm willing to pay in a longer load (retrieve) time, since my common usage would be a retrieval followed by a few million queries.

Some more details which might be relevant: the number of ranges is usually around 30k, max_length varies significantly between 20k to 15M, where most commonly it's around 3-4M.

Thanks,
Dave