daverave has asked for the wisdom of the Perl Monks concerning the following question:
I'm using the following package, which was adapted from http://stackoverflow.com/questions/3790166/, to store a large list of ranges and allow time-efficient "how-many-ranges-cover-this-given-range" queries.
After creating an object I store it using nstore for future use. This usually results in a binary file of size 500MB-2GB.use strict; use warnings; package RangeMap; # Credit: Aristotle Pagaltzis! sub new($$$) { my $class = shift; my $max_length = shift; my $ranges_a = shift; my @lookup; for (@{$ranges_a}) { my ( $start, $end ) = @$_; my @idx = $end >= $start ? $start .. $end : ( $start .. $max_length, 1 .. $end ); for my $i (@idx) { $lookup[$i] .= pack 'L', $end } } bless \@lookup, $class; } sub num_ranges_containing($$$) { my $self = shift; my ( $start, $end ) = @_; return 0 unless (defined $self->[$start]); return 0 + grep { $end <= $_ } unpack 'L*', $self->[$start]; } 1;
So, I wonder if I could somehow store it more compactly to save on disk space, without slowing load time too much.
I know it's a give and take - and I'm willing to pay in a longer load (retrieve) time, since my common usage would be a retrieval followed by a few million queries.
Some more details which might be relevant: the number of ranges is usually around 30k, max_length varies significantly between 20k to 15M, where most commonly it's around 3-4M.
Thanks,
Dave