http://www.perlmonks.org?node_id=861978


in reply to Serializing a large object

To store the data in a more compact form, it's necessary to understand the data so you can look for more compact representations of the information it represents. And for now, that isn't clear (to me).

I set up a small set of input ranges, both the normal type ($start <= $end) and the inverted type ($end < $start), and then tried to make sense of the numbers returned by num_ranges_containing(). I cannot.

I input these ranges:

my @ranges = (
    [ 0, 5 ], [ 1, 6 ], [ 2, 7 ], [ 3, 8 ], [ 4, 9 ], [ 5, 10 ],
    [ 5, 0 ], [ 6, 1 ], [ 7, 2 ], [ 8, 3 ], [ 9, 4 ], [ 10, 5 ],
);

I then asked how many of them contain each of the ranges [0,3], [1,4], ..., [7,10]. As the returned counts didn't add up, I did a simple plot:

c:\test>861961
------ 0.. 5
------ 1.. 6
------ 2.. 7
------ 3.. 8
------ 4.. 9
------ 5..10
----- 5.. 0
- ---- 6.. 1
-- --- 7.. 2
--- -- 8.. 3
---- - 9.. 4
----- 10.. 5
---- 0.. 3 range: 0 .. 3 is contained by 1 ranges
---- 1.. 4 range: 1 .. 4 is contained by 4 ranges
---- 2.. 5 range: 2 .. 5 is contained by 4 ranges
---- 3.. 6 range: 3 .. 6 is contained by 3 ranges
---- 4.. 7 range: 4 .. 7 is contained by 3 ranges
---- 5.. 8 range: 5 .. 8 is contained by 3 ranges
---- 6.. 9 range: 6 .. 9 is contained by 2 ranges
---- 7..10 range: 7 .. 10 is contained by 1 ranges

Looking at just a couple:

How am I misinterpreting the data?


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: Serializing a large object
by daverave (Scribe) on Sep 25, 2010 at 19:11 UTC
    The ranges are given in biological coordinates, meaning the first coordinate is 1 (0 is illegal) and max_length is a legal coordinate. So, if max_length=10 then our coordinates are in 1..10 (both inclusive). Also note that a range like [2,4] expands to 2,3,4, since both start and end are inclusive.

    This convention always causes some trouble, and most of the time I convert the coordinates on input and convert them back on output, so that I can work with 0-based coordinates in between. In this case I didn't, since it's quite simple, so I'm working with biological coordinates.

    Anyway, if we now take your example and arbitrarily replace all 0's with 1's we get:

    my @ranges = (
        [ 1, 5 ], [ 1, 6 ], [ 2, 7 ], [ 3, 8 ], [ 4, 9 ], [ 5, 10 ],
        [ 5, 1 ], [ 6, 1 ], [ 7, 2 ], [ 8, 3 ], [ 9, 4 ], [ 10, 5 ],
    );
    my $rm = RangeMap->new( 10, \@ranges );

    Now, [1,3] returns 5, since only the first two and last three ranges contain it.

    [1,4] returns 4, since only the first two and last two ranges contain it.

    I hope it makes sense now.
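The two counts above can be cross-checked with a brute-force sketch. This is not the RangeMap implementation; contains() and count_containing() are hypothetical helpers, and it assumes 1-based inclusive coordinates on a circle of max_length = 10, with "contains" meaning the query lies entirely inside the range when both are walked forward from their start.

```perl
use strict;
use warnings;

my $max_length = 10;
my @ranges = (
    [ 1, 5 ], [ 1, 6 ], [ 2, 7 ], [ 3, 8 ], [ 4, 9 ], [ 5, 10 ],
    [ 5, 1 ], [ 6, 1 ], [ 7, 2 ], [ 8, 3 ], [ 9, 4 ], [ 10, 5 ],
);

# Hypothetical helper: does range [$rs,$re] contain query [$qs,$qe]?
# Lengths and offsets are taken modulo $max_length, so inverted
# (wrapping) ranges need no special case. Perl's % returns a
# non-negative result for a positive right operand.
sub contains {
    my ( $rs, $re, $qs, $qe ) = @_;
    my $rlen   = ( $re - $rs ) % $max_length + 1;    # coordinates covered by the range
    my $qlen   = ( $qe - $qs ) % $max_length + 1;    # coordinates covered by the query
    my $offset = ( $qs - $rs ) % $max_length;        # steps from range start to query start
    return $offset + $qlen <= $rlen;
}

# Brute-force stand-in for num_ranges_containing(): test every range.
sub count_containing {
    my ( $qs, $qe ) = @_;
    return scalar grep { contains( @$_, $qs, $qe ) } @ranges;
}

print "[1,3] is contained by ", count_containing( 1, 3 ), " ranges\n";    # 5
print "[1,4] is contained by ", count_containing( 1, 4 ), " ranges\n";    # 4
```

It is O(number of ranges) per query, so it is only useful as a sanity check on small inputs.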

      So, an inverted range like [9, 4] includes: 1,2,3,4 & 9,10?

        Exactly.
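A minimal sketch of that expansion, assuming 1-based inclusive coordinates; expand_range() is a hypothetical helper for illustration, not part of RangeMap:

```perl
use strict;
use warnings;

# Hypothetical helper: list the coordinates covered by [$start, $end]
# on a sequence whose legal coordinates are 1 .. $max_length.
sub expand_range {
    my ( $start, $end, $max_length ) = @_;
    return $start <= $end
        ? ( $start .. $end )                       # normal range
        : ( 1 .. $end, $start .. $max_length );    # inverted range wraps past max_length
}

print join( ',', expand_range( 9, 4, 10 ) ), "\n";    # 1,2,3,4,9,10
print join( ',', expand_range( 2, 4, 10 ) ), "\n";    # 2,3,4
```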