Beefy Boxes and Bandwidth Generously Provided by pair Networks
Just another Perl shrine
 
PerlMonks  

hashing intervals

by baxy77bax (Deacon)
on Jul 29, 2013 at 15:29 UTC ( [id://1046877]=perlquestion: print w/replies, xml ) Need Help??

baxy77bax has asked for the wisdom of the Perl Monks concerning the following question:

Hi, well i know it is probably not the best title but... So what I have been trying to build is a data structure, a table from which i could retrieve the information very fast given a certain key is provided. A first thing that popped into my mind when I saw the problem was a hash table. But my data is very peculiar. Example:
23-45 12 11-17 2 45-134 23 10-10 65 1-1 45 19-20 10 ...
so it is a table of intervals associated with some numbers. Intervals cannot overlap (Thank U G) but some intervals can remain empty like 18-18 or 21-22. the interval range is form 1 to 1000000000. associated values are from 1 to 5000000. the thing is when i am retrieving associated numbers I have to be able to make a query of the type:

"Q: What is the value associated with 50?"
"A: It is 23."

Is it clear what I am trying to get? Now, I could create a hash table for all possible values from 45 to 134 and associate each key to 23 and get the job done, but this seam like too memory wasteful. And since this looks like a trivial task I guess somebody already solved it efficiently and I can only be grateful if she/he is willing to share the knowledge :)

thank you

baxy

Replies are listed 'Best First'.
Re: hashing intervals
by choroba (Cardinal) on Jul 29, 2013 at 16:09 UTC
    Here is a simple OO solution with fixed non-overlapping data. It keeps an array of triplets beginning, end, value. It sorts this array at the end to make the retrieval possible. You might need to implement a faster searching algorithm (e.g. binary search) if your data are large and time is critical.
    #!/usr/bin/perl use warnings; use strict; use feature qw(say); { package Triplets; sub new { return bless [], shift; } sub add { my $self = shift; push @$self, \@_; } sub sort { my $self = shift; @$self = sort { $a->[0] <=> $b->[0] } @$self; } sub retrieve { my $self = shift; my $value = shift; my $idx = 0; $idx++ while $idx <= $#$self and $self->[$idx][0] <= $value; return $self->[$idx-1][-1] if $idx and $self->[$idx-1][1] >= $ +value; return; } } my $triplets = 'Triplets'->new; while (<DATA>) { my ($int, $value) = split; my ($from, $to) = split /-/, $int, 3; $triplets->add($from, $to, $value); } $triplets->sort; say "$_: ", $triplets->retrieve($_) // 'Undefined' for 1 .. 135; __DATA__ 23-45 12 11-17 2 46-134 23 fixed here!!! 10-10 65 1-1 45 19-20 10
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: hashing intervals
by daxim (Curate) on Jul 29, 2013 at 15:35 UTC
Re: hashing intervals
by LanX (Saint) on Jul 29, 2013 at 16:41 UTC
    It depends where your priority are:
    • speed ?
    • memory ?
    • flexibility ?
    A trivial approach is a binary search in a sorted array where you store the upper bound for each interval (including the undefined ones).

    Cheers Rolf

    ( addicted to the Perl Programming Language)

Re: hashing intervals
by choroba (Cardinal) on Jul 29, 2013 at 15:51 UTC
    Intervals cannot overlap

    Great!
    Q: What's the value associated with 45?
    A: It is 12. No, wait, 23. Uh...

    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      12 is incorrect, as is 23.

      See: but some intervals can remain empty like 18-18 or 21-22

      So it seems the boundaries are excluded.


      s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
      +.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e
        > So it seems the boundaries are excluded.

        ... 10-10 65 1-1 45 ...

        Cheers Rolf

        ( addicted to the Perl Programming Language)

Re: hashing intervals
by hdb (Monsignor) on Jul 29, 2013 at 15:47 UTC
Re: hashing intervals
by Laurent_R (Canon) on Jul 29, 2013 at 17:44 UTC

    Since the order matters in such a case, a hash might not be the best data structure, I would also go for a sorted array and a binary search.

Re: hashing intervals
by sundialsvc4 (Abbot) on Jul 30, 2013 at 03:11 UTC

    Your data could be expressed in an SQL (or SQLite) table with columns (low, high, value).   If the problem were to be approached in this way, the values associated with an entire table-full of values-of-interest could be obtained using a simple INNER JOIN query, e.g.:

    SELECT a.search_for, b.code
    FROM table_of_values a
    INNER JOIN table_of_ranges b
    ON (a.search_for IS BETWEEN b.low AND b.high)
    SORT BY a.search_for

    Obviously, “it all depends on what you are doing and on the context in which you are doing it,” but this eliminates the need to “write a Perl program” altogether.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1046877]
Approved by hdb
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others goofing around in the Monastery: (5)
As of 2024-04-18 18:48 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found