Beefy Boxes and Bandwidth Generously Provided by pair Networks Joe
Just another Perl shrine
 
PerlMonks  

Extendable pairwise indexes - prior work?

by pjf (Curate)
on Apr 02, 2007 at 04:37 UTC ( #607755=perlquestion: print w/ replies, xml ) Need Help??
pjf has asked for the wisdom of the Perl Monks concerning the following question:

Update: My original post read X < Y; this was incorrect, and has been corrected to X > Y.

G'day learned monks,

In some of my recent work I needed to develop a fast and compact way of encoding information between pairs of entries in an list, where the tail of that list may grow (but never shrink) over time, and the ordering of the list otherwise remains fixed.

The resulting encoding states that information about the pair (X,Y) is stored in location X(X-1)/2 + Y where X > Y, and indexes start from zero. In my case I'm using this to index bitstrings, but it's equally valid for use in arrays.

My trouble is that I know that this is not a new encoding scheme, since the most modest amount of thought on the subject reveals it as an answer. Unfortunately, I'm having a hard time finding its true name, and hence whether there is an existing module on the CPAN, or if this is a candidate for a CPAN release.

Please note that this is a rather academic question about finding correct attribution and prior work, not one about implementation.

Many thanks,

Comment on Extendable pairwise indexes - prior work?
Re: Extendable pairwise indexes - prior work?
by BrowserUk (Pope) on Apr 02, 2007 at 05:49 UTC

    I'm probably miinterpreting you , but you mean like this?

    [0] Perl> for my $x ( 1 .. 4 ) { for my $y ( 1 .. 5 ) { print "Info for pair( $x, $y ) is stored at: ", $x*($x-1)/2 + +$y; } };; Info for pair( 1, 1 ) is stored at: 1 Info for pair( 1, 2 ) is stored at: 2 Info for pair( 1, 3 ) is stored at: 3 Info for pair( 1, 4 ) is stored at: 4 Info for pair( 1, 5 ) is stored at: 5 Info for pair( 2, 1 ) is stored at: 2 Info for pair( 2, 2 ) is stored at: 3 Info for pair( 2, 3 ) is stored at: 4 Info for pair( 2, 4 ) is stored at: 5 Info for pair( 2, 5 ) is stored at: 6 Info for pair( 3, 1 ) is stored at: 4 Info for pair( 3, 2 ) is stored at: 5 Info for pair( 3, 3 ) is stored at: 6 Info for pair( 3, 4 ) is stored at: 7 Info for pair( 3, 5 ) is stored at: 8 Info for pair( 4, 1 ) is stored at: 7 Info for pair( 4, 2 ) is stored at: 8 Info for pair( 4, 3 ) is stored at: 9 Info for pair( 4, 4 ) is stored at: 10 Info for pair( 4, 5 ) is stored at: 11

    Cos that doesn't seem to make much sense to me?


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      My apologies, I probably should have included an example. The important thing to note is that since we're storing information on unordered pairs, so that (0,1) is the same as (1,0). Hence the requirement that X > Y to find the index allows us to cover all possible pairs:

      for my $x (0..5) { for my $y (0..$x-1) { print "Info for pair ($x, $y) is stored at: ",$x*($x-1 +)/2+$y,"\n"; } }

      Which provides the output:

      Info for pair (1, 0) is stored at: 0
      Info for pair (2, 0) is stored at: 1
      Info for pair (2, 1) is stored at: 2
      Info for pair (3, 0) is stored at: 3
      Info for pair (3, 1) is stored at: 4
      Info for pair (3, 2) is stored at: 5
      Info for pair (4, 0) is stored at: 6
      Info for pair (4, 1) is stored at: 7
      Info for pair (4, 2) is stored at: 8
      Info for pair (4, 3) is stored at: 9
      Info for pair (5, 0) is stored at: 10
      Info for pair (5, 1) is stored at: 11
      Info for pair (5, 2) is stored at: 12
      Info for pair (5, 3) is stored at: 13
      Info for pair (5, 4) is stored at: 14
      

      The important thing to note here is that as the range expands we're able to encode the additional pairs created by the addition of extra elements into our potential array.

      As a simple example, I'm using this to store information about undirected graphs, where every vertex may contain a link to any other vertex, and the graph can grow over time. By using the algorithm described above we can encode the entire graph as a bitstring, with each position indicating the presence (true) or absence (false) of an edge between two nodes.

      The whole thing works just fine, but I'm certain it's not an original algorithm, hence my seeking of wisdom before anything goes up on the CPAN.

        Yes. That makes much more sense now. I did see the X < Y criteria in the OP (which incidentally you've reversed in the above post?), and thought that was what I was missing, but I didn't immmediately see how it was applied.

        I don't have any reference to prior art to offer, though I will have a look around to see what I can turn up--but presumably you've already done this. It can be quite hard to know what to search for in the situations.

        I like the idea of a graphing module based around this and bitstrings, it sounds like it could be quite efficient in terms of both time and memory, unlike the existing implementations I've tried.

        Good luck with it.

        Update: Maybe this is related to the problem?


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Extendable pairwise indexes - prior work?
by dk (Chaplain) on Apr 02, 2007 at 07:53 UTC
    Not trying to be negative, sorry if I'm discouraging you, but don't you think that sub d2_to_d1($$) { int( $_[0] * ( $_[0] - 1) / 2 + $_[1]) } needs a bit more meat for being a separate module? If you'll couple it with efficient graph storage, that could be interesting, but on itself... I don't know.
Re: Extendable pairwise indexes - prior work?
by Anno (Deacon) on Apr 02, 2007 at 08:01 UTC
    The scheme you describe has been used for space-efficient storage of symmetric and triangular matrices. I'd look in that direction.

    For example, see Knuth The Art of Computer Programming, Vol. I, 2.2.6 Arrays and Orthogonal Lists

    Anno

    Update: Knuth reference added

      This is exactly the sort of reference I was after! Thank-you very, very much Anno!

Re: Extendable pairwise indexes - prior work?
by Moron (Curate) on Apr 03, 2007 at 14:33 UTC
    The algorithm shown in the OP falls under the more generic category of "hashing algorithm" which is unlikely to need reinventing for Perl, whereby I would advocate instead something like:
    # more Perlish implementation example (update: simplified) package UnorderedPair; use strict; sub new { my $class = shift; my $self = bless {}; @_ and $self -> encode( @_ ); return $self, $class; } sub encode { # x, y, value my $self = shift; my ( $x, $y ) = NDsc( $_[0], $_[1] ); $self -> { $x }{ $y } = $_[2]; } sub decode { # x, y ... return value my $self = shift; my ( $x, $y ) = NDsc( $_[0], $_[1] ); return $self -> { $x }{ $y }; } sub NDsc{ # sort numeric descending return sort { $b <=> $a } @_; } 1;

    -M

    Free your mind

Re: Extendable pairwise indexes - prior work?
by blokhead (Monsignor) on Apr 04, 2007 at 01:46 UTC
    I would call this a special case of a ranking algorithm for combinations. An "unordered pair" as you have is simply a "subset of size 2" (or "combination of size 2"). A ranking algorithm is a 1-to-1 mapping from a set of N combinatorial objects to the integers {1...N}, with the additional property that it preserves some sort of "ordering" among the combinatorial objects.

    In this case, your ranking algorithm preserves a form of lexicographic ordering. When you write the pairs as (M,N) with M>N, then you have that rank(M,N)<rank(R,S) if and only if (M,N) appears before (R,S) in lexicographic order. You may not actually use this property, but you got it for free. ;)

    Look at the ranking algorithm I gave in Re: Sampling from Combination Space (for general combinations):

    sub rank { my @comb = sort { $a <=> $b } @_; sum map { binomial( $comb[$_]-1, $_+1 ) } 0 .. $#comb; }
    For combinations of size 2, ours both compute the same thing:
    binomial(X,2)+binomial(Y,1) = X(X-1)/2 + Y
    I got the algorithm from a PDF draft version of a textbook that may never get published (which would be a shame). That PDF is not available online anymore, but if you are curious, I can send you the copy I have archived (hopefully that's kosher). I would be surprised if these kinds of things don't also show up somewhere in Knuth TAoCP volume 4 (if it's out yet, that is), where it deals with the related topics of combinatorial generation.

    blokhead

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://607755]
Approved by randyk
Front-paged by Old_Gray_Bear
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others having an uproarious good time at the Monastery: (4)
As of 2014-04-21 02:49 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    April first is:







    Results (489 votes), past polls