pjf has asked for the
wisdom of the Perl Monks concerning the following question:
Update: My original post read X < Y; this was incorrect, and has been corrected to X > Y.
G'day learned monks,
In some of my recent work I needed to develop a fast and compact way of encoding information between pairs of entries in an list, where the tail of that list may grow (but never shrink) over time, and the ordering of the list otherwise remains fixed.
The resulting encoding states that information about the pair (X,Y) is stored in location X(X1)/2 + Y where X > Y, and indexes start from zero. In my case I'm using this to index bitstrings, but it's equally valid for use in arrays.
My trouble is that I know that this is not a new encoding scheme, since the most modest amount of thought on the subject reveals it as an answer. Unfortunately, I'm having a hard time finding its true name, and hence whether there is an existing module on the CPAN, or if this is a candidate for a CPAN release.
Please note that this is a rather academic question about finding correct attribution and prior work, not one about implementation.
Many thanks,
Re: Extendable pairwise indexes  prior work?
by Anno (Deacon) on Apr 02, 2007 at 08:01 UTC

The scheme you describe has been used for spaceefficient storage of symmetric and triangular matrices. I'd look in that direction.
For example, see Knuth The Art of Computer Programming, Vol. I, 2.2.6 Arrays and Orthogonal Lists
Anno
Update: Knuth reference added  [reply] 

 [reply] 
Re: Extendable pairwise indexes  prior work?
by BrowserUk (Pope) on Apr 02, 2007 at 05:49 UTC

I'm probably miinterpreting you , but you mean like this?
[0] Perl> for my $x ( 1 .. 4 ) {
for my $y ( 1 .. 5 ) {
print "Info for pair( $x, $y ) is stored at: ", $x*($x1)/2 +
+$y;
}
};;
Info for pair( 1, 1 ) is stored at: 1
Info for pair( 1, 2 ) is stored at: 2
Info for pair( 1, 3 ) is stored at: 3
Info for pair( 1, 4 ) is stored at: 4
Info for pair( 1, 5 ) is stored at: 5
Info for pair( 2, 1 ) is stored at: 2
Info for pair( 2, 2 ) is stored at: 3
Info for pair( 2, 3 ) is stored at: 4
Info for pair( 2, 4 ) is stored at: 5
Info for pair( 2, 5 ) is stored at: 6
Info for pair( 3, 1 ) is stored at: 4
Info for pair( 3, 2 ) is stored at: 5
Info for pair( 3, 3 ) is stored at: 6
Info for pair( 3, 4 ) is stored at: 7
Info for pair( 3, 5 ) is stored at: 8
Info for pair( 4, 1 ) is stored at: 7
Info for pair( 4, 2 ) is stored at: 8
Info for pair( 4, 3 ) is stored at: 9
Info for pair( 4, 4 ) is stored at: 10
Info for pair( 4, 5 ) is stored at: 11
Cos that doesn't seem to make much sense to me?
Examine what is said, not who speaks  Silence betokens consent  Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
 [reply] [d/l] 

My apologies, I probably should have included an example. The important thing to note is that since we're storing information on unordered pairs, so that (0,1) is the same as (1,0). Hence the requirement that X > Y to find the index allows us to cover all possible pairs:
for my $x (0..5) {
for my $y (0..$x1) {
print "Info for pair ($x, $y) is stored at: ",$x*($x1
+)/2+$y,"\n";
}
}
Which provides the output:
Info for pair (1, 0) is stored at: 0
Info for pair (2, 0) is stored at: 1
Info for pair (2, 1) is stored at: 2
Info for pair (3, 0) is stored at: 3
Info for pair (3, 1) is stored at: 4
Info for pair (3, 2) is stored at: 5
Info for pair (4, 0) is stored at: 6
Info for pair (4, 1) is stored at: 7
Info for pair (4, 2) is stored at: 8
Info for pair (4, 3) is stored at: 9
Info for pair (5, 0) is stored at: 10
Info for pair (5, 1) is stored at: 11
Info for pair (5, 2) is stored at: 12
Info for pair (5, 3) is stored at: 13
Info for pair (5, 4) is stored at: 14
The important thing to note here is that as the range expands we're able to encode the additional pairs created by the addition of extra elements into our potential array.
As a simple example, I'm using this to store information about undirected graphs, where every vertex may contain a link to any other vertex, and the graph can grow over time. By using the algorithm described above we can encode the entire graph as a bitstring, with each position indicating the presence (true) or absence (false) of an edge between two nodes.
The whole thing works just fine, but I'm certain it's not an original algorithm, hence my seeking of wisdom before anything goes up on the CPAN.
 [reply] [d/l] 

Yes. That makes much more sense now. I did see the X < Y criteria in the OP (which incidentally you've reversed in the above post?), and thought that was what I was missing, but I didn't immmediately see how it was applied.
I don't have any reference to prior art to offer, though I will have a look around to see what I can turn upbut presumably you've already done this. It can be quite hard to know what to search for in the situations.
I like the idea of a graphing module based around this and bitstrings, it sounds like it could be quite efficient in terms of both time and memory, unlike the existing implementations I've tried.
Good luck with it.
Update: Maybe this is related to the problem?
Examine what is said, not who speaks  Silence betokens consent  Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
 [reply] 
Re: Extendable pairwise indexes  prior work?
by dk (Chaplain) on Apr 02, 2007 at 07:53 UTC

Not trying to be negative, sorry if I'm discouraging you, but don't you think that sub d2_to_d1($$) { int( $_[0] * ( $_[0]  1) / 2 + $_[1]) } needs a bit more meat for being a separate module? If you'll couple it with efficient graph storage, that could be interesting, but on itself... I don't know.  [reply] [d/l] 
Re: Extendable pairwise indexes  prior work?
by blokhead (Monsignor) on Apr 04, 2007 at 01:46 UTC

I would call this a special case of a ranking algorithm for combinations. An "unordered pair" as you have is simply a "subset of size 2" (or "combination of size 2"). A ranking algorithm is a 1to1 mapping from a set of N combinatorial objects to the integers {1...N}, with the additional property that it preserves some sort of "ordering" among the combinatorial objects.
In this case, your ranking algorithm preserves a form of lexicographic ordering. When you write the pairs as (M,N) with M>N, then you have that rank(M,N)<rank(R,S) if and only if (M,N) appears before (R,S) in lexicographic order. You may not actually use this property, but you got it for free. ;)
Look at the ranking algorithm I gave in Re: Sampling from Combination Space (for general combinations):
sub rank {
my @comb = sort { $a <=> $b } @_;
sum map { binomial( $comb[$_]1, $_+1 ) } 0 .. $#comb;
}
For combinations of size 2, ours both compute the same thing:
binomial(X,2)+binomial(Y,1) = X(X1)/2 + Y
I got the algorithm from a PDF draft version of a textbook that may never get published (which would be a shame). That PDF is not available online anymore, but if you are curious, I can send you the copy I have archived (hopefully that's kosher). I would be surprised if these kinds of things don't also show up somewhere in Knuth TAoCP volume 4 (if it's out yet, that is), where it deals with the related topics of combinatorial generation.
 [reply] [d/l] [select] 
Re: Extendable pairwise indexes  prior work?
by Moron (Curate) on Apr 03, 2007 at 14:33 UTC

The algorithm shown in the OP falls under the more generic category of "hashing algorithm" which is unlikely to need reinventing for Perl, whereby I would advocate instead something like:
# more Perlish implementation example (update: simplified)
package UnorderedPair;
use strict;
sub new {
my $class = shift;
my $self = bless {};
@_ and $self > encode( @_ );
return $self, $class;
}
sub encode { # x, y, value
my $self = shift;
my ( $x, $y ) = NDsc( $_[0], $_[1] );
$self > { $x }{ $y } = $_[2];
}
sub decode { # x, y ... return value
my $self = shift;
my ( $x, $y ) = NDsc( $_[0], $_[1] );
return $self > { $x }{ $y };
}
sub NDsc{ # sort numeric descending
return sort { $b <=> $a } @_;
}
1;
 [reply] [d/l] 

