BioGeek has asked for the
wisdom of the Perl Monks concerning the following question:
I'm trying to implement the following recursive formula in Perl:
P(r_{1},r_{2},...,r_{n}) = Σ_{i=1}^{n} (r_{n-i+1} - r_{n-i}) P(r_{1},r_{2},...,r_{n-i},r_{n-i+2},...,r_{n})
(if this doesn't render well, it is the second formula on this page),
where r_{0} = 0 and the recursive call to P supplies all of the original arguments except the (n-i+1)^{th} argument.
I've looked at Math::Sequence and Math::Series, but the examples given there are only for simple cases like x_{n+1} = 1/(x_{n} + 1), and I'm a bit lost on how to implement this formula.
As sample data you can use:
r_{1} | 0,11 | 0,43 | 0,93 | 0,91 | 0,52
r_{2} | 0,07 | 0,31 | 0,78 | 0,12 | 0,18
r_{3} | 0,19 | 0,37 | 0,82 | 0,15 | 0,32
(which should give 5 P(r_{1},r_{2},...,r_{n}) values)
Update: I've tried to work out a little example of the algorithm myself (but correct me if I'm doing something wrong).
P(r_{1},r_{2},r_{3}) =
(r_{3} - r_{2})P(r_{1},r_{2}) + (r_{2} - r_{1})P(r_{1},r_{3}) + (r_{1} - r_{0})P(r_{2},r_{3})
Now the recursion comes into play, because to calculate the P-values on the right-hand side of the formula, we use our original formula again.
P(r_{1},r_{2}) = (r_{2} - r_{1})P(r_{1}) + (r_{1} - r_{0})P(r_{2})
P(r_{1},r_{3}) = (r_{3} - r_{1})P(r_{1}) + (r_{1} - r_{0})P(r_{3})
P(r_{2},r_{3}) = (r_{3} - r_{2})P(r_{2}) + (r_{2} - r_{0})P(r_{3})
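The hand expansion above can be checked mechanically. Here is a minimal sketch (the names P_rec and expand3 are mine, not from any post here) that evaluates both the recursive definition, assuming P() = 1 and hence P(r_k) = r_k, and the n = 3 expansion on the first data column:

```perl
use strict;
use warnings;

# Recursive definition: P() = 1, and for n >= 1,
# P(r_1..r_n) = sum over i of (r_{n-i+1} - r_{n-i}) * P(list without r_{n-i+1}),
# with r_0 = 0.  P_rec is an illustrative name.
sub P_rec {
    my @r = @_;
    return 1 unless @r;                 # base case: P() = 1
    my $n   = @r;
    my $sum = 0;
    for my $i ( 1 .. $n ) {
        my @rest = @r;
        splice @rest, $n - $i, 1;       # drop r_{n-i+1} (array index n-i)
        my $lo = $i == $n ? 0 : $r[ $n - $i - 1 ];   # r_0 = 0
        $sum += ( $r[ $n - $i ] - $lo ) * P_rec(@rest);
    }
    return $sum;
}

# The n = 3 hand expansion, written out term by term, using P(r_k) = r_k.
sub expand3 {
    my ( $r1, $r2, $r3 ) = @_;
    my $p12 = ( $r2 - $r1 ) * $r1 + $r1 * $r2;   # P(r1,r2)
    my $p13 = ( $r3 - $r1 ) * $r1 + $r1 * $r3;   # P(r1,r3)
    my $p23 = ( $r3 - $r2 ) * $r2 + $r2 * $r3;   # P(r2,r3)
    return ( $r3 - $r2 ) * $p12 + ( $r2 - $r1 ) * $p13 + $r1 * $p23;
}

printf "%.6f %.6f\n", P_rec( 0.11, 0.07, 0.19 ), expand3( 0.11, 0.07, 0.19 );
# prints: 0.001595 0.001595
```

Both routes give the same value on the first data column, which is a useful sanity check before trusting either.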
Update: the r-values are scores, and P is a probability:
so P(r_{1}) is the chance of having score r_{1},
P(r_{1},r_{2}) is the chance of both having score r_{1} and score r_{2}, and so on.
Re: recursive formula. by rsteinke (Scribe) on Aug 05, 2004 at 14:35 UTC 
sub p
{
    my @args = @_; # for clarity; more efficient not to copy
    # initializing this handles the zero argument case
    my $result = 0;
    $result += ($args[$_] - ($_ ? $args[$_-1] : 0))
               * p(@args[0..($_-1)], @args[($_+1)..(@args-1)])
        foreach (0..(@args-1));
    return $result;
}
note: untested
Ron Steinke
rsteinke@wlink.net
Re: recursive formula. by trammell (Priest) on Aug 05, 2004 at 14:41 UTC 
What is P()? One? Or is there some special recipe telling what to do when P() has one argument?
A complete example calculation would help immensely.

the rvalues are scores, and P is a probability:
so P(r_{1}) is the chance of having score r_{1},
P(r_{1},r_{2}) is the chance of both having score r_{1} and score r_{2}, and so on.

Ah, so P() is 1 (the probability of having anything happen).
Ok, you'd have to amend my code with
return 1 if !@_;
at the top or something
Ron Steinke
rsteinke@wlink.net

P(r1) = (r1 - r0) P()
      = r1
Correct?

Re: recursive formula. by herveus (Parson) on Aug 05, 2004 at 14:53 UTC 
#!/usr/bin/perl
use strict;
use warnings;

sub P
{
    my @r = @_;
    my $n = $#r;
    return $r[1] if $n == 1; # inferred end to recursion
    my $sum = 0;
    for my $i (1..$n)
    {
        my @r_prime = @r;
        splice(@r_prime, $n-$i+1, 1);
        $sum += ($r[$n-$i+1] - $r[$n-$i]) * P(@r_prime);
    }
    return $sum;
}

while (<DATA>)
{
    chomp;
    my @r = split(/,/, $_);
    unshift @r, 0; # sets r-naught
    print "P($_) = ", P(@r), "\n";
}
__DATA__
0.11, 0.07, 0.19
0.43, 0.31, 0.37
0.93, 0.78, 0.82
0.91, 0.12, 0.15
0.52, 0.18, 0.32
gave the following output:
P(0.11, 0.07, 0.19) = 0.001595
P(0.43, 0.31, 0.37) = 0.046225
P(0.93, 0.78, 0.82) = 0.548235
P(0.91, 0.12, 0.15) = 0.439894
P(0.52, 0.18, 0.32) = 0.010192
The expansion looks similar to computing a determinant, but not quite... this is a straight-up translation of the formula into Perl.
No trickery; no premature optimization.
Update:
I elected to prepend a zero to the array of r values to simplify
life. That also means that the indices in the formula now
correspond to the array indices in the Perl code, making the Perl
read more nearly like the formula. I elected to construct the
array for the recursion explicitly, by directly striking the
omitted r value.
This is meant to make it easier to audit the code to verify that
it says what it is meant to say...
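The strike-one-element step described above can be seen in isolation. A tiny sketch (values reused from the first sample column; variable names are illustrative):

```perl
use strict;
use warnings;

# With the zero prepended, @r holds r_0 .. r_3, so array index k is r_k.
my @r = ( 0, 0.11, 0.07, 0.19 );
my $n = $#r;                        # n = 3
my $i = 1;                          # first term of the sum

# splice removes the (n-i+1)-th r value in place from a copy of the array.
my @r_prime = @r;
splice @r_prime, $n - $i + 1, 1;    # removes $r[3] = 0.19

print "@r_prime\n";                 # prints: 0 0.11 0.07
```

The copy-then-splice keeps the original @r intact for the remaining terms of the sum, at the cost of one array copy per term.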

Your results look sound to me.
The above formula is used in extreme value statistics, where outliers are important. You wouldn't catch this when you just take the mean of the values:
M() = mean
M(0.11, 0.07, 0.19) = 0.12
M(0.43, 0.31, 0.37) = 0.37
M(0.93, 0.78, 0.82) = 0.84
M(0.91, 0.12, 0.15) = 0.39
M(0.52, 0.18, 0.32) = 0.34
Now, with our formula:
all low scores also give a low P-value:
P(0.11, 0.07, 0.19) = 0.001595 and P(0.52, 0.18, 0.32) = 0.010192
all high scores give a high P-value: P(0.93, 0.78, 0.82) = 0.548235
both as expected. But now: low scores with a high outlier are also important for us, and indeed this also gives a high P-value: P(0.91, 0.12, 0.15) = 0.439894
Only for P(0.43, 0.31, 0.37) = 0.046225 would I intuitively have thought of a lower value, but it's probably correct.
Thanks for the effort you put into it.
Re: recursive formula. by BrowserUk (Pope) on Aug 05, 2004 at 15:41 UTC 
Update: THIS IS A WRONG IMPLEMENTATION. Please don't ++ it! Thanks, buk.
I'm getting different results from other people, so this is probably wrong, but then maybe not, so...
#! perl -slw
use strict;
use List::Util qw[ reduce ];

$a = $a; ## Disable the dumbest warning in perl!

my @samples = (
    ## r1   r2   r3
    [ qw[ 0.11 0.07 0.19 ] ],
    [ qw[ 0.43 0.31 0.37 ] ],
    [ qw[ 0.93 0.78 0.82 ] ],
    [ qw[ 0.91 0.12 0.15 ] ],
    [ qw[ 0.52 0.18 0.32 ] ],
);

sub P{
    return 1 if @_ == 1;
    my @r = @_;
    return reduce {
        $a + ( $r[ $b ] - $r[ $b - 1 ] ) * P( @r[ 0 .. ( $#r - $b ) ] )
    } 0 .. $#r;
}

my @results = map P( @$_ ), @samples;
print "@results";
__END__
P:\test>380259
0.1216 0.0744 0.0624999999999999 0.6541 0.2556
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." David Dunham
"Think for yourself!"  Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side."  tachyon
Re: recursive formula. by periapt (Hermit) on Aug 05, 2004 at 15:56 UTC 
This seems to work. Loading the data array for the next recursion seemed, to me, to be the tricky part. I did this by hand and the numbers match. Of course, even with higher math, I still can't add very well ;o)
use strict;
use warnings;
use diagnostics;

#main()
{
    my @data = ([0.11,0.07,0.19],
                [0.43,0.31,0.37],
                [0.95,0.78,0.82],
                [0.91,0.12,0.15],
                [0.52,0.18,0.32]);
    foreach (@data){
        my $pval = P($_);
        print "value = $pval\n";
    }
    exit;
} #end main()

sub P{
    my $data = shift;
    my $n = @{$data};
    my $rslt1 = 0;
    my $rslt2 = 0;
    my $rslt = 0;
    return $$data[0] if($n == 1); # r(0) = 0 ==> r(1)-r(0) = r(1)
    # assume P(r(0)) = 1
    unshift @{$data}, 0;
    for(my $i = 1; $i <= $n; $i++){
        my @nextdata = ();
        for (1..$n){
            next if($_ == ($n-$i+1));
            push @nextdata, $$data[$_];
        }
        # split up rslt1 & 2 for clarity
        $rslt1 = ($$data[$n-$i+1] - $$data[$n-$i]);
        $rslt2 = P(\@nextdata);
        $rslt += $rslt1 * $rslt2;
    }
    return $rslt;
}
__DATA__
value = 0.001595
value = 0.046225
value = 0.549005
value = 0.439894
value = 0.010192
PJ
unspoken but ever present  use strict; use warnings; use diagnostics; (if needed)
Re: recursive formula. by QM (Vicar) on Aug 05, 2004 at 17:30 UTC 
It seems you have some solutions, so I'm taking a different tack. Two points:
1) You'd probably benefit more from showing us a little code with your question, before seeing the solutions.
2) You may want to look at Memoize, like so:
use Memoize;
# make sure memoize happens before p is called
BEGIN { memoize('p'); }
Or, cache it yourself:
{ # closure for sub p
    my %seen;

    sub p
    {
        my $arg_string = join '^', @_;
        return $seen{$arg_string} if exists $seen{$arg_string};
        # compute $result here as usual
        # ...
        return $seen{$arg_string} = $result;
    } # end sub p
} # end closure for sub p
I leave it to you to work out how to handle the empty parameter list :)
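For what it's worth, here is a self-contained sketch of that hand-rolled cache with the recursion filled in, assuming the base case P() = 1 from the discussion above; the name p_cached is mine, not from the thread:

```perl
use strict;
use warnings;

{ # closure holding the cache for p_cached
    my %seen;

    sub p_cached {
        return 1 unless @_;                  # empty list: P() = 1
        my $key = join '^', @_;
        return $seen{$key} if exists $seen{$key};

        # Same sum as the formula, reordered: for each index $i,
        # (r_{i+1} - r_i) * P(list without element $i), with r_0 = 0.
        my @r      = @_;
        my $result = 0;
        for my $i ( 0 .. $#r ) {
            my $lo = $i ? $r[ $i - 1 ] : 0;  # r_0 = 0
            $result += ( $r[$i] - $lo )
                     * p_cached( @r[ 0 .. $i - 1 ], @r[ $i + 1 .. $#r ] );
        }
        return $seen{$key} = $result;
    }
}

print p_cached( 0.11, 0.07, 0.19 ), "\n";   # same value as the uncached versions
```

Note that joining the arguments with '^' as the cache key only distinguishes argument lists by their stringified values; rounding or normalizing the values before joining might be worth considering if they come from noisy input.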
Update: See BrowserUk's Re^2: recursive formula. below for reasons why this isn't a good idea.
QM

Quantum Mechanics: The dreams stuff is made of

I liked the following line from the Memoize perldoc:
"This module exports exactly one function, memoize. The rest of the functions in this package are None of Your Business."
I will look into the empty parameter list stuff.

Memoize won't help for this, as for any given set of data the function will never be called twice with the same set of values. (Apart from the trivial case where the function is called with a single value, and in that case the single value is returned as a special case to end the recursion.)
For large datasets, Memoize would actually slow this function down by building a cache of values that would only ever be hit by chance. That chance is that there exist two (or more) identical values in the dataset.
Given the parameters are lists of floating point values with a continuous domain, "identical values" is an ethereal thing.
Also, the cache key will be a function of both the number of values in the list and their ordering. The size of the cache can quickly grow very large without ever providing payback.

Memoize won't help for this as for any given set of data, the function will never be called twice with the same set of values.
Ah, yes, I missed that.
I initially thought that the function might be called multiple times with only slight changes to the data tables. [After all, why write a program if it's only going to be used once?] If it's only called once on one table of data, you are correct: there's little to gain. And if it's called with more than one data table, but they are largely independent, there's still little gain. It's only if the tables all have lots of overlap that it makes a difference. We could also debate whether recursion this shallow has much to gain, regardless of the underlying function or data. We'll have to let the OP comment on his intended use.
All of this reminds me of TheDamian's Tachyonic Variables talk. I won't divulge the secret, but in any case the actual module he developed isn't available yet.
QM

Re: recursive formula. by jdalbec (Deacon) on Aug 06, 2004 at 01:27 UTC 
r_{1} | 0,11 | 0,43 | 0,93 | 0,91 | 0,52
r_{2} | 0,07 | 0,31 | 0,78 | 0,12 | 0,18
r_{3} | 0,19 | 0,37 | 0,82 | 0,15 | 0,32
Shouldn't 0 < r_{1} < r_{2} < r_{3}? Otherwise you're multiplying by negative numbers in the formula.
Re: recursive formula. by BrowserUk (Pope) on Aug 06, 2004 at 06:02 UTC 
What should the result be for P( 1, 1, 1 )?
My interpretation is that each term will be: ( 1 - 1 )P( ... )
Which, as 1 - 1 will always be zero, makes the result of each multiplication zero, so the overall result will be zero?
If this is the case, then herveus' solution is incorrect, as given input of ( 1, 1, 1 ) he returns 1.
If I'm correct, of which there is no guarantee, then I think that the problem lies with his simple expedient of prefixing the list with a 0 to get around the "indices starting from 1" problem, which is causing the function to iterate and recurse once too often.

You forget that r_0 is always zero, so
P(1, 1, 1) = (1 - 1) P(1, 1) + (1 - 1) P(1, 1) + (1 - 0) P(1, 1)
           = P(1, 1) = P(1) = 1
Ron Steinke
rsteinke@wlink.net

Thanks. That's the source of my misunderstanding. I saw the r_{0} = 0 reference, but I also saw r running 1 .. n and i running 1 .. n.
I missed that n-i = 0 whenever n = i. Stupid.
Re: recursive formula. by BrowserUk (Pope) on Aug 06, 2004 at 11:34 UTC 
I believe this to be a correct implementation.
I've tried to retain the auditability of herveus' implementation by aliasing perl's working variables to names that marry with those used in the formula, whilst avoiding copying lots of arrays.
For the small numbers in the samples it makes little difference, but by the time you get to a dozen or more values, the double duplication and the splicing in a recursive subroutine become a severe resource drain. If the numbers involved are anything like those typical for biogenetic work, they would become untenable quite quickly.
Update: Use the non-aliasing version. The aliasing version consumes a prodigious amount of memory. This version happily processes
#! perl -slw
use strict;
use List::Util qw[ reduce ];

sub P;

sub P{ ## Non-aliasing
    warn "@_\n";
    return $_[ 0 ] if @_ == 1;
    return reduce {
        $a += ( $_[ $#_ - $b + 1 ] - $_[ $#_ - $b ] )
            * P @_[ 0 .. $b-1, $b+1 .. $#_ ];
    } 0, 1 .. $#_;
}

=pod

Do not use this version. It looks pretty and should be logically
identical to the above, and it appears to work okay for short lists.
But through what I think is a bug in multiplicity Perl, the use of
C<local> causes huge memory consumption:
> 750 MB for an input list of 10 items!?

sub P { ## Pretty, aliasing, voracious memory consumer!
    return $_[ 0 ] if @_ == 1;
    our( @r, $i, $a, $b, $sigma );
    local *r = *_;
    local *i = *b;
    local *sigma = *a;
    my $n = $#r;
    return reduce{
        $sigma += ($r[$n-$i+1] - $r[$n-$i]) * P @r[0 .. $i-1, $i+1 .. $n];
    } 0, 1 .. $n;
}

=cut

my @samples = (
    ## r1   r2   r3
    [ qw[ 0.11 0.07 0.19 ] ],
    [ qw[ 0.43 0.31 0.37 ] ],
    [ qw[ 0.93 0.78 0.82 ] ],
    [ qw[ 0.91 0.12 0.15 ] ],
    [ qw[ 0.52 0.32 0.18 ] ],
    [ qw[ 1.0  1.0  1.0  ] ],
    [ qw[ 0.5  0.5  0.5  ] ],
    [ qw[ 0.0  0.0  0.0  ] ],
    [ qw[ 0.19 0.11 0.07 ] ],
    [ qw[ 0.43 0.37 0.31 ] ],
    [ qw[ 0.93 0.82 0.78 ] ],
    [ qw[ 0.91 0.15 0.12 ] ],
);

print "P( @$_ ) = ", P( @$_ ) for @samples;
__END__
P( 0.11 0.07 0.19 ) = 0.001232
P( 0.43 0.31 0.37 ) = 0.004644
P( 0.93 0.78 0.82 ) = 0.016833
P( 0.91 0.12 0.15 ) = 0.547183
P( 0.52 0.32 0.18 ) = 0.045552
P( 1.0 1.0 1.0 ) = 0
P( 0.5 0.5 0.5 ) = 0
P( 0.0 0.0 0.0 ) = 0
P( 0.19 0.11 0.07 ) = 0.002128
P( 0.43 0.37 0.31 ) = 0.004644
P( 0.93 0.82 0.78 ) = 0.016833
P( 0.91 0.15 0.12 ) = 0.547183
__END__
P( 0.11 0.07 0.19 ) = 0.001232
P( 0.43 0.31 0.37 ) = 0.004644
P( 0.93 0.78 0.82 ) = 0.016833
P( 0.91 0.12 0.15 ) = 0.547183
P( 0.52 0.32 0.18 ) = 0.045552
P( 1.0 1.0 1.0 ) = 0
P( 0.5 0.5 0.5 ) = 0
P( 0.0 0.0 0.0 ) = 0
P( 0.19 0.11 0.07 ) = 0.002128
P( 0.43 0.37 0.31 ) = 0.004644
P( 0.93 0.82 0.78 ) = 0.016833
P( 0.91 0.15 0.12 ) = 0.547183

