in reply to
Re^5: Algorithm for cancelling common factors between two lists of multiplicands
in thread Algorithm for cancelling common factors between two lists of multiplicands
I never doubted the idea, nor your proof, I simply had real trouble coding it. There's a load of niggly edge cases that I could not seem to get rightbut now I have.
As a result, the original testcase that took M::BF 4 1/2 hours and that M::Pari couldn't handle, I now have down to 192 milliseconds in pure Perl! What's more, it happily handles a dataset of (1e6 2e6 2e6 1e6) without blowing the memory in just over 1 minute (though I cannot check the accuracy as I have nothing else that will touch those numbers).
Of course, I've given up a fair amount of precision along the way, so now I am wondering if I can enhance my cheap BigFloat code to recover some of it?
#! perl slw
use strict;
use Benchmark::Timer; my $T = new Benchmark::Timer;
use List::Util qw[ sum reduce max ]; our( $a, $b );
sub factors{ 2 .. $_[ 0 ] }
sub normalise {
my( $s, $n ) = @{+shift };
while( 1 ) {
if( $n > 1.0 ) { $n /= 10; $s++; redo; }
elsif( $n < 0.1 ) { $n *= 10; $s; redo; }
else { last }
}
return [ $s, $n ];
}
sub sProduct{
@{
reduce{
$a>[ 0 ] += $b>[ 0 ];
$a>[ 1 ] *= $b>[ 1 ];
normalise( $a );
} @_
}
}
sub sProdRange {
map{
reduce{
$a>[ 1 ] *= $b;
normalise( $a );
} [ 0, 1 ], $_>[ 0 ] .. $_>[ 1 ]
} @_;
}
sub reduceRanges{
our( @v, @s );
local( *v, *s ) = @_;
@v = sort{ $b <=> $a } @v;
@s = sort{ $b <=> $a } @s;
for my $i ( 0 .. max( $#v, $#s ) ) {
$s[$i]=1 unless defined $s[$i];
$v[$i]=1 unless defined $v[$i];
my $cmp = $v[$i] <=> $s[$i];
unless( $cmp ) {
$s[$i] = [ 1, 1 ];
$v[$i] = [ 1, 1 ];
}
elsif( $cmp < 0 ) {
$s[$i] = [ $s[$i]  ( $s[$i]  1  $v[$i] ), $s[$i] ];
$v[$i] = [ 1, 1 ];
}
else {
$v[$i] = [ $v[$i]  ( $v[$i]  1  $s[$i] ), $v[$i] ];
$s[$i] = [ 1, 1 ];
}
}
return \@v, \@s;
}
sub FET6 {
my @data = @_;
return unless @data == 4;
my @C = ( sum( @data[ 0, 2 ] ), sum( @data[ 1, 3 ] ) );
my @R = ( sum( @data[ 0, 1 ] ), sum( @data[ 2, 3 ] ) );
my $N = sum @C;
my( $Vref, $Sref ) = reduceRanges [grep $_, @R, @C], [grep $_, $N,
+ @data];
my( $dScale, $d ) = sProduct sProdRange @$Vref;
my( $sScale, $s ) = sProduct sProdRange @$Sref;
return ( $d / $s, $dScale  $sScale );
}
die "Bad args @ARGV" unless @ARGV == 4;
$T>start("[@ARGV]");
printf "%.17fe%d\n", FET6 @ARGV;
$T>stop("[@ARGV]");
$T>report;
__END__
[ 2:39:39.81] P:\test>FET6 5 0 1 4
0.23809523809523819e1
1 trial of [5 0 1 4] ( 728us total), 728us/trial
[ 2:39:32.26] P:\test>fet6 989 9400 43300 2400
0.80706046478677251e7029
1 trial of [989 9400 43300 2400] (192.499ms total), 192.499ms/trial
[ 2:37:08.39] P:\test>FET6 1e6 2e6 2e6 1e6
2.73353759185676100e147576
1 trial of [1e6 2e6 2e6 1e6] ( 67.960s total), 67.960s/trial
[ 2:39:18.10] P:\test>FET6 1e5 2e5 2e5 1e5
1.32499459649680040e14760
1 trial of [1e5 2e5 2e5 1e5] ( 6.382s total), 6.382s/trial
[ 2:35:53.28] P:\test>FET6 1e6 2e6 2e6 1e6
[1e6 2e6 2e6 1e6]
2.73353759185676100e147576
1 trial of _default ( 67.887s total), 67.887s/trial
Examine what is said, not who speaks  Silence betokens consent  Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco.  Rule 1 has a caveat!  Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
Re^7: Algorithm for cancelling common factors between two lists of multiplicands (192.5 ms) by tmoertel (Chaplain) on Aug 13, 2005 at 14:29 UTC 
As a precision reference, here's the 1e5/2e5 answer computed with my program. It ought to be good for 60 digits:
[thor@redwood fishersexacttest]$ cat fetbig.dat
100000 200000
200000 100000
[thor@redwood fishersexacttest]$ time ./fet <fetbig.dat
+1.324994596496999433060120009448386330459655228210366819887079e14760
real 21m8.445s
user 20m56.788s
sys 0m6.831s
It looks like your FET6 yields about 13 digits of precision for this case:

1.32499459649680040e14760
+1.324994596496999433060120009448386330459655228210366819887079e14760

 [reply] [d/l] [select] 
Re^7: Algorithm for cancelling common factors between two lists of multiplicands (192.5 ms) by sk (Curate) on Aug 14, 2005 at 01:03 UTC 
Here is my implementation and results with benchmark. It is 4550% faster. I am not sure what it the exact digit at which the result becomes unreliable so it could be slightly off
My perl skills are not that great so my implementation is very simple and straightforward
tmoertel's haskell implementation amazes me on the precision! I wonder if that is specific to Haskell or the way it was coded
My implementation
 [reply] [d/l] [select] 

tmoertel's haskell implementation amazes me on the precision! I wonder if that is specific to Haskell or the way it was coded
The reason the implementation has such precision is not because it is written in Haskell – although that does make coding more convenient – but because it computes the answer exactly as a ratio of bigints and then extracts decimal digits until the desired precision has been obtained. If you want more or fewer than sixity digits, for example, just change 60 to the desired number in the sciformat 60 call in the code in Re^13: Algorithm for cancelling common factors between two lists of multiplicands.
Cheers, Tom
 [reply] 

I think most (though not all), of your gain is through avoiding the overhead of calling subroutines in loops. Inlining is the last step when trying squeeze out the very last drop of performance. That relates back to my opinion on Perl 5's greatest limitation is...?.
tmoertel's haskell implementation amazes me on the precision!
The precision comes from using Haskell's infinite precision Integer data type (analogous to Math::BigInt) for the product and division calculations, once the cancelling has been carried out.
The performance of that precision comes from the compiled implementation and a highly optimising compiler. The only reason the Perl code manages to beat the compiled Haskell is because it uses the much lower precision, FPU native double representation.
You might find the following code interesting, it's a brute force conversion of Tom's Haskell code staying as close to the original as I could achieve. It works okay for the small dataset, but highlights two major differences between Perl and Haskell.
The highly recusive nature of the algorithm exacerbates the cost of Perl's subcalls and lack of tail recursion.
And if you try to use it on the larger datasets, you'll see that the Perl implementation consumes vast amounts of memory (eventually segfaulting on my machine when it runs out of swap space). Most of the damage is done in building and rebuilding zillions of lists whilst keeping copies of earlier versions on the stack in the merge & cancel functions. It's this type of recursive, divide & copy, list processing that Haskell's ubiquitous lazy operations really excel at.
Not particularly useful given it limitations, but an interesting exercise none the less.
Examine what is said, not who speaks  Silence betokens consent  Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco.  Rule 1 has a caveat!  Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
 [reply] [d/l] [select] 

