Your skill will accomplishwhat the force of many cannot PerlMonks

### Calculation discrepancy between Perl versions

by oko1 (Deacon)
 on Oct 05, 2010 at 03:54 UTC Need Help??
oko1 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, all: I'm getting a disturbingly-large discrepancy in running an identical script on different machines, and hoping that someone can suggest a cause and perhaps a fix. I'm suspecting a possible bug in my perl version.

Code being executed (@ARGV = qw/data.txt 636/):

#!/usr/bin/perl -w
use strict;
# Least square fit to quadratic function; input up to ~1k data pairs a
+s space-separated columns.
#
# $Revision: 1.1$ $Date: 2010-06-28 17:31:23-04$
# $Source: /home/vogelke/notebook/2010/0628/polynomial-fit/RCS/pfit,v +$
# $UUID: 631e046d-f070-38bb-b287-bdd10b1d0efa$
#
# John Lapeyre Tue Jan 28 18:45:19 MST 2003
# lapeyre@physics.arizona.edu
# http://www.johnlapeyre.com/computers.html
# http://www.johnlapeyre.com/pfit
#
# $Revision: 1.2$ $Date: 2010-10-04 17:08:40-05$
# Revised by Ben Okopnik
# Added usage message, code strictures, input filtering/validation
#
# fit to A x^2  + B x + C
# Works by solving matrix equation M.c = v.
# c = (A,B,C)
# v = (v1,v2,v3), where vn = \sum_i y_i x_i^(n-1)
# m = ( (m4,m3,m2), (m3,m2,m1), (m2,m1,m0))
# where mn = \sum_i x_i^n

die "Usage: ", $0 =~ /([^\/]+)$/, " <input_file> <target_number>\n"
unless @ARGV == 2;
my $tgt = pop; my ($m0, $v1,$v2, $v3,$m1, $m2,$m3, $m4,$d, @x, @y) = 0;

while (<>) {
next unless /^\s*(\d+)(?:$|\s+(\d+)\s*$)/;
push @y, $+; push @x, defined$2 ? $1 :$m0;
$m0++; } die "Badly-formatted data (one or two numerical columns required)\n" unless$m0;

for (0 .. $#x) { my$x = $x[$_];
my $y =$y[$_];$v1 += $y *$x**2;
$v2 +=$y * $x;$v3 += $y;$m1 += $x;$m2 += $x *$x;
$m3 +=$x**3;
$m4 +=$x**4;
}

# Used mathematica to invert the matrix and translated to perl
$d =$m2**3 + $m0 *$m3**2 + $m1**2 *$m4 - $m2 * (2 *$m1 * $m3 +$m0
+ * $m4); my$A = ($m1**2 *$v1 - $m0 *$m2 * $v1 +$m0 * $m3 *$v2 + $m2**2 *$
+v3 -
$m1 * ($m2 * $v2 +$m3 * $v3)) /$d;
my $B = (-($m1 * $m2 *$v1) + $m0 *$m3 * $v1 +$m2**2 * $v2 -$m0 * $+m4 *$v2 -
$m2 *$m3 * $v3 +$m1 * $m4 *$v3) / $d; my$C = ($m2**2 *$v1 - $m1 *$m3 * $v1 +$m1 * $m4 *$v2 + $m3**2 *$
+v3 -
$m2 * ($m3 * $v2 +$m4 * $v3)) /$d;

# A x^2  + B x + C
printf "%f\t%f\t%f\n", $A,$B, $C; # Correct$C for final value
$C -=$tgt;

# PE solver
my $y1 = (-$B + sqrt($B**2 - 4 *$A * $C)) / (2 *$A);
my $y2 = (-$B - sqrt($B**2 - 4 *$A * $C)) / (2 *$A);
printf "Roots: %.3f or %.3f\n\n", $y1,$y2;
[download]

The data being processed (CO2 measurements at Mauna Loa):

1962 318
1982 341
2002 373
[download]

The results:

# i686 running Linux with perl-5.10.0
0.011256        -43.244877      41833.442623
Roots: 2093.870 or 1747.905

# IBM x3400 running Linux with perl-5.10.1
0.011250        -43.220086      41809.478563
Roots: 2083.912 or 1757.866

# Sparc running Solaris-10 with perl-5.10.1
0.011250        -43.220086      41809.478563
Roots: 2083.912 or 1757.866

# After adding "use bignum;" on either of the last two
0.011250        -43.220000      41809.395000
Roots: 2083.912 or 1757.866
[download]

Any help or suggestions would be appreciated.

--
"Language shapes the way we think, and determines what we can think about."
-- B. L. Whorf

Replies are listed 'Best First'.
Re: Calculation discrepancy between Perl versions
by ig (Vicar) on Oct 05, 2010 at 05:22 UTC

I suspect the differences result from differences in how floating point is handled on the different systems affecting the cumulated error from the sequence of calculations.

You could try re-implementing using Math::BigFloat and setting a sufficiently large (or is that small?) precision so that the cumulated error is sufficiently small.

If my guess is correct, you might find What Every Computer Scientist Should Know About Floating-Point Arithmetic helpful.

Re: Calculation discrepancy between Perl versions
by GrandFather (Sage) on Oct 05, 2010 at 06:34 UTC

Following on from ig's comment a couple of alarm bell stimulating items in relation to your code, data and results should be mentioned:

1. sums and differences of floats can easily lead to large errors if not managed correctly, especially in the context of exponentiation
2. output values with more significant digits than seem reasonable given the input data is a cause for concern

This sort of code can be very tricky to get right with careful attention needing to be paid to expected ranges of values at each step and consideration given to changing calculation order to minimise errors propagating through the calculation chain. Using higher precision numerical representation is a quick way around what can otherwise often be a difficult problem.

True laziness is hard work
Re: Calculation discrepancy between Perl versions
by BrowserUk (Pope) on Oct 05, 2010 at 13:22 UTC

FWIW: I get the same results on I686 Windows, as you get on I686 linux, which suggests to me that the difference if down to differences in the floating point hardware rather either the way Perl is built, or the underlying CRT math functions.

I thought for a while that it might be down to the IEEE rounding mode is use, but I tried it with all 4 modes and whilst the results do vary, the differences are far less than you are seeing:

C:\test>junk54 MaunaLoa 636 1
0.011256        -43.244877      41833.442623
Roots: 2093.870 or 1747.905

C:\test>junk54 MaunaLoa 636 2
0.011256        -43.245005      41833.967213
Roots: 2093.804 or 1747.982

C:\test>junk54 MaunaLoa 636 3
0.011256        -43.245005      41833.442623
Roots: 2093.939 or 1747.847

C:\test>junk54 MaunaLoa 636 4
0.011256        -43.245005      41833.967213
Roots: 2093.804 or 1747.982
[download]

If I add use bignum; I get the same result regardless of rounding mode (as you'd expect with infinite precision):

C:\test>junk54 MaunaLoa 636 0
0.011250        -43.220000      41809.395000
Roots: 2093.969 or 1747.809
[download]

But the result is still not far from the standard double precision results, and far away from your results on those other platforms.

So then I decided to "check the math".

I fed your sample data into wolfram/alpha's quadratic fit and got a = 0.01125 b=-43.33 c=41809.39500009608 (your results are close enough!)

I then walked manually through the calculation:

The small results are done using calc.exe, the extended numbers in brackets are produced using wolfram alpha.

All of which suggests that your I686 results are correct, and the "bug" is on the other platforms.

Perhaps they're using single precision? Or at least, perhaps lookup tables for sqrt() &| pow() that are only single precision?

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Thanks, all, for the very useful responses (I'm responding to BrowserUk's reply specifically because it's so detailed and helpful in so many aspects) - this was very useful for both confirmation and more direction in deciding where to look for the error. I'm not all that familiar with the guts of Perl, but it's looking likely that there's a major compilation difference responsible here: either single-precision lookup tables, or perhaps a radically different math lib. Annoying that something like that could affect a Perl program... but on the other hand, good to learn that it can.

Again, thank you all very much.

--
"Language shapes the way we think, and determines what we can think about."
-- B. L. Whorf
either single-precision lookup tables, or perhaps a radically different math lib. Annoying that something like that could affect a Perl program...

I'm very unsure of my ground here as I'm not familiar with the other platforms, but it might not be software--Perl or the libs--but the floating point hardware. If their FPU's are only single precision, that might account for the results.

That said, my best efforts to perform the calculation in single precision doesn't get close to the inaccuracies you;re seeing:

#include <stdio.h>

float sqrtf( float n ) {
float g = n / (float)2.0;
while( ( ( g*g ) - n ) > 0.000001 ) {
g = ( g + n/g ) / (float)2.0;
}
return g;
}

void main( void ) {
float a = (float)0.01125, b = (float)-43.22, c = (float)41173.3950
+0009608;
float two = 2.0, four = 4.0;
float b2 = b * b;
float ac4 = four * a * c;
float sqrtbit = sqrtf( b2 - ac4 );
float a2 = two * a;
float r1 = ( -b + sqrtbit ) / a2;
float r2 = ( -b - sqrtbit ) / a2;

printf( "b2:%f ac4:%f sqrt:%f a2:%f\n", b2, ac4, sqrtbit, a2 );

printf( "roots: %f , %f\n", r1, r2 );
}

[download]

Produces:

C:\test>quads
b2:1867.968506 ac4:1852.802856 sqrt:3.894310 a2:0.022500
roots: 2093.969238 , 1747.808472
[download]

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Calculation discrepancy between Perl versions
by JavaFan (Canon) on Oct 05, 2010 at 13:26 UTC
It looks to me all of your @x, @y, $v?s,$m?s and $d are integers.$A, $B, and$C are all integers divided by the same number ($d). You should be able to rewrite your calculations so you factor out dividing by$d - if I'm not mistaken, skipping the three divisions by $d, and replacing the adjustment of$C by doing $C -=$tgt * $d; should do this. This reduces the amount of floating point calculations, and hence the rounding errors. Of course, there still may be floating point calculations if any of intermediate integers becomes "too large". (Print out$A, $B,$C and \$d to make sure).

Re: Calculation discrepancy between Perl versions
by Anonymous Monk on Oct 05, 2010 at 11:51 UTC
An alternative to running with more significant digits, is to rewrite your code to make use of Number::WithError. This module knows how to propagate error through most common kinds of operations. This module will work with BigFloat numbers as well.
Re: Calculation discrepancy between Perl versions
by sundialsvc4 (Abbot) on Oct 06, 2010 at 17:50 UTC

The adage that I heard was:   “Floating point numbers like piles of dirt on a beach.   Every time you pick one up and move it around, you lose a little dirt and you pick up a little sand.”

Every implementation ought to produce the same answer, within the useful number of significant-digits, for most calculations.   But, the more calculations you do (and depending on exactly how you do it), the more the results will “drift” toward utter nonsense.

And I truly think that you should expect this from any binary floating-point implementation.   There are two classic ways that applications (such as, accounting applications in particular) counter this:

• Binary-Coded Decimal (BCD):   The calculations truly are performed in decimal mode, using 4 bits per decimal integer.   (COBOL turned this into a science.)
• Scaled Integers:   The calculations are performed using integer arithmetic, and the result is understood to be “multiplied by (say...) 10,000,” giving you a fixed precision of (say...) 4 digits to the right of the decimal.   (Microsoft Access uses this strategy in their “Currency” data-type.)

Even so, errors can accumulate.   This can be further addressed by algorithms such as “banker’s rounding.”   There is, of course, the (probably apocryphal) tale of an intrepid computer-programmer who found a way to scoop all of those minuscule rounding-errors into his own bank account...

Float-binary can never be a “pure” data representation.   It is well-understood that the fraction 1/3 cannot be precisely expressed as a decimal number.   Similar artifacts occur for other fractions in other bases, and, so they tell me, for base-2 floats, one of those unfortunate numbers is 1/10.   (“So I have been told.”   I don’t have enough geek-knowledge to actually know for sure...)

Create A New User
Node Status?
node history
Node Type: perlquestion [id://863499]
Approved by planetscape
Front-paged by planetscape
help
Chatterbox?
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others romping around the Monastery: (2)
As of 2017-08-20 02:40 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
Who is your favorite scientist and why?

Results (313 votes). Check out past polls.

Notices?