Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic
 
PerlMonks  

RFC: Large Floating Point Numbers - Rounding Errors

by GAVollink (Novice)
on Sep 08, 2011 at 14:17 UTC ( #924826=perlmeditation: print w/ replies, xml ) Need Help??

Specifically under Solaris (SPARC), but also noted under most OS versions, floating point numbers are known to be, sometimes, just wrong. All timings below run against a Solaris2.10-sun4 system...

Of course, this is system dependent, and boils down to the OS libstdio and it's implementation of printf. On Solaris (C and Perl), for example:
printf("%.5f\n", 0.000035);
The output is 0.00003, which is a really short number for such a rounding error.
printf("%.5f\n", 0.099994999999999999);
Outputs 0.1000, but, with such a long number, it blows past half precision anyway, so this isn't as important... unless it is.

So, there Math::BigFloat, which is awesome, if you think turtles crawl quickly. I'm looking at high volume number crunching here, BigFloat takes over 30 seconds per 100,000 numbers, and that is just calling $bf->ffround(), another chunk of time creating the heavily overloaded objects.

So, there's the tried and usually true exponential way of rounding it. $o = $num**5; $p = int($o); (if these differ by more than 0.5, then { $o++; } return $o/10**5;). This method fixes 0.000035 rounding down issue. However, this has that same problem with numbers that break half-precision. (This method does 100,000 numbers in 2.39 seconds).

So, I wrote something that will do the rounding on the string representation of the number, instead of using math operations.

Counter intuitive as hell, but IF the number does not string to scientific notation, 100,000 in 1.78 seconds, otherwise 2.40 seconds. Which is to say, at least on Solaris, walking the string is as fast or faster than using exponential for rounding. Also, as long as a number is protected by quotes, there is no limit to the possible precision.

I would NOT recommend using this. I've tested it extensively, but - deeply - if you think you need this, then something else is probably wrong.

sub string_round { my $num = shift; my $pre = shift; my $dot = index($num, q{.}); my $_e_ = index($num, q{e}); if ( $[ <= $_e_ ) { my $exp = substr($num, $_e_+1); # Capture exp portion. $num = substr($num, 0, $_e_); # Remove the exp portion. if ( $[ > $dot ) { $dot = $_e_; # Has exp, no dot, set it to where the e was } else { # Remove the dot $num = ( substr($num, 0, $dot) . substr($num, $dot+1) ); } if (0!=$exp && ( -1 == ($exp/abs($exp)))) { # Negative expone +nt. # Numbers available (from $[ to $dot) # are within precison, after exponent is applied (exp - $ +dot) if ( (abs($exp)-$dot+$[) <= $pre ) { for ( my $cx = 0; $cx < abs($exp); $cx++ ) { $dot--; if ( $dot < $[ ) { $num = "0$num"; $dot++; } } } elsif ( abs($exp) > abs($dot-$pre+$[) ) { return 0; } else { for ( my $cx = 0; $cx < (abs($exp)-$dot+$[); $cx++ ) { $dot--; } } if ( $[ < $dot ) { $num = ( substr($num, 0, $[+$dot) . "." . substr($num, $[+$dot ) ); } else { $num = "0.$num"; $dot = $[ + 1; } } else { # Positive exponent. my $cx = 0; # Dot already removed from number # If $cx is less than exponent, walk along the number # (starting from where . was). for ( ; $cx < $exp; $cx++ ) { # If , somehow, $[ + $dot + $cx # ... is beyond the length of the string # cath up by adding more zeroes. while (( $[ + $dot + $cx ) >= length($num) ) { $num .= "0"; } } # Maybe I didn't add any zeroes. # $[ + $dot + $cx is still shorter then length # ... so, the number still has a dot, # and maybe still needs rounding. if (( $[ + $dot + $cx ) < (length($num)) ) { $dot = ( $dot + $exp ); $num = ( substr($num, 0, $[+$dot) . "." . substr($num, $[+$dot ) ); } else { # Already at end of string, just return the whole numb +er. $dot = $[ - 1; return $num; } } } # End $_e_ ( exponent has been eliminated ) if ( $[ <= $dot ) { my $aPre = $[ + $dot + $pre; if ( length($num) >= $aPre ) { my $aCho = $aPre + 1; my $cho = substr($num, $aCho, 1); $num = substr($num, 0, $aCho); # Cut off num at expected l +ength. if ( 5 <= $cho ) { SREV: for ( my $cx = $aPre; 0 <= $cx; $cx-- ) { my $noch = substr($num, $cx, 1); if (0==$cx) { if ( q{.} eq $noch) { $num = "1" . $num; } elsif ( q{9} eq $noch) { substr($num, $cx, 1) = "0"; $num = "1" . $num; } else { substr($num, $cx, 1) = ($noch + 1); } last SREV; } elsif ( q{.} eq $noch ) { next SREV; } elsif ( 9 != $noch ) { substr($num, $cx, 1) = ($noch + 1); last SREV; } else { substr($num, $cx, 1) = "0"; } } } } } return $num; }

Yeah, substr abound, and yet, it's really quite fast. I tried it by converting the whole number into an array of characters first, and it was twice as slow.

Comment on RFC: Large Floating Point Numbers - Rounding Errors
Download Code
Re: RFC: Large Floating Point Numbers - Rounding Errors
by Anonymous Monk on Sep 08, 2011 at 14:19 UTC
Re: RFC: Large Floating Point Numbers - Rounding Errors
by BrowserUk (Pope) on Sep 08, 2011 at 14:30 UTC
    This method fixes 0.000035 rounding down issue.

    What issue do you have with 0.000035 rounding to 0.00003 when you've explicitly asked for 5 dp?


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      It is not consistent:

      0.000005 up 0.00001
      0.000015 up 0.00002
      0.000025 up 0.00003
      0.000035 dn 0.00003
      0.000045 up 0.00005
      0.000055 up 0.00006
      0.000065 dn 0.00006
      0.000075 dn 0.00007
      0.000085 up 0.00009
      0.000095 up 0.00010

        Very strange. I'd almost say that you'd discovered a bug in perl or the underlying C-libraries that was causing the double precision (64-bit) NV value to be transitioned through a single precision (32-bit) float whilst being formatted.

        If the values were always being manipulated as doubles, this shouldn't happen.

        It could be a that the IEEE rounding mode isn't being set (correctly?).


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: RFC: Large Floating Point Numbers - Rounding Errors
by salva (Monsignor) on Sep 08, 2011 at 15:57 UTC
    SPARC has support for a quadruple precision floating-point format (though, actually not implemented in hardware but emulated by software) that can be used from perl enabling the long-double flag at compilation time.

    Though, as it is described in the documented linked by the AnonMonk above, rounding errors will still happen because the sets of numbers that can be represented in base 2 floats and in base 10 floats are different.

    Also, it should be possible to develop a C/XS math library for fixed-length float-point with base 10 exponent representation of numbers that would be faster than Math::BigFloat that uses a variable length representation. For your particular case, a fixed-point representation may also work.

Re: RFC: Large Floating Point Numbers - Rounding Errors (Rounding mode)
by BrowserUk (Pope) on Sep 08, 2011 at 16:12 UTC

    The problem is that your perl or C-library is setting (or leaving) the IEEE rounding mode set to RC_CHOP (truncate). Any of the other three modes -- UP, DOWN, OR NEAREST -- would fix this.

    The following shows the output on my intel system for teh four modes of IEEE, FP operations:

    #! perl -slw use strict; use Inline C => Config => BUILD_NOISY => 1; use Inline C => <<'END_C', NAME => 'ieeroundmode', CLEAN_AFTER_BUILD +=> 0; unsigned int controlFP( unsigned int new, unsigned int mask ) { return _controlfp( new, mask ); } END_C use constant { RC_MASK => 0x00000300, RC_CHOP => 0x00000300, RC_UP => 0x00000200, RC_DOWN => 0x00000100, RC_NEAR => 0x00000000, }; print "Setting mode CHOP; controlFP returned: ", controlFP( RC_CHOP, R +C_MASK ); for( my $n = 0.000005; $n < 0.0001; $n += 0.00001 ) { printf "%.15f %.5f\n", $n, $n; } print "Setting mode UP; controlFP returned: ", controlFP( RC_UP, RC_MA +SK ); for( my $n = 0.000005; $n < 0.0001; $n += 0.00001 ) { printf "%.15f %.5f\n", $n, $n; } print "Setting mode DOWN; controlFP returned: ", controlFP( RC_DOWN, R +C_MASK ); for( my $n = 0.000005; $n < 0.0001; $n += 0.00001 ) { printf "%.15f %.5f\n", $n, $n; } print "Setting mode NEAR; controlFP returned: ", controlFP( RC_NEAR, R +C_MASK ); for( my $n = 0.000005; $n < 0.0001; $n += 0.00001 ) { printf "%.15f %.5f\n", $n, $n; } __END__ C:\test>ieeroundmode.pl Setting mode CHOP; controlFP returned: 525087 0.000005000000000 0.00001 0.000015000000000 0.00002 0.000025000000000 0.00003 0.000035000000000 0.00003 0.000045000000000 0.00004 0.000055000000000 0.00005 0.000065000000000 0.00006 0.000075000000000 0.00007 0.000085000000000 0.00008 0.000095000000000 0.00009 Setting mode UP; controlFP returned: 524831 0.000005000000000 0.00001 0.000015000000000 0.00002 0.000025000000000 0.00003 0.000035000000000 0.00004 0.000045000000000 0.00005 0.000055000000000 0.00006 0.000065000000000 0.00007 0.000075000000000 0.00008 0.000085000000000 0.00009 0.000095000000000 0.00010 Setting mode DOWN; controlFP returned: 524575 0.000005000000000 0.00001 0.000015000000000 0.00002 0.000025000000000 0.00003 0.000035000000000 0.00003 0.000045000000000 0.00004 0.000055000000000 0.00005 0.000065000000000 0.00006 0.000075000000000 0.00007 0.000085000000000 0.00008 0.000095000000000 0.00009 Setting mode NEAR; controlFP returned: 524319 0.000005000000000 0.00001 0.000015000000000 0.00002 0.000025000000000 0.00003 0.000035000000000 0.00004 0.000045000000000 0.00005 0.000055000000000 0.00006 0.000065000000000 0.00007 0.000075000000000 0.00008 0.000085000000000 0.00009 0.000095000000000 0.00010

    Maybe you can set the rounding mode yourself in a similar way?


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Nice try. But...

      No. It isn't.

      YOU are wrong! Wrong! Wrong!

      It is just a consequence of switching between base 10 and 2:

      #!/usr/bin/perl -wl use strict; for my $n ( 0.000005, 0.000015, 0.000025, 0.000035, 0.000045, 0.000055, 0.000065, 0.000075, 0.000085, 0.000095, ) { printf "%.23f\n", $n; } __END__ 0.00000500000000000000041 0.00001500000000000000038 0.00002500000000000000120 0.00003499999999999999693 0.00004500000000000000283 0.00005500000000000000196 0.00006499999999999999431 0.00007499999999999999343 0.00008500000000000000611 0.00009500000000000000523

      Your assumption that 0.000035 is exactly the same as 0.000005 + 3*0.000010 was not correct.

      Update: Please show the results for explicitly enumerated values and show more precision on the first value. Or I'll to do that myself a bit later when I will probably again have convenient access to a system with Inline::C installed on it and with _controlfp().

      - tye        

      Well, you got rather quiet on the subject but I wasn't convinced that there might not be some truth to the "rounding mode" stuff. But it turns out that there was no truth to it at all for this case.

      The rounding mode is already at NEAR and the rounding mode makes no difference in the behavior of sprintf()'s rounding. The reason you got results more to your liking is purely due to computing 0.000035 using repeated additions rather than using the constant 0.000035 directly.

      Here is how I modified your test code to prove this:

      And here is summarized output, collapsing identical sections and highlighting the differences between the different parts:

      Default / NEAR (compare computed value with cons +tant) 0.0000050000000000000004 0.00001 =0.0000050000000000000004= =0.000 +01= 0.0000150000000000000000 0.00002 <0.0000150000000000000020> =0.000 +02= 0.0000250000000000000010 0.00003 <0.0000250000000000000050> =0.000 +03= 0.0000349999999999999970 0.00003 <0.0000350000000000000040> <0.000 +04> 0.0000450000000000000030 0.00005 =0.0000450000000000000030= =0.000 +05= 0.0000550000000000000020 0.00006 =0.0000550000000000000020= =0.000 +06= 0.0000649999999999999940 0.00006 <0.0000650000000000000080> <0.000 +07> 0.0000749999999999999930 0.00007 <0.0000750000000000000070> <0.000 +08> 0.0000850000000000000060 0.00009 =0.0000850000000000000060= =0.000 +09= 0.0000950000000000000050 0.00010 =0.0000950000000000000050= =0.000 +10= CHOP / DOWN (compare value to Default / NEAR case) >0.0000049999999999999996< >0.00000< >0.0000049999999999999996< >0.000 +00< >0.0000149999999999999990< >0.00001< >0.0000150000000000000000< =0.000 +02= >0.0000249999999999999980< >0.00002< >0.0000250000000000000010< =0.000 +03= =0.0000349999999999999970= =0.00003= >0.0000349999999999999970< >0.000 +03< >0.0000449999999999999960< >0.00004< >0.0000449999999999999960< >0.000 +04< >0.0000549999999999999950< >0.00005< >0.0000549999999999999950< >0.000 +05< >0.0000649999999999999940< >0.00006< >0.0000649999999999999940< >0.000 +06< =0.0000749999999999999930= >0.00007< >0.0000749999999999999930< >0.000 +07< >0.0000849999999999999930< >0.00008< >0.0000849999999999999930< >0.000 +08< >0.0000949999999999999920< >0.00009< >0.0000949999999999999920< >0.000 +09< UP (compare value to Default / NEAR case) =0.0000050000000000000004= =0.00001= =0.0000050000000000000004= =0.000 +01= =0.0000150000000000000000= =0.00002= =0.0000150000000000000020= =0.000 +02= =0.0000250000000000000010= =0.00003= =0.0000250000000000000050> =0.000 +03= <0.0000350000000000000040> <0.00004> <0.0000350000000000000100> =0.000 +04= =0.0000450000000000000030= =0.00005= <0.0000450000000000000160> =0.000 +05= =0.0000550000000000000020= =0.00006= <0.0000550000000000000220> =0.000 +06= <0.0000650000000000000080> <0.00007> <0.0000650000000000000350> =0.000 +07= <0.0000750000000000000070> <0.00008> <0.0000750000000000000480> =0.000 +08= =0.0000850000000000000060= =0.00009= <0.0000850000000000000600> =0.000 +09= =0.0000950000000000000050= =0.00010= <0.0000950000000000000730> =0.000 +10=

      Setting rounding mode to NEAR gives the exact same results as I get before changing the rounding mode. Setting it to UP gives the desired results (for my code) not because it changes how sprintf rounds but because it changes how Perl converts the string '0.000035' into a floating point value, ensuring that the result is always slightly higher than (or equal to) the value that the string represents. But using UP also introduces growing inaccuracy in the additions.

      The "better" results in your NEAR case were only the luck of 0.000005 and 0.000010 converting to floating point values slightly larger than the numbers those strings represent.

      Also note that your results would be slightly different than mine for the DOWN / CHOP cases, since you put the constant 0.000005 in as a number so that it was interpreted at compile time (and thus using the NEAR rounding mode) rather than the way I used strings to force run-time interpretation of the constants using the different round modes (except for 0.000010).

      - tye        

        Well, you got rather quiet on the subject

        When the discussion turns from attempting to solve the OPs problem, to either a pissing contest or a witch hunt, participating further is of little purpose.

        The reason you got results more to your liking is purely due to computing 0.000035 using repeated additions rather than using the constant 0.000035 directly.

        So, what you are saying is, my "mistake" (or "cheat") was that I used numbers for (shock horror!) computation.

        Obviously, the affect of the float point processors rounding mode will only have a significant affect if you actually perform some -- ta-da! -- floating point calculations.

        I guess you could term sprintf "%.5f", '0.000035'; a calculation, but given that '0.00004'; is just easier and clearer, you have to wonder at the purpose.

        Rounding (and the effects of rounding mode upon it) only becomes meaningful when outputting the results of a computation. And it isn't just addition, it is all operations.

        There are an infinite number of calculations, that in an ideal world would produce exactly 0.000035. Some of them will round out one way, and some the other:

        printf "%.23f : %.5f\n", $_, $_ for 0.000005 * 7;; 0.00003500000000000000400 : 0.00004 printf "%.23f : %.5f\n", $_, $_ for 0.000007 * 5;; 0.00003499999999999999700 : 0.00003

        The truncation of a 6 sig.fig constant to a 5 sig.fig. string is the least interesting -- read: pointless -- of them all.

        And using the appropriate rounding mode, not just for the final calculation, but also all intermediate terms becomes very important as the number and complexity of the operations involved increases.

        I stick by my statement that a 64-bit IEEE 754 floating point value is perfectly capable of storing 0.000035 to sufficient accuracy that it can be rounded correctly, for any mathematically significant purpose.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: RFC: Large Floating Point Numbers - Rounding Errors
by jwkrahn (Monsignor) on Sep 08, 2011 at 20:06 UTC
    Yeah, substr abound, and yet, it's really quite fast.

    You can speed up some of the substr usage by using four arguments instead of using substr as an lvalue.    For example: substr($num, $cx, 1, $noch + 1) instead of substr($num, $cx, 1) = ($noch + 1).

      I like this idea.

      Thanks!

Re: RFC: Large Floating Point Numbers - Rounding Errors
by salva (Monsignor) on Sep 09, 2011 at 06:58 UTC
    This seems to work (on x86, at least):
    printf("%.5f\n", sprintf("%.6f1", $_)) for (0.000005, 0.000015, 0.000025, 0.000035, 0.000045, 0.000055, 0 +.000065)
      printf("%.­5f\n", sprintf("%­.6f1", $_))

      Sure. But that is worse in a lot of ways as well. What does it do to 0.0000346 ?

      But you can use that approach to at least very closely match (some) human expectations. It is mostly just your choice of "6" (or $n+1) that is at fault. A fairly reasonable thing to do would be more like:

      printf( "%.5f\n", sprintf("%.12e",$_) ) # ^^^

      Find an example that violates your expectations for that. I'll wait...

      The obvious one is, of course, 0.000034999999999999. But try this:

      % perl -le "print 0.00003499999999999999" 3.5e-05

      The above number is so very close to 0.000035 that Perl itself intentionally decides to display it as being exactly 0.000035. So, if that many '9's on the end doesn't violate your personal sense of perspective such that you don't mind that it gets rounded up to 0.00004, then I claim that it is your perspective when it comes to mundane computer performance that is to blame. It means that you expect your lowly computer to be able to weigh a large beach (one with 1000 volleyball courts) and not be off by a single grain of sand.

      Perl has this concept of "so close as to be displayed as equal" because without it you get things like sum( (0.01)x50 ) showing not as 0.5 but as 0.50000000000000022. So it is good, even important, to ignore the last several bits of floating point values when displaying them. (It is also good to do so when comparing them, but I'll just mention that can of worms without opening it.)

      So, why don't we just fix Perl's sprintf so 0.000034999999999999 rounds to 0.00004 ? Wait! Why should exactly 0.000035 (not floating point) round to 0.00004 ? It is exactly equally between 0.00003 and 0.00004. So either answer is equally appropriate. So always rounding such values up induces a small imbalance. That is why there is the superior "banker's rounding" where half of such values round up and half of such values round down. So, why don't we just fix Perl's sprintf so it uses banker's rounding?

      We don't need to:

      my %c; for my $i ( '00' .. '99' ) { my $f= 0 + ".${i}5"; my $r= sprintf "%.2f", $f; my $d= $r < $f ? '-' : '+'; $c{$d}++; print "$f $r $d\n"; } print "$c{'-'} rounded down, $c{'+'} rounded up.\n"; __END__ 0.005 0.01 + 0.015 0.01 - 0.025 0.03 + 0.035 0.04 + 0.045 0.04 - 0.055 0.06 + ... 0.905 0.91 + 0.915 0.92 + 0.925 0.93 + 0.935 0.94 + 0.945 0.94 - 0.955 0.95 - 0.965 0.96 - 0.975 0.97 - 0.985 0.98 - 0.995 0.99 - 50 rounded down, 50 rounded up.

      So the real problem here is some people's naive expectations left over from 4th-grade math class, not Perl's more accurate and practical floating point behavior that it inherits from C.

      Banker's rounding is fine when doing calculations with pencil and paper. Let's not slow down Perl to conform to naive or out-dated human expectations.

      P.S. Yes, I know that 0.000034999999999999 is not the same as 0.00003499­9999999999­99. If you didn't immediately notice the discrepancy, then I'm not surprised. The difference was intentional and, in the end, the similarity between the two seems so obvious that I decided to not even comment on it beyond this postscript.

      Update: The printf( "%.5f\n", sprintf("%­.12e",$_) ) code neglects to insert the '1' that compensates for numbers sometimes being converted to base-2 values ever so slightly smaller than the number that the string represents. s/e/1e/i would be a reasonable way to insert that '1' but that doesn't fit easily into a simple, single-expression example.

      - tye        

        I'll take your rant and raise you one...

        Flippant disregard for naïveté may be geek-chic, but I live in the real world. That is, while you really do make a valid point, those fourth grade expectations are common, and are consistent in an easily observable way. If I went and told my users that their expectations are merely naive, I would lose.

        ... shirt, respect, credibility and eventually job.

        The best answer noted elsewhere in this thread, is a hardware limitation. The SPARC architecture does not have a way to set the rounding mode for floating point operations. However, x86 Operating Systems (including recent MacOS, but still not Solaris) seems to have a way to reach into the floating point hardware to ask for a NEAR rounding mode. I can sell broken architecture as an excuse (since it is demonstrably true) far better than I can sell broken user expectations (which are not demonstrable at all).

        Or, I can cheat, use string_round(), and look like I got a bug fixed where nobody else could do it efficiently. I don't mind cheating, as long as it doesn't cause any real harm. This is such a case, since as BrowserUK dismissively notes above, all rounding is cosmetic anyway.

Re: RFC: Large Floating Point Numbers - Rounding Errors
by baruch60610 (Initiate) on Sep 19, 2011 at 03:14 UTC

    I'm thinking that it might make sense to use Math::BigRat for the calculations and only convert to "float" for display purposes (do your own output routines). I've always found floating point to be treacherous, and have gotten burned more than once by unexpected behavior and rounding errors (and, at first, naive comparisons with zero).

    I haven't done a comparison of how fast BigRat works compared to BigFloat, so it may not be worth the effort after all.

      I haven't done a comparison of how fast BigRat works compared to BigFloat, so it may not be worth the effort after all.

      If speed is ever an issue, then the gmp library provides "bigrat" (as well as both "bigfloat" and "bigint") types.
      The "bigrat" arithmetic is accessible from perl via Math::GMPq

      Cheers,
      Rob
        Ah, so little time. If I didn't have to compile this first, I'd have already posted timing differentials.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlmeditation [id://924826]
Approved by BrowserUk
Front-paged by BrowserUk
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others examining the Monastery: (9)
As of 2014-07-25 13:39 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:









    Results (171 votes), past polls