PerlMonks  

When < isn't less than

by inelukii (Sexton)
on Sep 11, 2003 at 18:45 UTC (#290777=perlquestion)
inelukii has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

I have inherited some code that does what would appear to be a simple binning operation. However, I have discovered an error in the binning, and for the life of me I can't see what's wrong with the code.

Given a set of data between 0.9 and 1, the code should place each value in one of 12 bins:
Bin 0: < 0.9
Bin 1: >= 0.9 && < 0.91
Bin 2: >= 0.91 && < 0.92
...
Bin 10: >= 0.99 && < 1.0
Bin 11: >= 1.0

When run, some values get shifted down, others are placed in appropriate bins. So far, I am concerned with the < operator because it shows for example, that 0.99 is < 0.99. I hope there is a stupid mistake that I am just missing; I'd appreciate any assistance. I'm not currently concerned with whether the binning algorithm is efficient; first and foremost, it has to work.

Here's the code...

#!/usr/bin/perl
use strict;

my %final_data = (
    '1'  => 1.000,
    '2'  => 0.990,
    '3'  => 0.980,
    '4'  => 0.970,
    '5'  => 0.960,
    '6'  => 0.950,
    '7'  => 0.940,
    '8'  => 0.930,
    '9'  => 0.920,
    '10' => 0.910,
    '11' => 0.900,
    '12' => 0.890,
);

my $min  = 0.90;
my $max  = 1.00;
my $low;
my $high;
my $incr = 0.01;

DATA_ITEM:
for my $key ( sort { $a <=> $b } keys %final_data ) {
    $low  = $min;
    $high = $min + $incr;
    for my $bin ( 1 .. 10 ) {
        if ( $final_data{$key} < $min ) {
            warn "$final_data{$key} fell in bin 0 ( $final_data{$key} < $min )\n";
            $low = $high; $high += $incr;
            next DATA_ITEM;
        }
        elsif ( $final_data{$key} >= $max ) {
            warn "$final_data{$key} fell in bin 11 ( $final_data{$key} >= $max )\n";
            $low = $high; $high += $incr;
            next DATA_ITEM;
        }
        elsif ( ($final_data{$key} >= $low) && ($final_data{$key} < $high) ) {
            warn "$final_data{$key} fell in bin $bin ( $final_data{$key} >= $low && $final_data{$key} < $high )\n";
            $low = $high; $high += $incr;
            next DATA_ITEM;
        }
        $low = $high;
        $high += $incr;
    }
}

Inelukii

Re: When < isn't less than
by dws (Chancellor) on Sep 11, 2003 at 19:07 UTC
    When run, some values get shifted down, others are placed in appropriate bins. So far, I am concerned with the < operator because it shows for example, that 0.99 is < 0.99.

    What it is actually showing is that

    0.99 < (0.9 + 0.01 + 0.01 + ... + 0.01)    (0.01 added nine times)
    Welcome to the wild, wacky world of imprecise floating point representation. The problem, or one of them, is that the numbers you're using (other than 1.0) don't have precise counterparts in the internal floating point representation that chips use to represent real numbers. When you start adding them, the imprecision gets more noticeable.
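A minimal reproduction of what dws describes, assuming Perl's usual IEEE 754 double-precision numbers (a perl built with long doubles may behave differently). It rebuilds the upper edge of bin 9 the same way the original loop does and compares it with the literal 0.99:

```perl
use strict;
use warnings;

# ASSUMES IEEE 754 doubles (the usual Perl NV).
# Build the upper edge of bin 9 the way the original loop does:
# start at 0.90 and add 0.01 nine times.
my $high = 0.90;
$high += 0.01 for 1 .. 9;

my $x = 0.99;    # the literal value stored in %final_data

# Perl's default stringification shows 15 significant digits,
# so both values print as 0.99 ...
print "x = $x, high = $high\n";

# ... yet the accumulated edge ends up slightly above the literal,
# which is why "0.99 < 0.99" appears to be true.
print "x < high is ", ( $x < $high ? "true" : "false" ), "\n";
```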

    If you're concerned, you might be able to scale your data up by 100x, then scale back when it's time to display the buckets.
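A minimal sketch of the scaling idea, assuming every value carries at most two decimal places (so rounding to whole hundredths loses nothing). `bin_for` is a hypothetical helper name; the bin layout is the one from the question:

```perl
use strict;
use warnings;

# Hypothetical helper: map a value to the question's bins 0..11 after
# scaling it to whole hundredths. ASSUMES values carry at most two
# decimal places, so rounding to the nearest hundredth is lossless.
sub bin_for {
    my ($value) = @_;

    # $value * 100 need not land exactly on an integer, so round to
    # the nearest one instead of truncating with int().
    my $hundredths = sprintf '%.0f', $value * 100;

    return 0  if $hundredths < 90;     # below 0.90
    return 11 if $hundredths >= 100;   # 1.00 and above
    return $hundredths - 89;           # 90 -> bin 1, ..., 99 -> bin 10
}

for my $value ( 1.000, 0.990, 0.930, 0.900, 0.890 ) {
    printf "%.2f fell in bin %d\n", $value, bin_for($value);
}
```

All comparisons happen on integers, so the bin edges are exact; the trade-off is the two-decimal assumption, and a value such as 0.995 would be rounded to a neighbouring hundredth.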

Re: When < isn't less than
by sutch (Curate) on Sep 11, 2003 at 19:12 UTC
    Your issue has to do with the way numbers are represented internally by Perl (as binary floating point). A helpful article on this can be found at TPJ (The Perl Journal).
When 0.99 isn't 0.99
by Thelonius (Curate) on Sep 11, 2003 at 19:16 UTC
    If you add these two lines:
    my $x = $final_data{$key};
    printf "x = %.20f high= %.20f\n", $x, $high;
    you will get this output:
    x = 0.98999999999999999000 high= 0.99000000000000010000
    x = 0.97999999999999998000 high= 0.98000000000000009000
    x = 0.96999999999999997000 high= 0.97000000000000008000
    x = 0.95999999999999996000 high= 0.96000000000000008000
    x = 0.94999999999999996000 high= 0.95000000000000007000
    x = 0.93999999999999995000 high= 0.94000000000000006000
    x = 0.93000000000000005000 high= 0.94000000000000006000
    x = 0.92000000000000004000 high= 0.93000000000000005000
    x = 0.91000000000000003000 high= 0.92000000000000004000
    x = 0.90000000000000002000 high= 0.91000000000000003000
    As you can see, the 0.99 that you get by assigning 0.99 is not the same as the 0.99 you get when you start with 0.90 and add 0.01 nine times. That's the way it is with binary floating point numbers. They're not exactly what you expect. There are several things you could do:
    1. Don't worry about it. If these are measurements, for example, then values that fall right on a bin boundary are inherently ambiguous.
    2. Use an arbitrary-precision math package. There are ones for Perl, but I haven't used them, so I can't comment more.
    3. Scale everything so that they are all integers.
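A sketch of option 2 using Math::BigFloat, which ships with Perl. It assumes the values are available as decimal strings (e.g. '0.99') rather than as Perl floats, since a float has already lost the exact decimal value; `bin_for_decimal` is a hypothetical name. The loop mirrors the original one, but the edges are built by exact decimal addition:

```perl
use strict;
use warnings;
use Math::BigFloat;

# Hypothetical helper: return the bin (0..11) for a value given as a
# decimal STRING. All arithmetic is done in Math::BigFloat, so the
# edges 0.90, 0.91, ..., 1.00 are exact decimals.
sub bin_for_decimal {
    my ($str) = @_;
    my $x    = Math::BigFloat->new($str);
    my $min  = Math::BigFloat->new('0.90');
    my $max  = Math::BigFloat->new('1.00');
    my $incr = Math::BigFloat->new('0.01');

    return 0  if $x < $min;
    return 11 if $x >= $max;

    my $low = $min->copy;
    for my $bin ( 1 .. 10 ) {
        my $high = $low + $incr;    # exact decimal addition, no drift
        return $bin if $x >= $low && $x < $high;
        $low = $high;
    }
    return;    # not reached for values in [0.90, 1.00)
}

print "$_ fell in bin ", bin_for_decimal($_), "\n"
    for qw(1.000 0.990 0.940 0.900 0.890);
```

Exact decimal arithmetic is much slower than native floats, which rarely matters for a dozen bins.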
Re: When < isn't less than
by hossman (Prior) on Sep 11, 2003 at 19:30 UTC

    My favorite node on this topic is "Still puzzled by floats".

    Particularly because I had just done a bunch of research on this prior to seeing that node (in an attempt to educate many of my co-workers who didn't get it no matter how many times it bit them), and had it all fresh in my mind when I wrote my reply.

Re: When < isn't less than
by inelukii (Sexton) on Sep 11, 2003 at 21:29 UTC
    Thanks for the helpful responses. I had printed the values with %.6f and didn't see any difference; thanks for the clarification.
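A small sketch of why %.6f hid the problem, under the same assumption of IEEE 754 doubles: six decimal places round both values to the same text, whereas %.17g prints enough significant digits to tell any two different doubles apart:

```perl
use strict;
use warnings;

my $literal = 0.99;            # value as written in %final_data
my $summed  = 0.90;            # bin edge built the original loop's way
$summed += 0.01 for 1 .. 9;

# Six decimal places: both print as 0.990000.
printf "%%.6f  -> %.6f vs %.6f\n", $literal, $summed;

# 17 significant digits: the two values visibly differ.
printf "%%.17g -> %.17g vs %.17g\n", $literal, $summed;
```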

    Inelukii

Node Type: perlquestion [id://290777]
Approved by gjb