
### When < isn't less than

by inelukii (Sexton) on Sep 11, 2003 at 18:45 UTC
inelukii has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

I have inherited some code that does what would appear to be a simple binning operation. However, I have discovered an error in the binning, and for the life of me I can't see what's wrong with the code.

Given a set of data between 0.9 and 1, the code should place each value in one of 12 bins:

- Bin 0: < 0.9
- Bin 1: >= 0.9 && < 0.91
- Bin 2: >= 0.91 && < 0.92
- ...
- Bin 11: >= 1.0

When run, some values get shifted down into a lower bin, while others are placed in the appropriate bins. So far, I am concerned with the < operator because it shows, for example, that 0.99 is < 0.99. I hope there is a stupid mistake that I am just missing; I'd appreciate any assistance. I'm not currently concerned with whether the binning algorithm is efficient; first and foremost, it needs to work.

Here's the code...

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample values, keyed by item number.
my %final_data = (
    '1'  => 1.000,
    '2'  => 0.990,
    '3'  => 0.980,
    '4'  => 0.970,
    '5'  => 0.960,
    '6'  => 0.950,
    '7'  => 0.940,
    '8'  => 0.930,
    '9'  => 0.920,
    '10' => 0.910,
    '11' => 0.900,
    '12' => 0.890,
);

my $min = 0.90;
my $max = 1.00;
my $low;
my $high;
my $incr = 0.01;

DATA_ITEM:
for my $key ( sort { $a <=> $b } keys %final_data )
{
    # Start at bin 1: [0.90, 0.91)
    $low  = $min;
    $high = $min + $incr;
    for my $bin ( 1 .. 10 )
    {
        if( $final_data{$key} < $min )
        {
            warn "$final_data{$key} fell in bin 0 ( $final_data{$key} < $min )\n";
            $low = $high;
            $high += $incr;
            next DATA_ITEM;
        }
        elsif( $final_data{$key} >= $max )
        {
            warn "$final_data{$key} fell in bin 11 ( $final_data{$key} >= $max )\n";
            $low = $high;
            $high += $incr;
            next DATA_ITEM;
        }
        elsif( ($final_data{$key} >= $low) && ($final_data{$key} < $high) )
        {
            warn "$final_data{$key} fell in bin $bin ( $final_data{$key} >= $low && $final_data{$key} < $high )\n";
            $low = $high;
            $high += $incr;
            next DATA_ITEM;
        }
        # Not in this bin: move both edges up to the next bin.
        $low = $high;
        $high += $incr;
    }
}
```

Inelukii

Re: When < isn't less than
by dws (Chancellor) on Sep 11, 2003 at 19:07 UTC
> When run, some values get shifted down, others are placed in appropriate bins. So far, I am concerned with the < operator because it shows for example, that 0.99 is < 0.99.

What it is actually showing is that

`0.99 < (0.9 + 0.01 + 0.01 + ... + 0.01)`

where 0.01 has been added nine times to build the upper edge of bin 9.
Welcome to the wild, wacky world of imprecise floating-point representation. The problem, or one of them, is that the numbers you're using (other than 1.0) don't have exact counterparts in the binary floating-point format that chips use to represent real numbers. When you start adding them, the small rounding errors accumulate and become more noticeable.
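A minimal sketch of the comparison described above, rebuilding the upper edge of bin 9 the same way the loop does (`$min + $incr`, then repeated `+= $incr`). It assumes a perl whose numbers are IEEE 754 doubles, which is the usual case; the variable names mirror the question's code but the snippet itself is only an illustration.

```perl
use strict;
use warnings;

my $x    = 0.99;             # the literal value stored in %final_data
my $high = 0.90 + 0.01;      # $min + $incr: upper edge of bin 1
$high += 0.01 for 2 .. 9;    # eight more steps: upper edge of bin 9

# With IEEE 754 doubles the accumulated edge lands one step above the
# stored 0.99, so the literal 0.99 passes the "< $high" test for bin 9.
print "0.99 <  accumulated edge: ", ($x <  $high ? "yes" : "no"), "\n";
print "0.99 == accumulated edge: ", ($x == $high ? "yes" : "no"), "\n";
```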

If you're concerned, you might be able to scale your data up by 100 (working in integer hundredths), then scale back when it's time to display the buckets.
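A minimal sketch of the scaling idea, not code from the thread. `scaled_bin` is a hypothetical helper, and it assumes the data are recorded to two decimal places, so rounding the scaled value to the nearest integer removes the representation error entirely:

```perl
use strict;
use warnings;

# Hypothetical sample values matching the question's %final_data.
my @values = (1.000, 0.990, 0.980, 0.970, 0.960, 0.950,
              0.940, 0.930, 0.920, 0.910, 0.900, 0.890);

# Map a value to its bin by working in integer hundredths (0.90 -> 90).
# Assumes two-decimal data: rounding to the nearest integer absorbs the
# tiny error in values such as 0.99 * 100.
sub scaled_bin {
    my ($value) = @_;
    my $hundredths = sprintf '%.0f', $value * 100;
    return 0  if $hundredths < 90;     # below 0.90
    return 11 if $hundredths >= 100;   # 1.00 and above
    return $hundredths - 89;           # 90 -> bin 1, ..., 99 -> bin 10
}

printf "%.2f fell in bin %d\n", $_, scaled_bin($_) for @values;
```

Because every comparison is between integers, there is no accumulated drift; if the data can carry more than two decimals, the rounding step would need rethinking.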

When 0.99 isn't 0.99
by Thelonius (Priest) on Sep 11, 2003 at 19:16 UTC
If you add these two lines:
```perl
    my $x = $final_data{$key};
    printf "x = %.20f  high= %.20f\n", $x, $high;
```
you will get this output:
```
x = 0.98999999999999999000  high= 0.99000000000000010000
x = 0.97999999999999998000  high= 0.98000000000000009000
x = 0.96999999999999997000  high= 0.97000000000000008000
x = 0.95999999999999996000  high= 0.96000000000000008000
x = 0.94999999999999996000  high= 0.95000000000000007000
x = 0.93999999999999995000  high= 0.94000000000000006000
x = 0.93000000000000005000  high= 0.94000000000000006000
x = 0.92000000000000004000  high= 0.93000000000000005000
x = 0.91000000000000003000  high= 0.92000000000000004000
x = 0.90000000000000002000  high= 0.91000000000000003000
```
As you can see, the 0.99 that you get by assigning 0.99 is not the same as the 0.99 you get when you start with 0.90 and add 0.01 nine times. That's the way it is with binary floating point numbers. They're not exactly what you expect. There are several things you could do:
1. Don't worry about it. If these are measurements, for example, then values that fall right on the bin boundaries are inherently ambiguous.
2. Use an arbitrary-precision math package. There are ones for Perl, but I haven't used them, so I can't comment more.
3. Scale everything so that they are all integers.
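For option 2, Perl ships `Math::BigFloat` as a core module. The sketch below is an illustration, not code from the thread: `bigfloat_bin` is a hypothetical helper, and it assumes the values arrive as decimal strings such as `'0.99'`, so they never pass through a binary double.

```perl
use strict;
use warnings;
use Math::BigFloat;

# Bin edges as exact decimals: '0.01' really is one hundredth here.
my $min  = Math::BigFloat->new('0.90');
my $max  = Math::BigFloat->new('1.00');
my $incr = Math::BigFloat->new('0.01');

sub bigfloat_bin {
    my ($str) = @_;                     # value given as a decimal string
    my $x = Math::BigFloat->new($str);
    return 0  if $x < $min;
    return 11 if $x >= $max;
    my $high = $min->copy;              # copy so $min itself is not modified
    for my $bin (1 .. 10) {
        $high += $incr;                 # exact decimal addition, no drift
        return $bin if $x < $high;
    }
    return 11;                          # not reached for values below $max
}

printf "%s fell in bin %d\n", $_, bigfloat_bin($_)
    for qw(0.89 0.90 0.94 0.99 1.00);
```

Exact decimal arithmetic is considerably slower than native floats, so for large data sets the integer scaling of option 3 is usually the more practical choice.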
Re: When < isn't less than
by hossman (Prior) on Sep 11, 2003 at 19:30 UTC

My favorite node on this topic is "Still puzzled by floats".

I like it particularly because I had just done a bunch of research on this before seeing that node (in an attempt to educate many of my co-workers, who didn't get it no matter how many times it bit them), and had it all fresh in my mind when I wrote my reply.

Re: When < isn't less than
by sutch (Curate) on Sep 11, 2003 at 19:12 UTC
Your issue has to do with the way Perl represents numbers (as native binary floating-point values). An article that is helpful in understanding this can be found at TPJ (The Perl Journal).
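One way to see the representation issue directly is to dump the 64 bits Perl stores for a number. A minimal sketch, assuming perl 5.10 or later (for the `>` big-endian modifier on `pack 'd'`) and IEEE 754 doubles; `double_hex` is a hypothetical helper:

```perl
use strict;
use warnings;

# Return the 64-bit IEEE 754 pattern of a number as 16 hex digits (big-endian).
sub double_hex {
    my ($n) = @_;
    return unpack 'H*', pack 'd>', $n;
}

my $one  = double_hex(1.0);    # 3ff0000000000000: exact, the mantissa is all zeros
my $nine = double_hex(0.99);   # 3fefae147ae147ae: a repeating pattern, cut off and rounded

print "1.0  => $one\n";
print "0.99 => $nine\n";
```

The repeating `ae147` in the mantissa is the binary expansion of 0.99 being truncated to 52 bits, which is why the stored value is slightly below 0.99.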
Re: When < isn't less than
by inelukii (Sexton) on Sep 11, 2003 at 21:29 UTC
Thanks for the helpful responses. I had printed the values with %.6f and not seen any difference; thanks for the clarification.

Inelukii
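Why `%.6f` hid the problem: the literal 0.99 and the accumulated edge differ only around the 17th significant digit, so rounding to six decimal places prints them identically. A minimal sketch; printing with `%.17g` (17 significant digits, enough to distinguish any two IEEE 754 doubles) is a suggestion, not something from the thread.

```perl
use strict;
use warnings;

my $literal = 0.99;
my $sum     = 0.90;
$sum += 0.01 for 1 .. 9;        # same accumulation the binning loop performs

# Six decimal places round both values to the same string.
my ($lit6,  $sum6)  = map { sprintf '%.6f',  $_ } $literal, $sum;
# Seventeen significant digits are enough to tell any two doubles apart.
my ($lit17, $sum17) = map { sprintf '%.17g', $_ } $literal, $sum;

print "%.6f : $lit6 vs $sum6\n";
print "%.17g: $lit17 vs $sum17\n";
```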

Node Type: perlquestion [id://290777]
Approved by gjb