Re: Equality checking for strings AND numbers
by BrowserUk (Patriarch) on Jul 13, 2007 at 00:21 UTC
|
If your data can contain reals, you might want to think about whether 10.0 == 10.000000000000001, or not.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] |
|
Good observation. Most values are integers, but with different precisions. Real numbers SHOULD have the same precisions in these files, and I actually want to detect if they don't, i.e. 10 and 10.000000000000001 should be treated as different.
If exact comparison on reals becomes an issue, I guess I could use sprintf to compare only the leading decimal places, or do a ratio comparison.
Thanks for the heads-up.
| [reply] |
|
sub equality {
    my ($a, $b, $eps) = @_;
    # returns 1 when the two values differ by less than $eps, otherwise 0
    return abs( $a - $b ) < $eps ? 1 : 0;
}
where $eps is the desired precision
Cheers,
lin0 | [reply] [d/l] |
|
Re: Equality checking for strings AND numbers
by syphilis (Archbishop) on Jul 13, 2007 at 02:05 UTC
|
Be aware that looks_like_number() returns true for strings like '9e0' and '9', but false for strings like '0x9'.
I don't know if it has any impact on what you are doing but your comp() subroutine will return true when comparing the numbers 9 and 0x9, will return true when comparing the strings '9e0' and '9', but will return false when comparing the strings '0x9' and '9' (or when comparing the string '0x9' to the number 9).
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);
my $x1 = '0x9';
my $x2 = 0x9;
my $x3 = '9';
my $x4 = '9e0';
my $x5 = 9e0;
print "1: ", comp($x1, 9), "\n";
print "2: ", comp($x2, 9), "\n";
print "3: ", comp($x3, 9), "\n";
print "4: ", comp($x4, 9), "\n";
print "5: ", comp($x5, 9), "\n";
sub comp {
my ($a, $b) = @_;
if (looks_like_number($a) && looks_like_number($b)) {
return ($a == $b);
}
else {
return ($a eq $b);
}
}
__END__
Outputs:
1:
2: 1
3: 1
4: 1
5: 1
Cheers, Rob | [reply] [d/l] |
|
Thanks for the warning - all numerical values will be base10, sometimes in scientific format, so the looks_like_number call should work in this case.
So, looks_like_number only works for base10 (and below) numbers, i.e. hexadecimal values with or without a leading 0x will return false?
Although, of course not knowing the number base for numerical values will cause all kinds of other problems! ;)
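To make the hex point concrete, here is a minimal sketch. The helper name to_number is made up for illustration and is not part of Scalar::Util; it assumes that hex input always carries a leading 0x, and it uses the core hex() function for the conversion.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper (not from any module): convert a decimal or
# 0x-prefixed hexadecimal string to a number; return undef otherwise.
sub to_number {
    my ($s) = @_;
    return hex($s) if $s =~ /^0[xX][0-9a-fA-F]+$/;   # '0x9' becomes 9
    return $s + 0  if looks_like_number($s);          # '9', '9e0', '9.0'
    return;                                           # not numeric at all
}

for my $s ('9', '9e0', '0x9', '0xFF', 'nine') {
    my $n = to_number($s);
    printf "%-5s looks_like_number=%d value=%s\n",
        $s, looks_like_number($s) ? 1 : 0, defined $n ? $n : 'undef';
}
```

A bare hex string such as 'FF' cannot be told apart from ordinary text, which is why the sketch insists on the prefix.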
| [reply] |
|
On another note you could use Algorithm::Diff which would allow you to provide your own matching (or "key generation") function as they call it. This gets over the deficiencies of Text::Diff in only comparing text strings.
| [reply] |
|
Looking at the Text::Diff module, I noticed the following:
my $diff = diff \&reader1,\&reader2;
I assume that this means you can use a subroutine to return the column you need from the input files and then just use Text::Diff to compare.
Do you have some sample input files? What sort of output are you expecting to be generated (a list of the differences, print to screen, etc.) and what should the format of this output be?
Updated: Questions added | [reply] [d/l] |
|
Re: Equality checking for strings AND numbers
by toma (Vicar) on Jul 13, 2007 at 08:11 UTC
|
This is a difficult problem in Perl for the most general case.
Numbers like 1111111111111111111e1111111111111111111 pass the 'looks_like_number' test but don't fare well in arithmetic expressions.
This doesn't do what you would hope:
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);
my $c="11111111111111111e11111111111111111";
my $d="22222222222222222e22222222222222222";
if (looks_like_number($c) and
looks_like_number($d) and
$c == $d) {
print "$c = $d\n";
}
It should work perfectly the first time! - toma
| [reply] [d/l] |
|
That's probably because those large numbers are essentially Infinity? At least as far as normal numerical storage goes? The looks_like_number call does allow for Infinity, and treats it like a number, and Infinity == Infinity should be true!
I don't need to use any of the "big" number support, which I believe doesn't play well with looks_like_number anyway.
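If the overflow-to-Infinity case ever matters, one option is to use numeric comparison only when both values are finite and fall back to string comparison otherwise. The sketch below assumes IEEE-754 doubles, where 9**9**9 overflows to infinity. The names is_finite_number and comp_finite are invented here, and the shorter strings '1e1000' and '2e2000' stand in for the huge values above.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

my $INF = 9**9**9;    # assumed to overflow to +Inf on IEEE-754 doubles

# Hypothetical helper: true only for values that are numeric and finite.
sub is_finite_number {
    my ($v) = @_;
    return 0 unless looks_like_number($v);
    my $n = $v + 0;
    return 0 if $n != $n;                          # NaN never equals itself
    return ( $n != $INF && $n != -$INF ) ? 1 : 0;
}

# Numeric comparison for finite numbers, string comparison for the rest.
sub comp_finite {
    my ( $x, $y ) = @_;
    if ( is_finite_number($x) && is_finite_number($y) ) {
        return $x == $y ? 1 : 0;
    }
    return $x eq $y ? 1 : 0;
}

print "plain ==   : ", ( '1e1000' == '2e2000' ? 1 : 0 ), "\n";
print "comp_finite: ", comp_finite( '1e1000', '2e2000' ), "\n";
print "comp_finite: ", comp_finite( '9', '9e0' ), "\n";
```

With this fallback, two overflowing values only compare equal when their text is identical.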
| [reply] |
Re: Equality checking for strings AND numbers - the future
by tirwhan (Abbot) on Jul 13, 2007 at 15:27 UTC
|
Since no one has mentioned it so far, I'd just like to point out that for Perl versions >= 5.9.3 you can use the smart match operator ~~ for this. So for example, the following would work:
use feature ":5.10";
my $x=10;
my $y="10.00";
say "matches" if ($x ~~ $y);
(tested with 5.9.5). See perlsyn for details on how smart match works.
| [reply] [d/l] [select] |
Re: Equality checking for strings AND numbers
by eXile (Priest) on Jul 13, 2007 at 16:34 UTC
|
I posted a similar problem before, and got a great answer ( Re: check if 2 values are equal ), involving putting all things to be compared in a hash as hash keys and counting the number of keys. | [reply] |
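A rough sketch of the hash-key idea follows. It is not a copy of the linked answer. The name all_equal is made up, and normalising numeric-looking values with + 0 is an added assumption so that '10', '10.0' and '1e1' land on the same key.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: true when every value collapses to one hash key.
sub all_equal {
    my @values = @_;
    my %seen;
    # Assumption: numeric-looking values are normalised by adding 0,
    # everything else is used as a key verbatim.
    $seen{ looks_like_number($_) ? $_ + 0 : $_ } = 1 for @values;
    return keys(%seen) == 1 ? 1 : 0;
}

print all_equal( '10', '10.0', '1e1' ), "\n";
print all_equal( '10', '10.1' ), "\n";
print all_equal( 'foo', 'foo' ), "\n";
```

Without the + 0 step the keys are compared purely as strings, so '10' and '10.0' would count as two different values.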
Re: Equality checking for strings AND numbers
by mr_mischief (Monsignor) on Jul 16, 2007 at 01:47 UTC
|
Is there a maximum precision which any of the numbers will ever be?
If so, and you want any differences to be noted as you responded to BrowserUk, why not promote all things that look like numbers to some ridiculously high precision using sprintf() and then compare everything based on strings? (edit: fixed this sentence for grammar)
printf "%1.20f\n", int(10.1) ;
printf "%1.20f\n", 10 ;
printf "%1.20f\n", 012 ;
printf "%1.20f\n", "10" ;
printf "%1.20f\n", 1e1 ;
printf "%1.20f\n", 10.100 ;
printf "%1.20f\n", 10.1 ;
printf "%1.20f\n", '10.1' ;
printf "%1.20f\n", 10.1000000000 ;
printf "%1.20f\n", 10.1000000001 ;
You'll end up with roundoff errors on reals from the precision boost, but for perfectly equivalent values in the first place you should get the same roundoff errors. It's not like you're accumulating the errors through arithmetic with the values, since you're just promoting them and then immediately doing the comparison. The old adage about not testing floats for equality doesn't really apply here, unless you do want to allow a range of difference in the original inputs.
The main issue with this as I see it is that while you should be okay for a single environment, you'll potentially be dealing with different values for the floats if you try to take the promoted values as output from more than one software environment.
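As a sketch of how the promotion could be wrapped into a comparison, something like the following might do. The names canon and comp_str are invented for illustration, and the 20-digit width is simply the one used in the printf lines above.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: promote numeric-looking values to a fixed,
# very wide decimal string; leave everything else untouched.
sub canon {
    my ($v) = @_;
    return looks_like_number($v) ? sprintf( '%.20f', $v ) : $v;
}

# Compare the canonical forms as plain strings.
sub comp_str {
    my ( $x, $y ) = @_;
    return canon($x) eq canon($y) ? 1 : 0;
}

print comp_str( '10',   '1e1' ),           "\n";
print comp_str( '10.1', '10.100' ),        "\n";
print comp_str( '10.1', '10.1000000001' ), "\n";
print comp_str( 'abc',  'abc' ),           "\n";
```

Because both sides go through the same conversion in the same process, identical inputs get identical roundoff, which is the property the approach relies on.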
| [reply] [d/l] |
Re: Equality checking for strings AND numbers
by Anonymous Monk on Jul 15, 2007 at 06:37 UTC
|
When comparing the numerical data for equality, are you ever comparing numbers of different precision? If not, why not just convert all data to strings and compare the string results? In deciding whether or not to convert the data you could use something like:
$string = to_string($string) unless is_string($string);
(pulled from http://search.cpan.org/~dwheeler/Data-Types-0.06/lib/Data/Types.pm)
I am new to this so this is just a thought. | [reply] |
|
The problem here is that all of the numbers (ints and reals) do have different precisions - 10.0 and 10.00 are numerically equivalent, but are different when treated as strings.
| [reply] |
|
Check this page out. It has tests to find variable types, and shows how to convert them. You can test whether a value is an int and, if it is, convert it to a real and then do the comparison.
http://search.cpan.org/~dwheeler/Data-Types-0.06/lib/Data/Types.pm.
With this you should be able to get both values to at least the same variable type. If both become float types and you compare 10.0 to 10.00, you should end up with equality... Another thing that you can do is set up a tolerance for precision on number comparisons:
Instead of if a == b
do if (absolute value of (a - b)) < .0001 then .....
Just some thoughts. I personally haven't done a lot with Perl yet.
| [reply] |
|
I'd argue that if you're bothering to mention precision at all, then 10.0 != 10.00.
10.0 is really "somewhere between 9.95 and 10.05", and 10.00 is really "somewhere between 9.995 and 10.005". So if your 10.0 is really 9.97, it can't possibly be equal to 10.00.
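One way to act on that view is to compare the written precision as well as the value. The sketch below assumes the values are still strings exactly as read from the file, since a numeric literal such as 10.00 loses its trailing zeros. The names decimals and same_value_and_precision are made up.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: number of digits written after the decimal point.
sub decimals {
    my ($s) = @_;
    return $s =~ /\.(\d+)/ ? length($1) : 0;
}

# Equal only when the values match AND they were written
# with the same number of decimal places.
sub same_value_and_precision {
    my ( $x, $y ) = @_;
    return 0 unless looks_like_number($x) && looks_like_number($y);
    return ( $x == $y && decimals($x) == decimals($y) ) ? 1 : 0;
}

print same_value_and_precision( '10.0', '10.0' ),  "\n";
print same_value_and_precision( '10.0', '10.00' ), "\n";
print same_value_and_precision( '10',   '10.0' ),  "\n";
```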
| [reply] |
|
Re: Equality checking for strings AND numbers
by shoness (Friar) on Jul 16, 2007 at 13:29 UTC
|
Using the strtod and strtol functions from the POSIX module, you can convert the strings that Perl reads to numbers that you can operate on. The FAQ entry quoted below also suggests a nice "is_numeric" function:
# Begin quoting from <http://p3m.org/faq/C3/Q3.html>
sub getnum {
use POSIX qw(strtod);
my $str = shift;
$str =~ s/^\s+//;
$str =~ s/\s+$//;
$! = 0;
my($num, $unparsed) = strtod($str);
if (($str eq '') || ($unparsed != 0) || $!) {
return undef;
} else {
return $num;
}
}
sub is_numeric { defined getnum($_[0]) }
# end quoting...
sub comp {
use POSIX qw(strtol);
my ($a, $b) = @_;
if (is_numeric($a) && is_numeric($b)) {
return (strtol($a * 100) == strtol($b * 100));
} else {
return ( $a eq $b );
}
}
| [reply] [d/l] |
Re: Equality checking for strings AND numbers
by Moron (Curate) on Jul 16, 2007 at 13:25 UTC
|
As halley indicated, an absolute Epsilon test doesn't work well for all kinds of data. What about fractional comparison? E.g.:
sub fromp {
my ( $x, $y, $eps) = @_;
( abs( ($y - $x ) / ( $x || $y || return (1) ) ) < $eps );
}
$eps should be the fractional closeness e.g. 0.000000001 would invoke a fractional threshold of a billionth.
The chain of ||s ensures that either the divisor is non-zero or division is prevented by returning 1 where both are 0 (therefore equal).
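To show where the absolute and fractional tests part ways, here is a small sketch. rel_equal is a variant written for this illustration rather than a copy of fromp: it divides by the larger of the two magnitudes instead of $x, and it handles exact equality (including 0 == 0) up front.

```perl
use strict;
use warnings;

# Absolute test: the difference must be below $eps.
sub abs_equal {
    my ( $x, $y, $eps ) = @_;
    return abs( $x - $y ) < $eps ? 1 : 0;
}

# Relative test (illustrative variant): the difference, scaled by the
# larger magnitude, must be below $eps.
sub rel_equal {
    my ( $x, $y, $eps ) = @_;
    return 1 if $x == $y;                         # also covers 0 == 0
    my $scale = abs($x) > abs($y) ? abs($x) : abs($y);
    return abs( $x - $y ) / $scale < $eps ? 1 : 0;
}

my $eps = 1e-6;
for my $pair ( [ 1e9, 1e9 + 1 ], [ 1e-9, 2e-9 ], [ 0, 0 ] ) {
    my ( $x, $y ) = @$pair;
    printf "%g vs %g: abs=%d rel=%d\n",
        $x, $y, abs_equal( $x, $y, $eps ), rel_equal( $x, $y, $eps );
}
```

Large values that differ by one part in a billion fail the absolute test but pass the relative one, while tiny values that differ by a factor of two do the opposite.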
__________________________________________________________________________________
^M Free your mind! | [reply] [d/l] |
|
Sorry halley, I missed your post on the absolute Epsilon test. Just throwing around ideas to use and didn't realize that one was already out on the table
| [reply] |