Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW
 
PerlMonks  

Issue while reading decimal number from a Spreadsheet using perl

by KishKishore (Initiate)
on Oct 21, 2019 at 05:31 UTC ( #11107744=perlquestion: print w/replies, xml ) Need Help??

KishKishore has asked for the wisdom of the Perl Monks concerning the following question:

Hi Guys, I'm very new to perl. I was just doing some read/write operations in excel sheet using perl. My issue is: When i write a decimal number (fraction number. Ex : 2.5), and read it back, the output is some long number with extra decimals. I dont know why this is happening. Please help me

use Spreadsheet::ParseExcel; use Spreadsheet::ParseExcel::SaveParser; ## Write to an existing excel sheet $parser = Spreadsheet::ParseExcel::SaveParser->new(); my $workbook = $parser->Parse('myxls.xls'); # Get the first worksheet. my $worksheet = $workbook->worksheet("myxls"); my ( $row_min, $row_max ) = $worksheet->row_range(); my ( $col_min, $col_max ) = $worksheet->col_range(); $worksheet->AddCell( $row_max+1, 0, '10.1' ); $worksheet->AddCell( $row_max+1, 1, '10.2' ); $worksheet->AddCell( $row_max+1, 2, '10.3' ); $worksheet->AddCell( $row_max+1, 3, '10.4' ); $worksheet->AddCell( $row_max+1, 4, '10.5' ); $worksheet->AddCell( $row_max+1, 5, '10.6' ); $worksheet->AddCell( $row_max+1, 6, '10.7' ); $worksheet->AddCell( $row_max+1, 7, '10.8' ); $worksheet->AddCell( $row_max+1, 8, '10.9' ); $workbook->SaveAs('myxls.xls'); ## Excel Read $parser = Spreadsheet::ParseExcel::SaveParser->new(); my $workbook = $parser->Parse('myxls.xls'); # Get the first worksheet. my $worksheet = $workbook->worksheet("myxls"); my ( $row_min, $row_max ) = $worksheet->row_range(); my ( $col_min, $col_max ) = $worksheet->col_range(); for($r=$row_min; $r<=$row_max; $r++){ for($c=$col_min; $c<=$col_max; $c++){ $d = $worksheet->get_cell( $r, $c); if(defined $d) { $d= $d +->unformatted(); print "-$d-\t";} } print "\n"; }

Result : -10.0999999999999996- -10.1999999999999993- -10.3000000000000007- -10.4000000000000004- -10.5- -10.5999999999999996- -10.6999999999999993- -10.8000000000000007- -10.9000000000000004- I don't understand why this extra decimal numbers. When i open excel and check, this extra decimals are not there. Please help me

Replies are listed 'Best First'.
Re: Issue while reading decimal number from a Spreadsheet using perl
by haukex (Chancellor) on Oct 21, 2019 at 06:29 UTC

      Tom Scott also has a nice video on the topic on Youtube: https://www.youtube.com/watch?v=PZRI1IfStY0

      perl -e 'use MIME::Base64; print decode_base64("4pmsIE5ldmVyIGdvbm5hIGdpdmUgeW91IHVwCiAgTmV2ZXIgZ29ubmEgbGV0IHlvdSBkb3duLi4uIOKZqwo=");'

      Thanks.. I'll check that.

Re: Issue while reading decimal number from a Spreadsheet using perl
by rjt (Curate) on Oct 21, 2019 at 08:16 UTC

    haukex correctly identified the issue as floating point error. While I'm sure the site that was linked does a better job of explaining this problem than I will, but even though there are a bazillion such explanations on the web already, I'm going to explain it briefly again, and then I'm going to relate it to your specific example, and then I'm finally going to show you how to avoid this class of errors.

    Floating point errors (a very truncated explanation)

    Your problem is not a problem with Excel or Perl, but a problem with how your computer stores floating point (non-integer) numbers. Computers store numbers in binary (base 2), while you are asking for them to be printed in decimal (base 10). For integers up to a maximum size, there is no loss of information converting between binary and decimal. Floating point numbers offer no such guarantee. They are stored as three distinct parts: a sign bit, the exponent, and the mantissa/significand, which means your number has to be converted to look like number = sign * mantissa * 2exponent. The problem is there are only so many bits available for the mantissa: 23 for single precision, or 52 for double. Those bits (b0 .. b23) are added together like so: b0/2 + b1/4 + ... Do you see now why some decimal values can't be represented exactly?

    For example, -10.1 (your first example) looks like this in single precision IEEE-754 floating point format:

    S Exponent Mantissa - -------- ----------------------- 1 10000010 01000011001100110011010 - 130 => 3 (1/1) + 0/2 + 1/4 + 0/8 + 1/16 + 1/32 + ... ~ 1.26250004768 +37158

    The actual exponent is exponent - 127, so 130 - 127 gives you 23. This is to allow for negative exponents.

    The mantissa is read from the back to the front (so the rightmost 0 corresponds to the 1/2s place, the one to the left of that is the 1/4s place, and so on). There's an implicit (1/1) added on that is not stored in the floating point number itself.

    So putting all of that together, you get -1 * 23 * (0/2 + 1/4 + 0/8 + 1/16 + 1/32 + ...), and, you guessed it, that doesn't work out exactly to -10.1.

    How to fix floating point1

    In general, you can't; floating point binary numbers do not always convert exactly to decimal. You can increase the precision, but in practice, that just increases the count of unwanted 9s or 0s on the end of your number. You could use fixed point arithmetic if your numbers fall within a narrow enough range of exponents. You can use arbitrary precision like Math::BigFloat, but there is a massive performance penalty associated with that (not to mention the unbounded memory requirements), and you don't typically need unlimited precision anyway. So that brings me to the commonly used option, which is rounding to the required number of decimal places.

    But you still need to be careful. Let's say you need two decimal places. You might think you can do printf "%.2f", $value and rely on that result. Not if you care about that last decimal place! Here's a classic example:

    printf "%.2f =?= %.2f", 7.525, 6.02 * 1.25; # Output: 7.53 =?= 7.52

    In other words, numbers that should be identical may be rounded differently. To get around this, you need to request one more decimal place than you need, and then look at the last digit (in decimal form) to decide whether to round up or down. Thankfully, there's a module for that:

    use Math::Round; printf "%s == %s\n", nearest(0.01, 7.525), nearest(0.01, 6.02 * 1.25); # 7.53 == 7.53

    When to round is also important. You typically want to round right before you display or output the number, not before, or you will lose precision. And don't round and then shove the result back into a variable Perl will treat as a number, or you won't have fixed anything!

    Floating point numbers are full of other fun pitfalls, especially if you start mixing formats. Please do read the link that haukex gave you for a much more complete discussion.

    _______________
    1. Pun absolutely intendianed.

      Thanks a lot.. This helped me understand many things along with my issue

Re: Issue while reading decimal number from a Spreadsheet using perl
by syphilis (Bishop) on Oct 21, 2019 at 08:28 UTC
    In reading the numbers back, they are being rounded to 18 decimal digits of precision:
    use warnings; for(10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9) { printf "%.18g\n", $_; } __END__ Outputs: 10.0999999999999996 10.1999999999999993 10.3000000000000007 10.4000000000000004 10.5 10.5999999999999996 10.6999999999999993 10.8000000000000007 10.9000000000000004
    I don't know why that is happening or how to influence the result, but you clearly want those numbers to be rounded to 3 decimal digits of precision:
    use warnings; for(10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9) { printf "%.3g\n", $_; } __END__ Outputs: 10.1 10.2 10.3 10.4 10.5 10.6 10.7 10.8 10.9
    Cheers,
    Rob
Re: Issue while reading decimal number from a Spreadsheet using perl
by poj (Abbot) on Oct 21, 2019 at 09:13 UTC

    Add this to your program to show what versions are you using. Also what version of Excel was used to create myxls.xls ?

    #!perl use Spreadsheet::ParseExcel::SaveParser; printf "OS %s Perl %s ParseExcel %s\n",$^O,$^V,$Spreadsheet::ParseExcel::VERSION;

    Running your scripts I get

    OS MSWin32
    Perl v5.16.1
    ParseExcel 0.65
    -10.1-  -10.2-  -10.3-  -10.4-  -10.5-  -10.6-  -10.7-  -10.8-  -10.9-
    
    poj

      It was surprising to me that your system doesn't show any apparent floating point error. This prompted me to (finally) try the OP's the code myself, and sure enough, I had perfect rounding, too! Yet the OP's output corresponds to what I would actually expect from a double precision IEEE-754 float:

      perl -le 'printf "%.18f", 10.1' 10.099999999999999645

      Suspecting skulduggery, I dug into the Spreadsheet::ParseExcel version 0.65 code, and I was not disappointed. See Utility.pm, around line 800. Indeed, the module does its own rounding because of RT#45626:

      # Fix for Perl and sprintf not rounding up like Excel. # http://rt.cpan.org/Public/Bug/Display.html?id=45626 if ( $data =~ /^${number_result}5/ ) { $number_result = sprintf "%0${str_length}.${after_decimal}f", $data . '1'; }

      This section was last changed ten years ago, suggesting OP may have a very old version, so your request for the version number was apropos.

      Aside: The sub that's in is called ExcelFmt and is almost 900 lines long. The entire package is 45000 lines. I'm pretty sure the section around line 800 is responsible for the rounding, but that's a lot of code to grok in five minutes. I'd be happy to be corrected on the specifics, but the point is, something is doing the rounding. :-)

      And yes, OP, I'd still suggest doing your own rounding for the reasons stated in my comment, to ensure you are rounding to your own specifications.

Re: Issue while reading decimal number from a Spreadsheet using perl
by vr (Deacon) on Oct 22, 2019 at 10:52 UTC

    It looks like your Perl was compiled to use "long double" for floating point numbers, hence 18 digits of precision instead of 15 (i.e. "%.18g" format) for default output. But Excel stores numbers as IEEE-754 doubles, so you simply loose precision on export to xls. What you are reading back from xls is worse approximation of e.g. "10.1" (which can't be represented precisely in binary floating point of whatever size) from your Perl POV, than what it converts source string "10.1" to.

    Not sure if my next advice will be popular, but if your task is relatively short/simple scripts, and if you have to compare floating point numbers restored from xls binary data with e.g. user input (or any ASCII text file input), then, for simplicity/sanity, it just could be easier to use Perl which uses plain old IEE-754 doubles.

    >perl -V:nvsize nvsize='8'; >perl -MConfig -E "say $Config{uselongdouble}" >perl -E "say unpack 'd', pack 'd', 10.1" 10.1 >perl -E "say unpack 'f', pack 'f', 10.1" 10.1000003814697

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://11107744]
Approved by haukex
Front-paged by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others lurking in the Monastery: (8)
As of 2019-11-16 21:47 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Strict and warnings: which comes first?



    Results (85 votes). Check out past polls.

    Notices?