comment on

There are two limiting cases for failure here. Assuming LIMIT is a power of two, either SIZE = LIMIT could be binned too low or SIZE = LIMIT-1 could be binned too high. For the latter to happen, 1/LIMIT would have to be smaller than your number of significant figures (you can show this with a Taylor series). Obviously where that falls depends on your machine and build, but on mine that starts failing at 2**48 and will always precede the transition to floating point representation for $size.

for (1 .. 2**10) {
    my $size = 2**$_;
    print "Fail $_ high\n" if (log((2**$_ - 1)) / log(2) ) == $_;
}
[download]

Regarding what I perceive as the main question, note that you are getting fortunate. If you run the code printf("$_: %.16e\n", log(2**$_)/log(2)) for (1 .. 2**10); (assuming native doubles) you will see that the result of your division is not exactly correct - your last digit is high in a large fraction of the offerings. This is a function of the logarithm as implemented. If instead of the above, I explore the powers of 3, all inexact cases are low, not high. This implies to me that the internal representation of log(2) is ever so slightly lower than the true value.

The better question is if you should care about this inaccuracy. If you are just gathering file statistics, inaccuracy in the absolute position of the boundary should not significantly skew your results assuming a smooth file size p.d.f. By the time your algorithm fails, you are nearly to a point where you can no longer identify file sizes with integers. However, if it is mission critical to be literally correct, you could use a hash to build a look-up table.

In reply to Re: May I be bitten by floating point arithmetic in the following restricted case? by kennethk
in thread May I be bitten by floating point arithmetic in the following restricted case? by rubasov

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.