There are two limiting cases for failure here. Assuming LIMIT is a power of two, either SIZE = LIMIT could be binned too low or SIZE = LIMIT-1 could be binned too high. For the latter to happen, 1/LIMIT would have to be smaller than the precision your floating point numbers can resolve (you can show this with a Taylor series: log(LIMIT-1) is approximately log(LIMIT) - 1/LIMIT). Where that falls depends on your machine and build, but on mine it starts failing at 2**48, and it will always precede the transition to floating point representation for $size.
for (1 .. 2**10) {
    my $size = 2**$_;
    # SIZE = LIMIT-1 belongs in bin $_ - 1; report when it rounds up to $_
    print "Fail $_ high\n" if log($size - 1) / log(2) == $_;
}
Regarding what I perceive as the main question, note that you are getting lucky. If you run the code printf("$_: %.16e\n", log(2**$_)/log(2)) for (1 .. 2**10); (assuming native doubles) you will see that the result of your division is not always exact: the last digit is high in a large fraction of the cases. This is an artifact of how the logarithm is implemented. If I explore the powers of 3 instead, all inexact cases are low, not high. This implies to me that the internal representation of log(2) is ever so slightly lower than the true value.
The better question is whether you should care about this inaccuracy. If you are just gathering file statistics, inaccuracy in the absolute position of the boundary should not significantly skew your results, assuming a smooth file size p.d.f. By the time your algorithm fails, you are nearly at the point where you can no longer represent file sizes exactly as integers. However, if it is mission critical to be literally correct, you could use a hash to build a look-up table.
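As a rough sketch of what such a look-up table might look like (this is one possible reading, not tested against your data): the hash %limit_for and the sub size_bin are names I made up, and I am assuming a perl built with 64-bit integers and sizes below 2**63. The log estimate is kept as a first guess and then corrected with exact integer comparisons against the table.

```perl
use strict;
use warnings;

# Hypothetical table: exponent => exact integer power of two.
# Assumes 64-bit integer support, so 1 << 63 does not overflow.
my %limit_for = map { $_ => 1 << $_ } 0 .. 63;

# Hypothetical helper: return n such that 2**n <= $size < 2**(n+1).
sub size_bin {
    my ($size) = @_;
    die "size must be a positive integer\n" if $size < 1;

    # Floating point first guess; may be off by one near a boundary.
    my $n = int( log($size) / log(2) );
    $n = 63 if $n > 63;

    # Correct the guess with exact integer comparisons.
    $n-- while $limit_for{$n} > $size;
    $n++ while exists $limit_for{ $n + 1 } && $limit_for{ $n + 1 } <= $size;
    return $n;
}
```

With this, size_bin(2**48 - 1) and size_bin(2**48) land in different bins regardless of how the division rounds, at the cost of a couple of integer comparisons per call.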