Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

Comment on

( #3333=superdoc: print w/ replies, xml ) Need Help??

Thank you roboticus.

I ran your algorithm against the results of my 60-hour run, and the correlation is simply stunning!

#! perl -slw use strict; sub ex_10 { my ($N, $X) = @_; return -$N/10*exp(-10*$X/$N) + 10*$N/9 *exp( -9*$X/$N) - 45*$N/8 *exp( -8*$X/$N) + 120*$N/7 *exp( -7*$X/$N) - 35*$N *exp( -6*$X/$N) + 252*$N/5 *exp( -5*$X/$N) - 105*$N/2 *exp( -4*$X/$N) + 40*$N *exp( -3*$X/$N) - 45*$N/2 *exp( -2*$X/$N) + 10*$N *exp( -1*$X/$N) + $X; } my $exp10_0 = ex_10( 2**32, 0 ); open I, '<', 'L25B32V10I32S1.posns' or die $!; while( <I> ) { my( $inserts, $collisions ) = split; my $exp3 = ex_10( 2**32, $inserts ) - $exp10_0; printf "Insert %10d: Predicted %6d Actual: %6d Delta: %+.3f%%\n" +, $inserts, $exp3, $collisions, ( $collisions - $exp3 ) / $exp3 +* 100; } __END__ C:\test>rbtcs-form-verify Insert 779967210: Predicted 1 Actual: 1 Delta: -18.051% Insert 782382025: Predicted 1 Actual: 2 Delta: +58.813% Insert 830840395: Predicted 2 Actual: 3 Delta: +29.292% Insert 882115420: Predicted 4 Actual: 4 Delta: -5.961% Insert 883031614: Predicted 4 Actual: 5 Delta: +16.325% Insert 897571477: Predicted 5 Actual: 6 Delta: +18.390% Insert 923155269: Predicted 6 Actual: 7 Delta: +4.086% Insert 948108745: Predicted 8 Actual: 8 Delta: -8.996% Insert 954455113: Predicted 9 Actual: 9 Delta: -4.244% Insert 967783959: Predicted 10 Actual: 10 Delta: -7.404% Insert 988381482: Predicted 13 Actual: 11 Delta: -17.487% Insert 992691311: Predicted 13 Actual: 12 Delta: -13.814% Insert 995935158: Predicted 14 Actual: 13 Delta: -9.624% Insert 1011301141: Predicted 16 Actual: 14 Delta: -16.457% Insert 1013742872: Predicted 17 Actual: 15 Delta: -12.616% Insert 1022258193: Predicted 18 Actual: 16 Delta: -14.242% Insert 1031874023: Predicted 20 Actual: 17 Delta: -16.989% Insert 1034026887: Predicted 20 Actual: 18 Delta: -13.909% Insert 1036254774: Predicted 21 Actual: 19 Delta: -11.051% Insert 1037064360: Predicted 21 Actual: 20 Delta: -7.093% Insert 1037193945: Predicted 21 Actual: 21 Delta: -2.569% Insert 1037309710: Predicted 21 Actual: 22 Delta: +1.957% ...( ~ 52000 ellided } Insert 2375752842: Predicted 52209 Actual: 52277 Delta: +0.130% Insert 2375755671: Predicted 52209 Actual: 52278 Delta: +0.131% Insert 2375756509: Predicted 52209 Actual: 52279 Delta: +0.133% Insert 2375760656: Predicted 52210 Actual: 52280 Delta: +0.133% Insert 2375763928: Predicted 52211 Actual: 52281 Delta: +0.134% Insert 2375785238: Predicted 52215 Actual: 52282 Delta: +0.128% Insert 2375788721: Predicted 52215 Actual: 52283 Delta: +0.128% Insert 2375789878: Predicted 52216 Actual: 52284 Delta: +0.130% Insert 2375790896: Predicted 52216 Actual: 52285 Delta: +0.131% Insert 2375798283: Predicted 52217 Actual: 52286 Delta: +0.131%

And that still includes the possibility -- though I believe it to be remote -- that there are 1 or more actual dups in there.

My only regret now is that I wish I'd allowed the run to go to the 3/4 point instead of stopping it half way. The number of false positive would still have been easily manageable for the second pass:

C:\test>rbtcs-bloom-probb 32 10 30e8 N=4294967296, V=10, X=30e8 integral(30e8)=12580199467.653, integral(0) +=12579822861.8159 Expected collisions: 376605.837186813

That is an order of magnitude less that my own crude attempt was predicting, hence why I stopped the run.

The only way I can repay you is to assure you that I will do my very best to try and understand the formula -- which I do not currently.

And of course, give you credit, when they world comes knocking at my door for my eminently patentable -- at least if you take Apple as your guide -- algorithm :)


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?


In reply to Re^3: [OT] The statistics of hashing.(SOLVED) by BrowserUk
in thread [OT] The statistics of hashing. by BrowserUk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • Outside of code tags, you may need to use entities for some characters:
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others scrutinizing the Monastery: (3)
    As of 2014-09-22 02:20 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      How do you remember the number of days in each month?











      Results (177 votes), past polls