PerlMonks  

Re: [OT] Stats problem

by RichardK (Parson)
on Feb 26, 2015 at 11:14 UTC ( #1117931=note )


in reply to [OT] Stats problem

AFAICT the odds are exactly the same. You seem to be asking what are the odds of a given 4 bytes being set to a single specific value. It doesn't make any difference if that value is 0xDEADBEEF or its memory address -- the odds are still 1 in 2^32.
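To make the claim concrete, here is a minimal sketch that enumerates every case in a scaled-down world of 8-bit values instead of 32-bit ones. The names `BITS`, `MAGIC` and `hit_probability` are assumptions made for illustration; nothing here comes from the original poster's code.

```python
from fractions import Fraction

BITS = 8             # assumption: scaled down from 32 bits so every case can be enumerated
VALUES = 2 ** BITS
MAGIC = 0xAD         # hypothetical stand-in for 0xDEADBEEF

def hit_probability(target):
    """Exact chance that one uniformly random BITS-bit value equals target."""
    hits = sum(1 for value in range(VALUES) if value == target)
    return Fraction(hits, VALUES)

# Same fixed target at every slot, versus a different target (the slot's own offset).
fixed_target = [hit_probability(MAGIC) for _ in range(VALUES)]
own_offset = [hit_probability(offset) for offset in range(VALUES)]

print(set(fixed_target), set(own_offset))
```

Either way each slot has exactly one matching value out of `VALUES`, so the per-slot chance is the same; with 32-bit values that is 1 in 2^32.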

Replies are listed 'Best First'.
Re^2: [OT] Stats problem
by BrowserUk (Pope) on Feb 26, 2015 at 12:06 UTC

    Hm. Not convinced.

    If you write random bytes to the whole 4GB, then inspect them as 4-byte aligned U32s, then only 1/4 of the possible values or less will appear.

    Only inspect every other 4-byte aligned U32 and only 1/8th or less of the possible values will appear.

    And if you write the same value to every single slot, it could only match the offset at one position.

    So 7/8ths or more of the possible values will not appear anywhere; and of the rest, the chances that the value will appear at an offset that matches the value have to be slim. Bordering on impossible.
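The two quantities in this argument can be worked out directly. The sketch below assumes uniformly random, independent values in every 4-byte aligned slot of a 4GB region; `SLOTS` and `VALUES` are names chosen here for illustration.

```python
import math

SLOTS = 2 ** 30    # 4-byte aligned U32 slots in 4GB
VALUES = 2 ** 32   # possible U32 values

# Expected fraction of the possible values that appear at least once:
# 1 - (1 - 1/VALUES) ** SLOTS, evaluated in a numerically stable way.
distinct_fraction = -math.expm1(SLOTS * math.log1p(-1.0 / VALUES))

# Expected number of slots whose value equals one designated target per slot.
# The target can be a fixed magic number or the slot's own offset; the sum is the same.
expected_hits = SLOTS / VALUES

print(f"fraction of values seen: {distinct_fraction:.4f}")
print(f"expected matching slots: {expected_hits}")
```

Under those assumptions about 22% of the values appear, which agrees with "1/4 or less". The expected number of matching slots, however, depends only on the number of slots and the number of values, not on which target each slot is compared against.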


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
    In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
      If you're not convinced then why not build yourself a small monte carlo simulation and try it out?
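A scaled-down version of such a simulation might look like the sketch below. It is not the code used in this thread: it assumes 12-bit values, 1024 slots whose byte offsets are multiples of 4, and a hypothetical magic number `0xBEE`, so that it finishes in about a second.

```python
import random

BITS = 12              # assumption: 12-bit values instead of 32-bit
VALUES = 1 << BITS
SLOTS = VALUES // 4    # "4-byte aligned" slots covering the whole address range
MAGIC = 0xBEE          # hypothetical stand-in for 0xDEADBEEF
RUNS = 400

def simulate(seed=1):
    """Fill every slot with a random value RUNS times and count both kinds of hit."""
    rng = random.Random(seed)
    offset_hits = magic_hits = 0
    for _ in range(RUNS):
        for slot in range(SLOTS):
            value = rng.randrange(VALUES)
            if value == 4 * slot:      # value equals the slot's own byte offset
                offset_hits += 1
            if value == MAGIC:         # value equals the fixed magic number
                magic_hits += 1
    return offset_hits, magic_hits

offset_hits, magic_hits = simulate()
expected = RUNS * SLOTS / VALUES
print(f"offset hits: {offset_hits}  magic hits: {magic_hits}  expected: {expected:.0f}")
```

Both counts should land near the expected 100, with ordinary run-to-run scatter of roughly plus or minus 10.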

        After 5000 runs of 1/2 billion checks per run, the result is:

        ...
        Run: 2000 buks:231 stds:273
        ...
        Run: 3000 buks:346 stds:393
        ...
        Run: 4000 buks:466 stds:520
        ...
        Run: 5000 buks:577 stds:663
        Run: 5001 buks:578 stds:664
        Run: 5002 buks:579 stds:664
        Run: 5003 buks:579 stds:664
        Run: 5004 buks:579 stds:665

        So the odds are:

        5000 * 536870912 = 2684354560000 total checks

        total checks / false hits = odds of a false hit

        2684354560000 / 579 = 4636190949    offset
        2684354560000 / 665 = 4036623398    0xdeadbeef
        Expected odds       = 4294967296

        Which means you are right!

        However, as the statistics above reflect, and as I observed from several (very) short runs whilst sanity checking the code; the offset seems to beat the odds every time; whilst the fixed magic number seems to come a little shy of it every time.

        There are not enough observations and not a sufficiently big difference between them to conclude that this is anything other than expected variation. But it does seem consistent.
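One way to put a number on "expected variation" is sketched below. It assumes every check is independent with a hit chance of 1 in 2^32, so that hit counts are approximately Poisson distributed, and it uses the 5000-run figure from the calculation above. The variable names are illustrative.

```python
import math

RUNS = 5000
CHECKS_PER_RUN = 536870912     # 2**29 checks per run
P_HIT = 1 / 2 ** 32            # assumed chance of a false hit per check

expected_hits = RUNS * CHECKS_PER_RUN * P_HIT   # mean of the approximating Poisson
sigma = math.sqrt(expected_hits)                # its standard deviation

observed = {"offset": 579, "0xdeadbeef": 665}
z_scores = {name: (count - expected_hits) / sigma for name, count in observed.items()}

print(f"expected {expected_hits:.0f} hits, sigma {sigma:.0f}")
for name, z in z_scores.items():
    print(f"{name}: {z:+.2f} sigma")
```

Under these assumptions the expectation is 625 hits with a standard deviation of 25, so 579 and 665 sit about 1.8 below and 1.6 above it, both inside two standard deviations.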

        I've started another (low priority) run with some sanity check code enabled that counts the occurrences of each random value seen. The extra code means it runs much more slowly; and the restrictions of my physical memory mean I've had to limit the counts to unsigned bytes; but by outputting when those counts roll over it should give a clear indication of whether all values are being generated, as 96% of them should roll over within a few dozen runs of each other -- if my calculations are correct I should see the bulk of them at around the 1024-run mark.

        All of which goes to reinforce my long standing observation that -- for me -- statistics is the second most unintuitive thing -- after quantum mechanics -- that I know just-enough-to-be-dangerous about.

        At least with QM I'm in good company when it comes to finding it spooky :)


        build yourself a small monte carlo simulation

        Well, it's running, but it'll need to run for a while to be statistically valid.

        In the meantime are you saying that the fact that the value at any given offset has to match both the value and the offset has no influence upon the chances of a false positive?

        The UK national lottery picks 6 balls from 49: 49!/(6!*(49-6)!) = 1:13,983,816 chance.

        And once all 6 balls are out of the machine; they reorder them by ascending value; so the result is always shown as B1 < B2 < B3 < B4 < B5 < B6.

        But if players had to match both the numbers and their draw order, it would be a lot harder. The odds would be 6! * 13,983,816 = 1:10,068,347,520.
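The lottery figures can be checked with the standard library; this is only the arithmetic from the two paragraphs above, with variable names chosen for illustration.

```python
import math

BALLS = 49
PICKED = 6

unordered = math.comb(BALLS, PICKED)          # 6 from 49, draw order ignored
ordered = math.factorial(PICKED) * unordered  # the same 6 balls in one exact draw order

print(f"numbers only:      1:{unordered:,}")
print(f"numbers and order: 1:{ordered:,}")
```

The ordered count is the same as the number of ordered draws, `math.perm(49, 6)`. Requiring the order makes the space of outcomes 6! times larger, which is where the longer odds come from.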

        So, value and position: highly increased odds; but you're saying that's not a factor here?


