Re: [OT] Stats problem

by roboticus (Chancellor)
on Feb 26, 2015 at 13:12 UTC

in reply to [OT] Stats problem


I'm not sure I know exactly what you're asking, and unfortunately, statistics is one of those things where intuition can often lead you astray.

Let's do a little hand waving: Assume our initial state is that each field contains the ones' complement of its address, and that our addresses and values are uniformly distributed.

Case 1: select a random address (byte-aligned) and write a random byte. Probability of any field containing its address is obviously[1] 0. Consider that with the initial setup, you need to alter four consecutive bytes starting at a field boundary for there to be any chance of having a field value matching its address.
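A quick sanity check of that claim, as a minimal sketch: I'm assuming 32-bit fields here, and the sample addresses are made up for illustration. It counts how many bytes of each field already match the corresponding byte of its address in the initial state.

```python
# Assumption: 32-bit fields, each initially holding the ones' complement
# of its own address. The sample addresses below are hypothetical.

def bytes_matching(addr):
    """Count the bytes of the initial field value that equal the address's bytes."""
    value = ~addr & 0xFFFFFFFF  # ones' complement, truncated to 32 bits
    return sum(
        ((addr >> shift) & 0xFF) == ((value >> shift) & 0xFF)
        for shift in (0, 8, 16, 24)
    )

sample_addrs = [0x00000000, 0x00000004, 0x12345678, 0xFFFFFFFC]
mismatches = [4 - bytes_matching(a) for a in sample_addrs]
print(mismatches)  # [4, 4, 4, 4]
```

A byte can never equal its own complement, so all four bytes of every field start out wrong, and a single byte write can fix at most one of them.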

Case 2: select four addresses at random, and write a random byte to each one. Probability of any field containing its address: not zero, but much[5] less than 1/2^32. You'd have to (1) select four adjacent addresses, (2) have all four land in the same field, and (3) write the correct value to each. I'm not going to compute this because I'm not sure that's the question you're asking[3].
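For anyone who wants a rough number anyway, here is a sketch under assumptions that are mine and may not match the real problem: the four addresses are drawn independently and uniformly (with replacement) from a memory of num_fields four-byte fields, and each written byte is uniform over 0..255. The function name and parameter are hypothetical.

```python
from fractions import Fraction

def case2_probability(num_fields):
    """Chance that four random single-byte writes make some field equal its address.

    Assumes independent, uniform address draws (with replacement) and
    uniform random byte values.
    """
    num_bytes = 4 * num_fields
    # The first address can land anywhere; the next three must hit the
    # three remaining bytes of that same field, in any order.
    same_field = (
        Fraction(3, num_bytes) * Fraction(2, num_bytes) * Fraction(1, num_bytes)
    )
    # Each of the four written bytes must equal the matching address byte.
    right_values = Fraction(1, 256) ** 4  # 1/2^32
    return same_field * right_values

print(float(case2_probability(1024)))
```

Under those assumptions the result is 6/(4N)^3 times 1/2^32 for N fields, which is where the "much less than 1/2^32" comes from.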

Case 3: select n addresses at random, and write a random byte to each one, where n is larger than four. The probability will be larger than case 2, but still pretty darned small.

Case 4:[6] select a random address, and write n random bytes. For n less than four, the probability is obviously[4] 0. When n is four through six, it's much[5] more likely than for case 2, as n-1 of the addresses aren't random, so you won't miss overwriting a complete field as frequently (you'd miss 75%, 50% or 25% of the time, respectively). When n is greater than six, you'd never miss overwriting a field, so the probability increases more.
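The miss rates for a sequential write are easy to tally by brute force. This is a minimal sketch assuming four-byte fields aligned on four-byte boundaries and a start address that is equally likely to fall at any offset within a field; the names are my own.

```python
FIELD_SIZE = 4  # assumption: 4-byte fields aligned on 4-byte boundaries

def miss_fraction(n):
    """Fraction of start offsets where an n-byte write covers no complete field."""
    misses = 0
    for start in range(FIELD_SIZE):
        end = start + n  # one past the last byte written
        # First field boundary at or after the start of the write.
        first_boundary = -(-start // FIELD_SIZE) * FIELD_SIZE
        # The write misses if the field starting there isn't fully covered.
        if first_boundary + FIELD_SIZE > end:
            misses += 1
    return misses / FIELD_SIZE

print([miss_fraction(n) for n in range(3, 8)])  # [1.0, 0.75, 0.5, 0.25, 0.0]
```

Only the offset within a field matters, so checking the four possible offsets covers every start address.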

So if we knew a bit more about the behaviour of the overwriting process, it might be easier to figure out what the probabilities would look like. My hand-wavy opinion is that when you're writing sequential strings of bytes (significantly longer than four bytes in a row), the probability approaches 1/2^32. When you're selecting bytes at random and overwriting them, it would be more like the Bloom filter math, and much less likely until the number of bytes being written becomes large enough (in which case, I'd expect your program to have failed spectacularly well in advance of hitting the tragic case).

Oh, well, off to shower & work. I may think about it a bit more on my drive to work.


1) Every time[2] I work with statistics and say "X is obviously Y", I'm wrong.

2) Almost. If it were that reliable, I could just insert the word "not" after obviously, and be brilliant.

3) Not to mention that it'd take me a while to convince myself whether what I computed would be correct or not.

4) There's that word again. Feel free to insert a "not" here if I'm wrong.

5) The value of "much" is left as an exercise to the reader.

6) Yeah, this should really be split out into more cases, but it's a pain and I've gotta get to work.


When your only tool is a hammer, all problems look like your thumb.
