Re^2: Randomness

by blue_cowdawg (Monsignor)
on Sep 17, 2003 at 00:24 UTC


in reply to Re: Randomness (was Re: Re: Re: Use time() to create unique ID)
in thread Use time() to create unique ID

      There is a big difference between "works very well" and hasn't broken yet.
I agree. However, "hasn't broken yet" isn't very scientific. Any "random" algorithm is broken over a sufficiently large data set. That is the basis behind Chaos Theory. Random events or data are not very random if you take a large enough data set.

It all boils down to what you consider to be acceptably "broken" and what your exposure is.
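
As a rough illustration of the collision concern (the million-value ID space below is an assumption for the example, not a figure from this thread): draw "random" IDs from a fixed-size space and count how many draws it takes to hit a duplicate. By the birthday effect that typically happens after roughly a thousand draws here, long before the space itself is used up.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Draw IDs at random from a fixed-size space and report how many
    # draws it took before the first duplicate appeared.
    my $space = 1_000_000;      # assumed size of the ID space
    my %seen;
    my $draws = 0;
    while (1) {
        my $id = int rand $space;
        $draws++;
        last if $seen{$id}++;   # stop on the first repeat
    }
    print "first duplicate after $draws draws from a space of $space\n";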


Peter L. Berghold -- Unix Professional
Peter at Berghold dot Net
   Dog trainer, dog agility exhibitor, brewer of fine Belgian style ales. Happiness is a warm, tired, contented dog curled up at your side and a good Belgian ale in your chalice.

Replies are listed 'Best First'.
Re: Re^2: Randomness
by sauoq (Abbot) on Sep 23, 2003 at 18:47 UTC
    Any "random" algorithm is broken over a sufficiently large data set. That is the basis behind Chaos Theory. Random events or data are not very random if you take a large enough data set.

    I'm not sure I even understand those statements...

    Firstly, by saying, "any 'random' algorithm is broken over a sufficiently large data set," are you implying that said algorithm is not broken over a smaller data set? Perhaps you got the phrasing backward and you meant that any such algorithm is broken over a smaller data set. After all, in the case of unique IDs, a smaller data set will be more likely to result in a duplicate ID than a large one. In any case, though, it's the reliance on randomness that breaks it, not the size of the data set.

    Secondly, saying "random events or data are not very random if you take a large enough data set" doesn't make any sense at all. Randomness has nothing to do with the size of the data set¹. It has everything to do with predictability. If you have a function which randomly returns either 0 or 1 then you can choose numbers between 0 and 2**9876543210 - 1 with no loss or gain of randomness.
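
    A minimal sketch of that point (the flip() helper below is hypothetical, standing in for "a function which randomly returns either 0 or 1", and 32 bits stands in for the astronomically larger range): each bit of the composed number is exactly as unpredictable as the source, however many bits you stack together.

        use strict;
        use warnings;

        # Hypothetical coin-flip source: returns 0 or 1 with equal probability.
        sub flip { int rand 2 }

        # Compose a 32-bit number one coin flip at a time; combining the
        # bits neither adds nor removes randomness.
        my $n = 0;
        $n = ($n << 1) | flip() for 1 .. 32;
        printf "%u\n", $n;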

    Finally, I don't see how any of this has anything to do with Chaos Theory. CT is concerned with deterministic processes where minute (even immeasurably so) differences in initial conditions can result in very different final states. The theory explains how apparent randomness can be observed even in very well-understood deterministic systems.

    It all boils down to what you consider to be acceptably "broken" and what your exposure is.

    My point was that using randomness for generating unique IDs should not be recommended. There are ways to do it that aren't broken. Why concern yourself with using statistical analysis to determine how likely it is your program will fail when you can avoid failure altogether?
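
    A minimal sketch of one such non-random approach (not taken from this thread; uniqueness rests on a per-process counter plus the PID and start time to tell processes apart, not on chance):

        use strict;
        use warnings;

        # Build IDs from things that don't repeat: the script's start time
        # ($^T), the process ID ($$), and a counter that only counts up.
        my $counter = 0;
        sub next_id { join '-', $^T, $$, ++$counter }

        print next_id(), "\n" for 1 .. 3;

    A real module would also guard against PID reuse within the same second, but the point stands: no statistical analysis is required to know the IDs won't repeat.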

    1. Well, almost nothing. It has nothing to do with size for any data set with at least two elements to choose from. In other words, if you can only make one choice, you can't make it "randomly."

    -sauoq
    "My two cents aren't worth a dime.";
    
