Re: Random Math Question
by blokhead (Monsignor) on Oct 10, 2005 at 22:22 UTC
I figured that it would be nearly impossible to tell the difference between a "truly" randomization of the list and one that resulted from many of my re-orderings.

Looks like you and Dominus have bumped into a very large open problem in theoretical computer science: pseudorandom generation. Essentially, a pseudorandom generator (PRG) is an algorithm that takes a small "seed" of real randomness (say, n bits) and outputs a longer string (say, 2n bits) that "looks sufficiently random." Usually, "looking sufficiently random" is defined to mean that no polynomial-time algorithm can distinguish the output of the PRG from truly random bits (except with some small probability).

Update: In layman's terms, the question is essentially: "Is it possible to tell (in a reasonable amount of time) how much randomness an algorithm uses just by looking at its output?"
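To make the definition above a little more concrete, here is a minimal Python sketch. The function `stretch` is a hypothetical toy expander invented for this illustration (it is not a proven PRG, and nothing here comes from the original thread). It shows that an n-bit seed can reach at most 2^n of the 2^(2n) possible outputs, so a distinguisher with unlimited time can always succeed by brute force. That is why the definition only considers polynomial-time distinguishers.

```python
import hashlib

N = 8  # seed length in bits; tiny on purpose so everything can be enumerated


def stretch(seed: int) -> int:
    """Hypothetical toy expander: maps an N-bit seed to a 2N-bit output.

    This is an illustration only, not a proven pseudorandom generator.
    """
    digest = hashlib.sha256(seed.to_bytes(1, "big")).digest()
    return int.from_bytes(digest[:2], "big")  # keep the first 16 bits


# Every output the expander can ever produce (at most 2**N of them).
image = {stretch(seed) for seed in range(2 ** N)}

# Fraction of all 2N-bit strings that are reachable from some seed.
coverage = len(image) / 2 ** (2 * N)


def brute_force_distinguisher(y: int) -> bool:
    """Guess 'came from the expander' exactly when y is a reachable output.

    Building `image` takes time exponential in N, so this is not a
    polynomial-time algorithm.
    """
    return y in image


print(f"reachable outputs: {len(image)} of {2 ** (2 * N)} ({coverage:.4%})")
```

A truly random 16-bit string lands in `image` with probability at most 1/256 here, so this distinguisher is almost always right. The open question is whether anything comparably effective can run in polynomial time.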
Note that this is similar to, but different from, the notion of pseudorandom number generators (PRNGs) that you find in Perl and elsewhere. For these, "pseudorandom" means "they seem hard to predict so we hope it's pseudorandom in the above sense." ;)
PRGs are absolutely essential for provably secure cryptography. Before anyone asks, modern cryptographic algorithms (like RSA) are definitely not provably secure. Their security rests on widely accepted (but unproven) assumptions that certain problems (factoring in the case of RSA, discrete logarithm for schemes such as Diffie-Hellman) are sufficiently difficult to compute.
That PRGs actually exist is an even stronger assumption than P ≠ NP (if PRGs exist, then P ≠ NP follows), so this is a very difficult question. Most researchers believe that they do exist on some level. There is a great wealth of research dealing with (assuming PRGs do exist, of course) how much they can expand their random seeds, how simple the PRG algorithms can be, etc.
Anyway, be encouraged: you are in good company thinking that there may be algorithms that don't use much randomness but whose outputs are impossible to distinguish from those that use lots of randomness. On the other hand, since this is an extremely hard problem, don't hold it against Dominus that he wasn't able to come up with a distinguishing algorithm off the top of his head. For that reason, be discouraged as well! Both sides of the problem are quite difficult ;)
Using code, how can you determine the amount of randomness of a given list?

An individual "shuffled" list is neither random nor non-random. Randomness is a property of the process, not of the individual outputs. When talking about an algorithm that tries to distinguish between a PRG and a truly random source, we (usually) allow it to take multiple samples from the source before saying which kind of source it thinks it's getting. Even then, we allow for some probabilistic error in its decision.
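As a rough illustration of judging the process rather than a single output, the Python sketch below draws many samples from two shufflers and compares how evenly each one covers the six permutations of a 3-element list. The deliberately flawed `naive_shuffle` and the chi-square statistic are assumptions made for this example; they are not the re-ordering scheme from the original question.

```python
import random
from collections import Counter
from itertools import permutations


def fair_shuffle(items, rng):
    """Uniform shuffle using the standard library's Fisher-Yates shuffle."""
    items = list(items)
    rng.shuffle(items)
    return tuple(items)


def naive_shuffle(items, rng):
    """Flawed on purpose: swaps each slot with ANY slot, which is biased."""
    items = list(items)
    n = len(items)
    for i in range(n):
        j = rng.randrange(n)  # a correct shuffle would pick j from i..n-1
        items[i], items[j] = items[j], items[i]
    return tuple(items)


def chi_square(shuffler, items, trials, rng):
    """Chi-square statistic of observed permutation counts vs. uniform."""
    counts = Counter(shuffler(items, rng) for _ in range(trials))
    perms = list(permutations(items))
    expected = trials / len(perms)
    return sum((counts[p] - expected) ** 2 / expected for p in perms)


rng = random.Random(2005)  # fixed seed so repeated runs agree
fair_stat = chi_square(fair_shuffle, "abc", 60_000, rng)
naive_stat = chi_square(naive_shuffle, "abc", 60_000, rng)
print(f"fair shuffle:  chi-square = {fair_stat:.1f}")
print(f"naive shuffle: chi-square = {naive_stat:.1f}")
```

No single output of `naive_shuffle` looks wrong; only the counts over many samples give it away. Even then, the verdict is statistical, so there is always some chance of error.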
Regarding the question of randomizing a list of 100_000 elements, you need to sample uniformly from the space of permutations of 100_000 elements. This requires log2(factorial(100_000)) bits of randomness, which comes to 1_516_705 bits once rounded up to a whole bit, because that is the entropy of the random variable you wish to sample.
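The figure above can be checked without computing the huge factorial itself. This is a small Python sketch; the helper name `shuffle_entropy_bits` is made up for the illustration. It relies on the identity ln(n!) = lgamma(n + 1).

```python
import math


def shuffle_entropy_bits(n: int) -> float:
    """Bits of entropy in a uniformly random permutation of n items.

    Uses lgamma(n + 1) == ln(n!) so the factorial is never built explicitly.
    """
    return math.lgamma(n + 1) / math.log(2)


bits = shuffle_entropy_bits(100_000)
print(f"log2(100000!) is about {bits:.1f} bits")
print(f"whole random bits needed: {math.ceil(bits)}")
```

For comparison, a generator seeded with only 32 bits can produce at most 2^32 distinct output streams, so it can reach only a vanishing fraction of these permutations.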