Re: Coming Down From The Pedestal

by BrowserUk (Patriarch)
on Oct 15, 2005 at 00:14 UTC ( [id://500408] )


in reply to Coming Down From The Pedestal

Don't put that pedestal into cold storage just yet.

Whilst the entropy reasoning is undoubtedly valid at the theoretical level, and your posted algorithm had problems, in a practical sense I'm not at all sure that you were wrong.

From my further reading on entropy, my take is that to detect the difference between "truly random" and a very good PRNG, the analyst needs a large number of samples to feed into their polynomial algorithms. Furthermore, those samples need to be of the raw output, or of a known transposition of it, so that they can reconstitute the raw output.

In the case under discussion, the shuffling of 100,000 elements, they would need either to know the raw output of the RNG or PRNG used to shuffle the elements, or to know the original ordering so that they could reconstitute it.
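
For concreteness, the shuffle in question might look something like the sketch below: a plain Fisher-Yates shuffle driven by Perl's built-in rand(), which is a PRNG. The 100,000 element count is the one from the discussion; everything else is illustrative.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Fisher-Yates shuffle driven by Perl's built-in PRNG, rand().
    # An outside observer sees only the shuffled ordering, not the stream
    # of values rand() produced to create it.
    sub fisher_yates_shuffle {
        my( $array ) = @_;
        for my $i ( reverse 1 .. $#{ $array } ) {
            my $j = int rand( $i + 1 );                 # 0 <= $j <= $i
            @{ $array }[ $i, $j ] = @{ $array }[ $j, $i ];
        }
    }

    my @data = 1 .. 100_000;
    fisher_yates_shuffle( \@data );
    print "@data[0..9]\n";                              # a peek at the result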

As the usual example of when the difference between an RNG and a PRNG becomes important is in cryptology, the GoodGuy is hardly likely to provide this information freely. The task becomes one of trying to determine the difference from the outside, by inspecting the encoded messages. From this viewpoint, neither the raw (P)RNG sequence nor the transposition sequence (the plain text*/pre-shuffled data) is available.

*We normally talk about the random sequence transposing the plain text, but if it is the random sequence that is being sought, the other way around is equally valid.

With RNGs derived from natural sources, the raw sequence is already massaged (transition mapping--0/1 bias removal) to eliminate 'skew' from the sequences. For something to be truly random, there has to be the possibility of long sequences of all 1s. From what I read here (3rd section, 3rd paragraph), the naturally random data read from the radio source is taken two bits at a time, and 50% of those pairs are rejected because they do not contain a transition (so, overall, 75% of the raw random bits are discarded). I.e. of the pairs of bits 00, 01, 10 & 11, only the first bit of the middle two makes it through to the consumer.
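
For illustration, that transition mapping (von Neumann-style debiasing) amounts to something like the sketch below; the raw bit string here is made up purely as an example.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Transition mapping: take the raw bits in pairs, throw away 00 and 11,
    # and keep only the first bit of 01 or 10. Any fixed 0/1 bias in the
    # source disappears, at the cost of discarding ~75% of the raw bits.
    sub debias {
        my( $raw ) = @_;                    # string of '0'/'1' characters
        my $out = '';
        while( $raw =~ /\G([01])([01])/g ) {
            $out .= $1 if $1 ne $2;         # a transition: keep the first bit
        }
        return $out;
    }

    my $raw = '0011011000111010';           # hypothetical raw source output
    print debias( $raw ), "\n";             # prints "0111" for this input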

So the RNG source is already being massaged according to some theoretical assumptions about what "truly random" is. Sure, it is still possible for that 1-in-gigazillions chance of 3.2M 1-bits to be generated, but surely it is now even less likely?

This may happen very rarely, but who's to say that atmospheric conditions, or several thousand years of radioactive decay, will not culminate in the exact circumstances where a series of 3,200,000 1-bits is produced? Say, tomorrow.

If that exact sequence of 100,000 32-bit 0xffffffff values happens to be the sequence generated and detected by the guy with the polynomial algorithm for measuring entropy, then I doubt that it will conclude that the generator is a truly random source.
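
Just to put a number on "1 in gigazillions", a quick back-of-envelope sketch; the only inputs are the 100,000 x 32-bit figures from the discussion.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # 100,000 x 32-bit values of 0xffffffff is 3,200,000 consecutive 1-bits.
    # The probability, 2**-3_200_000, underflows a double, so work with its
    # base-10 logarithm instead.
    my $bits  = 100_000 * 32;
    my $log10 = $bits * log( 2 ) / log( 10 );
    printf "all-ones odds: about 1 in 10^%.0f\n", $log10;   # ~1 in 10^963296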

"Ah!", you say, "but that is only one sample. The polynomial algorithm needs many samples."

Yes, but what if that sequence is used to encrypt 4000 100-byte messages and the BadGuy gets his hands on most of those? He might well conclude that he has enough samples to analyse. Again, I doubt that he will conclude that the 100,000 0xffffffffs are "truly random".

And finally, the advice given when using a PRNG for cryptographic purposes (surely the only place where the difference between an RNG and a very good PRNG will ever matter?) is that a bunch of (say 8) values are taken and then combined with a hashing algorithm to produce another (128-bit) number that gets used for the encryption.
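
A minimal sketch of that advice, assuming the core Digest::MD5 module for the 128-bit hash; the choice of eight rand() values is just the "say 8" above, and none of this is a recommendation of MD5 itself.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Digest::MD5 qw[ md5 ];

    # Combine several PRNG outputs through a hash to get one 128-bit value.
    # Anyone who only ever sees the 128-bit results cannot read the raw
    # rand() values straight off them.
    sub next_key_block {
        my @values = map { rand() } 1 .. 8;     # "a bunch of (say 8) values"
        return md5( pack 'd8', @values );       # 16 bytes == 128 bits
    }

    printf "key block: %s\n", unpack 'H*', next_key_block();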

Although several hashing algorithms have been "compromised", the compromises I am aware of are all of the form: "We can produce a set of (numerically calculated, but otherwise meaningless) data that will (re)produce a given hash value". They don't reconstitute the original data used to generate the hash value, and there are an infinite number of messages that could produce any given hash value.

So, if you are using a good PRNG, and you follow the advice and hash several values together to produce your encryption stream, and that stream is then transposed by combining it with the plain text, then the possibility that the BadGuy can work the encrypted text back to enough sequential raw values from the PRNG to feed to his polynomial algorithm--and so determine that it indeed came from a PRNG with insufficient entropy to truly randomise the dataset--is minimal!
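
Pulling the whole chain together as a sketch: it repeats the hypothetical next_key_block() from the previous snippet and simply XORs the keystream over the plain text. A real cipher would do rather more, but it is enough to show what the BadGuy actually gets to look at.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Digest::MD5 qw[ md5 ];

    # PRNG values -> hash -> 128-bit keystream blocks, XORed over the plain
    # text. The BadGuy sees only $cipher; to run his entropy tests he would
    # first have to recover the raw rand() stream back through both the XOR
    # and the hash.
    sub next_key_block { md5( pack 'd8', map { rand() } 1 .. 8 ) }

    sub xor_encrypt {
        my( $plain ) = @_;
        my $cipher = '';
        while( length $plain ) {
            my $chunk = substr $plain, 0, 16, '';       # up to 16 bytes
            $cipher  .= $chunk ^ substr( next_key_block(), 0, length $chunk );
        }
        return $cipher;
    }

    my $cipher = xor_encrypt( 'attack at dawn' );
    printf "cipher text: %s\n", unpack 'H*', $cipher;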

(My) Conclusion: A PRNG with lower-than-theoretically-required entropy could be used to shuffle a 100,000-element dataset sufficiently randomly that it is beyond the scope of practicality to determine that it was not "truly random"--outside of the open (fully disclosed) environment of the laboratory.

Upshot: For almost every practical purpose a good PRNG, used with care, is "good enough", and your innate, commonsense gut reaction to the theoretical statement that started that discussion was spot on.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
