in reply to Re: Functional shuffle, in thread Functional shuffle
Sort-based shuffles like this are not good shuffles: you do not get all N! permutations with equal probability. The Fisher-Yates shuffle can do it; this one does not. The discussion under When the Best Solution Isn't and under other recent shuffle threads (like Is this a fair shuffle?) explains why.
Re^3: Functional shuffle by blokhead (Monsignor) on Apr 02, 2005 at 16:40 UTC 
Did you compare the code above to the faulty code in your link? The code above is indeed a fair shuffle. Sort-based shuffles that use rand in the comparison block are the ones that don't work. See Re: Is this a fair shuffle?.
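To make the distinction concrete, here is a small sketch (in Python rather than Perl, purely for illustration) of the fair variant being defended: each element gets one random key up front, and the sort only ever compares those fixed keys. The broken variant, by contrast, calls rand inside the comparison itself, so the comparator gives inconsistent answers.

```python
import random

def keyed_shuffle(items):
    """Fair sort-based shuffle (decorate-sort-undecorate): draw one
    random key per element, then sort by those fixed keys."""
    decorated = [(random.random(), x) for x in items]
    decorated.sort(key=lambda pair: pair[0])  # keys never change mid-sort
    return [x for _, x in decorated]

# The broken style this subthread warns about is the Perl idiom
#   sort { rand() <=> rand() } @list
# whose comparison block returns different answers for the same pair.
```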
Re^3: Functional shuffle by tlm (Prior) on Apr 02, 2005 at 16:45 UTC 
I agree that the sort-based shuffles discussed in When the Best Solution Isn't are fundamentally wrong, but this one is qualitatively different: it doesn't work by tinkering with the sort function, but rather by assigning randomly numbered indexes to the elements of the array and then sorting the array according to those indexes. The number of possible rank orderings of N randomly chosen real numbers r_{1},...,r_{N} in the unit interval is N!, and each ordering is equally likely.
The objection to this shuffle, I suspect, is not on the grounds of correctness but rather on the grounds of efficiency.
Also, whether this meets Roy Johnson's original requirement that the shuffle be in a "functional" vein (whatever that means) is highly dubious.

I'm basing my objection to the sort-based shuffle on this note in the paper that Roy Johnson referenced:
Furthermore, if we have a sequence of N elements and associate with
each element a key, a random number uniformly distributed within
[0, M-1] (where N! > M >= N), we have the configurational space of
size M^N (i.e., M^N ways to key the sequence). There are N! possible
permutations of the sequence. Alas, for N > 2 and M < N!, M^N is not
evenly divisible by N!. Therefore, certain permutations are bound to
be a bit more likely than the others.
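The divisibility claim in that note is easy to check numerically; a quick sketch (the values of N and M here are chosen just for illustration):

```python
from math import factorial

# M^N equally likely ways to key the sequence must be split among the
# N! permutations; exact uniformity would require N! to divide M^N.
for n, m in [(3, 4), (3, 5), (4, 16)]:
    remainder = (m ** n) % factorial(n)
    print(f"N={n}, M={m}: M^N mod N! = {remainder}")  # nonzero in each case
```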
I would further argue that even when M (the period of the pseudorandom number generator) is larger than N!, M^N is still not divisible by N! in general. The Fisher-Yates shuffle applies decreasing probabilities as it moves down the deck; the sorting shuffle uses uniform ones. That is the key difference.
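The "decreasing probabilities" can be seen directly in the loop structure of Fisher-Yates; a minimal sketch (Python used only for illustration):

```python
import random

def fisher_yates(items):
    """Fisher-Yates: slot i is filled by a uniform pick among the i+1
    elements still in play, so the number of equally likely execution
    paths is n*(n-1)*...*2 = n!, exactly one per permutation."""
    a = list(items)
    for i in range(len(a) - 1, 0, -1):
        j = random.randint(0, i)   # the range of choices shrinks each pass
        a[i], a[j] = a[j], a[i]
    return a
```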
I believe the discussion under A bad shuffle is also applicable.

Strictly speaking this is correct, but similar arguments apply in one way or another to any shuffling algorithm that relies on a PRNG, since the number of possible shuffled configurations can easily exceed the period of the PRNG; and even when it doesn't, the size of the set of possible numbers generated by the PRNG (2^{k}, where k is the number of bits available to represent these numbers) won't be evenly divisible by N!. The argument I made in my previous post was based on the simplifying assumption that the indexes used are real numbers in the unit interval (as opposed to members of a large but finite set of rationals, which is what they in fact are). In that case the probability that two of N > 1 randomly chosen numbers in [0, 1] are equal is zero, and therefore the problems introduced by the fact that the sorting is stable become irrelevant. True, the probability that any two such numbers generated by a PRNG are equal is not quite zero, but it is very small.
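For a sense of how small: a union (birthday) bound says the chance that any two of N keys drawn from M equally likely values collide is at most N(N-1)/(2M). A sketch, assuming for illustration that the PRNG emits one of 2^{53} distinct doubles in [0, 1):

```python
def collision_bound(n, m):
    """Union bound on Pr[some two of n uniform draws from m equally
    likely values coincide]: n*(n-1)/2 pairs, each colliding w.p. 1/m."""
    return n * (n - 1) / (2 * m)

# 52 card-shuffle keys from 2**53 possible values (the 2**53 figure is
# an assumption about the PRNG's output granularity, not from the thread):
print(collision_bound(52, 2 ** 53))
```

So a tie among the keys for a card-deck-sized shuffle is on the order of one in ten trillion under that assumption.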


The argument in the cited article applies to any shuffling algorithm that uses anything other than random 2^{k}-ary choices (in a program running on a binary-representation computer). This is true even if the numbers are generated by an ideal uniform random process (as opposed to a PRNG) and then stored in computer memory (necessarily truncated to a finite precision). In other words, the problem is deeper than the one caused by the finite periods of PRNGs. Consider applying Fisher-Yates to shuffling an array of 3 elements. The first step requires selecting one of three options at random, and is done by testing whether a "random" number (or rather, a rational number obtained from truncating the precision of a randomly generated real number) stored in some computer register is within certain bounds or not. The cardinality of the set of such numbers is 2^{w}, where w is the number of bits in the register, and therefore it is impossible that each of the three elements in the list will be chosen with probability of exactly 1/3. With the exception of those swaps in which the element to swap is chosen from among 2^{k} alternatives, for some integer 0 < k < w, all the swaps in Fisher-Yates (under the conditions stated) result from a nonuniform sampling.
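The impossibility of an exact 1/3 split of 2^{w} register values can be checked exhaustively for a small register width (w = 8 here, just to keep the loop tiny):

```python
w = 8                        # toy register width
counts = [0, 0, 0]
for r in range(2 ** w):      # every equally likely register value
    counts[r % 3] += 1       # map each value to one of three choices
print(counts)                # 256 values cannot split into 3 equal parts
```

The tally comes out [86, 85, 85]: one choice is unavoidably a little more likely than the other two, exactly as the 2^{w} argument predicts.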
The question remains whether the magnitude of this deviation from perfect uniformity can significantly impair the intended use of the algorithm, and the answer of course depends on the application. In the case of the example above, the magnitude of the relative error grows as N/2^{w}, so I imagine that a simulation relying on a uniform sampling of the space of all rearrangements of a large number of elements may have to start worrying about this effect.
I reiterate that there is a fundamental difference between the flawed algorithms mentioned in When the Best Solution Isn't and those that are based on sorting a set of pseudorandom-number tags (like the one I posted). With the former, the deviation from the correct distribution can be substantial, and would occur even if one could use infinite-precision, perfectly uniformly sampled random numbers; with the latter, the deviation is no bigger than it is for any shuffle algorithm that uses anything other than random 2^{k}-ary choices (and of course would disappear if the random numbers were of infinite precision). The flaws in the former algorithms are therefore logical flaws, independent of numerical errors.
Update: There was a small inaccuracy in the argument as originally posted, but correcting it does not affect its gist. I originally said that uniform sampling can occur only when the number of choices is 2; in fact it can occur only when the number of choices is 2^{k}, for some integer 0 < k < w, where w is as defined above.