Yet Another Fair Shuffle Node

I realize that I'm risking excommunication by posting YAFSN, but I felt that I was beginning to repeat myself, so rather than reply to this latest comment deep in the bowels of Roy Johnson's very interesting functional shuffle thread, I figured I would just give the issue its own node, since it has little to do with functional programming, and I sense it may be a point of confusion.

In the thread When the Best Solution Isn't, and more recently in Is this a fair shuffle?, it was shown that certain "sort-based" shuffle algorithms do not sample the space of permutations uniformly (i.e. are not "fair"). More recently still, I posted a "sort-based" algorithm, which I dubbed the "Schwartzian Shuffle" because it uses the Schwartzian Transform that Perl programmers know and love, though the algorithm is much older than Perl, and probably goes by many other names (the one I recall seeing is "tag sorting"):

@ryaar =
  map  { $_->[ 0 ] }
  sort { $a->[ 1 ] <=> $b->[ 1 ] }
  map  { [ $_, rand ] }
  @array
[download]

The algorithm basically tags every element of the original array with a float randomly chosen in the unit interval and sorts the tagged elements according to the numerical order of the tags. Tagging-sorting-untagging is what the Schwartzian Transform is all about; this is just an unusual special case in which the tags bear no relation the tagged elements. The fairness of this algorithm (setting aside finite-precision artifacts) is pretty obvious, I think.

Any formal mathematical proof I can come up with of the algorithm's fairness is pretty tedious, especially without the benefit of basic mathematical notation, but here is a sketch of how one such proof would go. The probability that the first element in the input list gets the tag with the lowest value, and therefore ends in the first position of the shuffled list, is 1/N. The probability that it gets the second-lowest-numbered tag equals the probability that it doesn't get the lowest-numbered tag (i.e. (N - 1)/N) times the probability that it gets the lowest numbered among the remaining N - 1 tags (i.e. 1/(N - 1)); this product is also 1/N. It's easy to see the induction.

Contrast this with one of the sort-based algorithms mentioned in When the Best Solution Isn't, for example qshuf:

@random = sort { .5 <=> rand(1) } @array;    # DO NOT USE THIS!
[download]

The role of sorting in this algorithm is entirely different from its role in tag sorting. Here the sort's comparator has been modified to effect the shuffling. It is not as easy to determine the fairness or lack thereof of this algorithm through a mathematical argument (at least it's not as easy for me), but a simple operational test shows that it is far from fair.

And yes, as the article that Roy cited shows, the tag-sorting algorithm is not perfectly fair due to artifacts having to do with the finite precision of the random numbers used as tags (which means that there is a small but non-zero probability that two tags will have the same value, and this combined with the stability of the sorting leads to deviations from fairness; I blather more on this here if anyone is in the mood for a li'l hair-splittin'). But this is not an intrinsic flaw of the algorithm, and at any rate, its deviation from perfect fairness is too minuscule to lose sleep over (unless you are a Haskell programmer, I suppose :-) ). The principal reason not to use tag sorting is that it is not as efficient, either in space or time, as Fisher-Yates (which by the way, has minuscule deviations from perfect fairness of its own, again due to finite-precision artifacts).

the lowliest monk

Comment on Yet Another Fair Shuffle Node Select or Download Code

Replies are listed 'Best First'.
Re: Yet Another Fair Shuffle Node by Mugatu (Monk) on Apr 04, 2005 at 19:22 UTC
Since your node has gone nearly unreplied so far, I thought I should at least throw something out there. I really like this algorithm. It has a nice, intuitive feel. The Fisher-Yates may be preferable in many cases, but this one is still rather nice in and of itself. Nice post! Update: upon re-reading, it seems I used "nice" a bit too much in this post. Please substitute appropriate synonyms as you read it. :-)	[reply]
Re: Yet Another Fair Shuffle Node by NateTut (Deacon) on Apr 04, 2005 at 14:36 UTC
Why use floats for the tags? Wouldn't random integers do just as well?	[reply]
Re^2: Yet Another Fair Shuffle Node by Roy Johnson (Monsignor) on Apr 04, 2005 at 15:03 UTC
Random floats should have a lower incidence of collision. Caution: Contents may have been coded under pressure.	[reply]
Re^2: Yet Another Fair Shuffle Node by tlm (Prior) on Apr 04, 2005 at 19:38 UTC
Random integers would work fine, as long as you draw them from a sufficiently large range. The few PRNGs I have looked at, even when they give you a float in the end, start by doing an integer calculation and then dividing the random int found by the max possible value. So you could just skip that last normalization step, and use the random ints as your tags. I suppose this could make the algorithm marginally faster. the lowliest monk	[reply]
Re: Yet Another Fair Shuffle Node by tall_man (Parson) on Apr 04, 2005 at 19:46 UTC
I found a reference to the tagged sort as a shuffle once proposed by Knuth, so I withdraw by previous objections to it. As you point out, it is not as efficient as Fisher-Yates. But it will work.	[reply]


Perl: the Markov chain saw
	PerlMonks