http://www.perlmonks.org?node_id=31479


in reply to Re: Randomize an array
in thread Randomize an array

Having 0 in that list is a mistake since the qsort algorithm takes that to mean that they are equal and hence it only needs to use one of those to compare with others.

A test that is sensitive to this is looking for "rising sequences". Scramble a sorted list. Make successive passes through the array, looking for the first, then second, then third, then fourth, etc elements. How many passes did it take? With your algorithm it will take fewer passes than with a well-shuffled array.

ObTrivia: Take a fresh deck of cards. Shuffle a few times. Do the above test by suit. 2 suits start off sorted front to back, 2 back to front. This test is able to detect that for a long time. (Certainly well after the "7 shuffles leaves a well-shuffled deck" factoid a lot of people hear. By some measures it is well shuffled. By this one it still isn't. :-)

Replies are listed 'Best First'.
RE: RE (tilly) 2: Randomize an array
by BooK (Curate) on Sep 08, 2000 at 00:46 UTC

    I completely agree with you. 0 should not be here. It's just that we talked about

    sort { rand(3) - 1 } @array
    at the september 1999 Paris.pm meeting.

    I just remembered the sort rand trick and shared it. So here comes an updated version:

    sort { (-1,1)[rand 2] } @array
      I was trying to figure out how to explain why this was going to give a perfect sort, and I realized that in fact this trick simply cannot give a perfect sort for an array of size larger than 2 when fed an algorithm with bounded performance!

      This is cute.

      As the above makes clear, you are basically making random coin flips until it spits out an answer. If the array has n entries, there is some maximum number of flips, say k, that you could be asked for.

      Turning it around on its head I could just ask you up front to make k coin flips, then feed that into the algorithm, and get an answer. Now there are 2^k possible sequences that you could give me, and they are all equally likely. So if you write in base 2, the probability of any particular permutation is 2^(-k) times the number of sequences that gave that permutation.

      What that is I don't care. What I do care about is that when written out in base 2 that probability is a decimal that terminates after at most k places. But for each permutation the probability we really want is 1/n!, and if n is bigger than 2 then n! is divisible by 3 which is *not* a power of two so 1/n! has to be (base 2) an infinite repeating decimal!

      In other words I may not, without calculating it, have any idea how your anwer will be biased, but in fact it is!

      ObTrivia: A somewhat similar counting argument manages to show that the 400 year Gregorian calendar repeats days of the week, but cannot divide a particular day of the month between them. However it takes explicit calculation to show that the 13'th falls on Friday more than 1 day in 7.

Re: RE (tilly) 2: Randomize an array
by blazar (Canon) on Aug 01, 2008 at 20:51 UTC
    Having 0 in that list is a mistake since the qsort algorithm takes that to mean that they are equal and hence it only needs to use one of those to compare with others.

    I personally believe that before someone else points out that the node I'm replying to is some eight years old, I'd better do so myself: the reason why I'm doing it that I've seen this very thread referenced quite recently; which means that some people may still be looking here for info they will try to actually use. And here a terribly wrong technique has been advertised, so I'm writing this post for the benefit of potential readers to the effect of warning them not to use it! Namely, even if amended of the "0 in that list" problem, the general idea of shuffling a list by sorting it with a routine returning random values e.g.:

    my @shuffled = sort {.5<rand} @input;

    is flawed! Several reasons why it is are explained in a thread which came two years later. To put it briefly, if you really want to use sort for this, at the very least you must generate the list of random values in advance, whatever actual technique you actually choose in the end. E.g.:

    my %r; my @shuffled = sort { ($r{$a}||=rand) <=> ($r{$b}||=rand) } @input;

    or

    my @r = map rand, @input; my @shuffled = @input[sort {$r[$a] <=> $r[$b]} 0..$#input];

    Update: as a novice, back in my old days at the Monastery whenever I happened to see a post of mine which -exactly like this one and its companion- turned out to have a negative reputation, I would edit it to the effect of inserting an update and whine about the downvotes or more precisely asking why it earned them. Of course, I've learnt how to do better in the meantime, and generally refrain to. Except this time: since I would like to "thank" all those geniuses who downvoted this node on part of the poor newbie who will stumble upon it one day, and judging from the reputation will think that my complaints are moot, thus possibly following the advice of the broken shuffling technique... only to be bitten in the neck, but hopefully not later than having spread the word, because "it is so cool..."

    --
    If you can't understand the incipit, then please check the IPB Campaign.
      First of all, I did not recommend that sorting method.

      Secondly in this very thread I already explained why this randomization technique could not possibly provide perfectly random answers.

      So while I commend your desire to add useful information to old threads, I don't think you picked the right target in this case.

        First of all, I did not recommend that sorting method.

        I personally believe that I did never claim nor imply that you did; indeed you may notice that in the post you're replying to, I wrote: "a terribly wrong technique has been advertised" and the link under "advertised" points to a post of BooK. I replied to yours because one may have the impression that removing the "0 inconvenience" the technique may work after all: admittedly, I had overlooked that other post of yours, but then perhaps others would as well. I hope that you take no offense and that you will agree that the caveat earned much more visibility with the aid of these interventions...

        --
        If you can't understand the incipit, then please check the IPB Campaign.