PerlMonks  

Randomize an array

by Zebu (Novice)
on Sep 08, 2000 at 00:00 UTC (#31461=perlquestion)
Zebu has asked for the wisdom of the Perl Monks concerning the following question:

Well, what is the quickest way to randomize an array?

(ar0n: Algorithm::Numerical::Shuffle) Re: Randomize an array
by ar0n (Priest) on Sep 08, 2000 at 00:07 UTC
    Algorithm::Numerical::Shuffle

    I'm not sure if it's the fastest, but it is the easiest:
    use Algorithm::Numerical::Shuffle qw(shuffle);
    @shuffled = shuffle (1, 2, 3, 4, 5, 6, 7);

    -- ar0n (just another perl joe)

Re: Randomize an array
by lhoward (Vicar) on Sep 08, 2000 at 00:07 UTC
    By "randomize" I assume you mean "shuffle" (rearrange all the elements in a random fashion). The fastest and probably best way is to use the Algorithm::Numerical::Shuffle module.
    use Algorithm::Numerical::Shuffle qw /shuffle/;
    @shuffled = shuffle (1, 2, 3, 4, 5, 6, 7);
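    If installing a CPAN module is not an option, a similar call is available from List::Util, which ships with Perl 5.8 and later. This is a minimal sketch; it assumes a Perl newer than the ones current when this thread started, and the variable names are only illustrative.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);    # core module (assumption: Perl 5.8 or later)

my @deck     = (1 .. 7);
my @shuffled = shuffle(@deck);    # returns a new list; @deck is left untouched

print "@shuffled\n";
```

    Unlike the in-place routines discussed in this thread, shuffle() returns a copy, so assign the result back to the array if you want it reordered.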
Re: Randomize an array
by BlaisePascal (Monk) on Sep 08, 2000 at 00:09 UTC
    How random do you want it, and what do you mean by "randomize"? I assume you want a random permutation, with all permutations equally likely.

    The traditional solution is something like:

    for my $i (reverse 0 .. $#array) {
        my $j = int rand($i + 1);    # 0 <= $j <= $i
        @array[$i, $j] = @array[$j, $i];
    }
    This does scalar(@array) swaps, and it can be shown (see Knuth) that all permutations are equally likely -- assuming I haven't made any coding mistakes.

    This is probably the fastest randomization algorithm, although various implementations might be faster than mine.

        That's what I get for reading too much of the Perl6 RFCs... I could have sworn that ($high..$low) would count down, not up, which is why this loop first read ($#array..0) and never ran.
RE: Randomize an array
by BlueLines (Hermit) on Sep 08, 2000 at 00:10 UTC
    Here's what the Cookbook offers for randomizing arrays:
    # fisher_yates_shuffle( \@array ) : generate a random permutation
    # of @array in place
    sub fisher_yates_shuffle {
        my $array = shift;
        my $i;
        for ($i = @$array; --$i; ) {
            my $j = int rand ($i+1);
            next if $i == $j;
            @$array[$i,$j] = @$array[$j,$i];
        }
    }

    fisher_yates_shuffle( \@array );    # permutes @array in place
    BlueLines

    Disclaimer: This post may contain inaccurate information, be habit forming, cause atomic warfare between peaceful countries, speed up male pattern baldness, interfere with your cable reception, exile you from certain third world countries, ruin your marriage, and generally spoil your day. No batteries included, no strings attached, your mileage may vary.
      The 'next if..' line in this post, and the 'unless ...' clause in the next post, are unnecessary and slow the algorithm down for an array of ~20 elements or larger.

      This is because if you assume the cost of a comparison is 1, and the cost of a swap is ~10, then when $i > 10, the savings on a swap are lost because odds are < 1 in 10 that you're swapping the same element. Benchmark comparison against swapping, and do the math. YMMV.
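      The "do the math" part can be sketched directly. Position $i picks $j uniformly from 0 .. $i, so the chance that the test saves a swap is 1/($i+1). The helper name expected_self_swaps is hypothetical, and the 1-versus-10 cost ratio above remains an assumption that only a benchmark on your own machine can confirm.

```perl
use strict;
use warnings;

# Expected number of iterations in which $i == $j, i.e. where the
# "next if" test actually saves a swap. The loop visits $i = n-1 .. 1,
# and each visit has a 1 / ($i + 1) chance of a self-swap.
sub expected_self_swaps {
    my ($n) = @_;
    my $sum = 0;
    $sum += 1 / ($_ + 1) for 1 .. $n - 1;
    return $sum;
}

printf "%5d elements: %.2f swaps saved for %d comparisons\n",
    $_, expected_self_swaps($_), $_ - 1
    for 10, 52, 1000;
```

      The saved swaps grow only logarithmically with the array size, while the number of comparisons grows linearly, which is why the test stops paying for itself on larger arrays.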
RE: Randomize an array
by KM (Priest) on Sep 08, 2000 at 00:13 UTC
    Fisher-Yates shuffle.

    my @array = qw /one two three four five six/;
    for (my $i = @array; -- $i;) {
        my $r = int rand (1 + $i);
        @array [$i, $r] = @array [$r, $i] unless $r == $i;
    }

    Or use Algorithm::Numerical::Shuffle which implements this for you

    use Algorithm::Numerical::Shuffle qw/shuffle/;
    my @array = shuffle qw/one two three four five/;

    Cheers,
    KM

RE: Randomize an array
by Adam (Vicar) on Sep 08, 2000 at 00:14 UTC
    I recommend the Fisher-Yates Shuffle. See my node on the topic.
    # The Fisher-Yates Shuffle
    sub Shuffle {
        my $arrayref = shift;
        my $index = @$arrayref;
        while ( $index-- ) {
            my $newloc = int rand (1+$i);    # BUG: $i is never set; see the reply below
            # next if $index == $newloc;  # Not needed in Perl.
            @$arrayref[$index, $newloc] = @$arrayref[$newloc, $index];
        }
    }
    The line I commented out is part of the Fisher-Yates Shuffle, but it's a waste of time in Perl, since the swap is done correctly. (If it were an XOR swap then you would want the next in there.)
      That code has an error: it calls rand(1+$i) where $index is meant, so it doesn't actually randomize. Here's the correct version:
      # The Fisher-Yates Shuffle
      sub Shuffle {
          my $arrayref = shift;
          my $index = @$arrayref;
          while ( $index-- ) {
              my $newloc = int rand (1+$index);
              # next if $index == $newloc;  # Not needed in Perl.
              @$arrayref[$index, $newloc] = @$arrayref[$newloc, $index];
          }
      }
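      To see what the $i typo does in practice: without strict, the unset $i counts as 0, so rand(1+$i) is rand(1) and the chosen index is always 0. The sketch below hard-codes that 0 under a hypothetical name, buggy_shuffle, so that it runs under strict.

```perl
use strict;
use warnings;

# The first version of Shuffle with the undefined $i replaced by its
# effective value, 0 (buggy_shuffle is an illustrative name only).
sub buggy_shuffle {
    my $arrayref = shift;
    my $index    = @$arrayref;
    while ( $index-- ) {
        my $newloc = int rand( 1 + 0 );    # int rand(1) is always 0
        @$arrayref[ $index, $newloc ] = @$arrayref[ $newloc, $index ];
    }
}

my @a = ( 0 .. 3 );
buggy_shuffle( \@a );
print "@a\n";    # prints "1 2 3 0" every time
```

      Every element is swapped with slot 0 in turn, so the result is always the input rotated left by one position, with no randomness at all.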
Re: Randomize an array
by BooK (Curate) on Sep 08, 2000 at 00:16 UTC
    Well, if you don't want to install yet another module, and the randomness you seek is no better than the one provided by rand, you can try the following:
    sort { (-1,0,1)[rand 3] } @array
      Having 0 in that list is a mistake since the qsort algorithm takes that to mean that they are equal and hence it only needs to use one of those to compare with others.

      A test that is sensitive to this is looking for "rising sequences". Scramble a sorted list. Make successive passes through the array, looking for the first, then second, then third, then fourth, etc elements. How many passes did it take? With your algorithm it will take fewer passes than with a well-shuffled array.

      ObTrivia: Take a fresh deck of cards. Shuffle a few times. Do the above test by suit. 2 suits start off sorted front to back, 2 back to front. This test is able to detect that for a long time. (Certainly well after the "7 shuffles leaves a well-shuffled deck" factoid a lot of people hear. By some measures it is well shuffled. By this one it still isn't. :-)
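      The rising-sequences test can be sketched as below. It assumes the shuffled list is a permutation of 0 .. n-1 (scramble indices rather than values if your data differs), and the name rising_sequences is illustrative.

```perl
use strict;
use warnings;

# Count the passes needed to pick out 0, 1, 2, ... in order from a
# permutation of 0 .. n-1. A new pass starts whenever the next value
# sits to the left of the previous one.
sub rising_sequences {
    my @perm = @_;
    return 0 unless @perm;
    my @pos;
    $pos[ $perm[$_] ] = $_ for 0 .. $#perm;    # where each value lives
    my $passes = 1;
    for my $v ( 1 .. $#perm ) {
        $passes++ if $pos[$v] < $pos[ $v - 1 ];
    }
    return $passes;
}

print rising_sequences( 0 .. 9 ), "\n";            # sorted input: 1 pass
print rising_sequences( reverse 0 .. 9 ), "\n";    # reversed input: 10 passes
```

      For a uniformly random permutation of n elements the average is (n+1)/2 passes, so a shuffler whose average over many runs falls clearly below that is leaving too much of the original order in place.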

        I completely agree with you. 0 should not be here. It's just that we talked about

        sort { rand(3) - 1 } @array
        at the September 1999 Paris.pm meeting.

        I just remembered the sort rand trick and shared it. So here comes an updated version:

        sort { (-1,1)[rand 2] } @array
        Having 0 in that list is a mistake since the qsort algorithm takes that to mean that they are equal and hence it only needs to use one of those to compare with others.

        I personally believe that before someone else points out that the node I'm replying to is some eight years old, I'd better do so myself. The reason I'm replying anyway is that I've seen this very thread referenced quite recently, which means that some people may still be looking here for info they will actually try to use. And here a terribly wrong technique has been advertised, so I'm writing this post to warn potential readers not to use it! Namely, even with the "0 in that list" problem fixed, the general idea of shuffling a list by sorting it with a routine returning random values, e.g.:

        my @shuffled = sort {.5<rand} @input;

        is flawed! Several reasons why it is are explained in a thread which came two years later. To put it briefly, if you really want to use sort for this, at the very least you must generate the list of random values in advance, whatever actual technique you actually choose in the end. E.g.:

        my %r;
        my @shuffled = sort { ($r{$a}||=rand) <=> ($r{$b}||=rand) } @input;

        or

        my @r = map rand, @input;
        my @shuffled = @input[sort {$r[$a] <=> $r[$b]} 0..$#input];
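        One way to see the difference is to tally the orderings each approach produces for a three-element list. This is a rough sketch; the helper name tally and the trial count are arbitrary choices, and the exact counts for the coin-flip comparator depend on the sort implementation in your Perl.

```perl
use strict;
use warnings;

# Tally which orderings a shuffling routine produces for a 3-element list.
sub tally {
    my ( $shuffler, $trials ) = @_;
    my %count;
    $count{ join '', $shuffler->( 'a', 'b', 'c' ) }++ for 1 .. $trials;
    return %count;
}

# Coin-flip comparator, as advertised earlier in the thread.
my %sort_rand = tally( sub { sort { ( -1, 1 )[ rand(2) ] } @_ }, 60_000 );

# Random keys generated up front, then a consistent numeric sort.
my %decorated = tally(
    sub {
        my @r = map { rand() } @_;
        @_[ sort { $r[$a] <=> $r[$b] } 0 .. $#_ ];
    },
    60_000
);

my %seen = ( %sort_rand, %decorated );
printf "%s  coin-flip sort %6d   pre-generated keys %6d\n",
    $_, $sort_rand{$_} // 0, $decorated{$_} // 0
    for sort keys %seen;
```

        The second column should hover around 10,000 per ordering. The first cannot be uniform even in principle: a bounded number of fair coin flips only yields probabilities of the form k/2^m, and 1/6 is not one of them.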

        Update: as a novice, back in my old days at the Monastery, whenever I happened to see a post of mine which (exactly like this one and its companion) turned out to have a negative reputation, I would edit it to insert an update and whine about the downvotes, or more precisely ask why it had earned them. Of course, I've learnt to do better in the meantime, and generally refrain from doing so. Except this time: I would like to "thank" all those geniuses who downvoted this node on behalf of the poor newbie who will stumble upon it one day and, judging from the reputation, will think that my complaints are moot, thus possibly following the advice of the broken shuffling technique... only to be bitten in the neck, but hopefully not later than having spread the word, because "it is so cool..."

        --
        If you can't understand the incipit, then please check the IPB Campaign.
      That code does not work.

        Are you sure?

        #!/usr/bin/perl
        #
        use strict;
        use warnings;
        my @array=0..100;
        @array=sort { (-1,0,1)[rand 3] } @array;
        print join(',',@array),"\n";

        sini@ordinalfabetix:~$ ./x.pl
        70,61,28,24,5,52,41,57,21,53,82,10,34,86,29,12,46,2,0,1,56,14,47,4,3,6,58,8,7,20,9,72,17,11,13,18,68,25,55,22,67,54,23,15,90,16,19,40,42,43,59,30,65,36,60,27,95,76,62,88,31,66,26,33,32,38,37,35,64,50,39,44,69,45,81,51,63,74,49,48,77,84,87,94,71,80,79,73,83,75,85,78,89,92,91,93,98,99,96,100,97
        sini@ordinalfabetix:~$

        Rule One: "Do not act incautiously when confronting a little bald wrinkly smiling man."

Re: Randomize an array
by Zebu (Novice) on Sep 08, 2000 at 02:09 UTC
    Wow, thanks! 1 line instead of this ugly code:
    foreach $prise (@cartes) {
        $xmel = int (rand(9999));
        $cartes[$i++] = $xmel . X . $prise;
    }
    @cartes= sort @cartes;
    $i=0;
    foreach $prise (@cartes) {
        $prise=~s/.*X//;
        $cartes[$i++]=$prise;
    }

    :-)

    BTW, it IS meant to shuffle a deck of cards...

Re: Randomize an array
by Adam (Vicar) on Sep 08, 2000 at 03:00 UTC
    I was thinking about that "one line" of code... and wondering if it was really a good shuffle. I'm not convinced that it is a better shuffle than Fisher-Yates, since it doesn't seem to allow anything to stay in its original location. It also occurred to me that calling QuickSort with a random requirement like that would take a while. So I ran a benchmark.
    DB<4> sub Shuffle{my $ar=shift; my $i=@$ar; while($i--){my $n=int rand(1+$i); @$ar[$i,$n]=@$ar[$n,$i]}}
    DB<5> @a = ( 0..9999 )
    DB<6> timethese( 1000, { FY => 'Shuffle(\@a)', SR => '@a=sort{(-1,1)[rand 2]}@a' })
    Benchmark: timing 1000 iterations of FY, SR...
            FY:  63 wallclock secs ( 61.93 usr +  0.00 sys =  61.93 CPU) @ 16.15/s (n=1000)
            SR: 157 wallclock secs (155.34 usr +  0.00 sys = 155.34 CPU) @  6.44/s (n=1000)
    Which I read to say Fisher-Yates is about two and a half times faster than the one-liner.
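    The debugger session above can be turned into a standalone script along these lines. It is only a sketch: the array is shrunk to 1,000 elements to keep the run short, fy_shuffle is an illustrative name, and the timings will differ by machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# In-place Fisher-Yates shuffle, same logic as the Shuffle sub above.
sub fy_shuffle {
    my $ar = shift;
    my $i  = @$ar;
    while ( $i-- ) {
        my $n = int rand( 1 + $i );
        @$ar[ $i, $n ] = @$ar[ $n, $i ];
    }
}

my @data = ( 0 .. 999 );    # smaller than the 10,000 used in the session

# Run each candidate 1000 times and print a comparison table.
cmpthese( 1000, {
    FY => sub { fy_shuffle( \@data ) },
    SR => sub { @data = sort { ( -1, 1 )[ rand(2) ] } @data },
} );
```

    Passing code references instead of strings lets the subs see the lexical @data and avoids re-parsing the snippet on every run.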
      The one-liner should be a good shuffle. However it is O(n log(n)) and Fisher-Yates is O(n), so the longer code really is better algorithmically.
        I really should have paid more attention in Algorithms Analysis. Sigh. Maybe I'll take the next level of it when I go back to grad school. You are right. Sort is an implementation of QuickSort (O(n log n), using lots of memory) while Fisher-Yates is O(n) with a constant defined only by how long it takes to do the rand and the swap. (We had that discussion already. <grin>) So this makes sense. And of course, this defends my suggestion that Fisher-Yates is preferable to the one-liner. Thanks tilly, maybe one day my education will start to sink in.
Re: Randomize an array
by gregorovius (Friar) on Sep 08, 2000 at 04:06 UTC
Re: Randomize an array
by tye (Cardinal) on Sep 08, 2000 at 04:31 UTC

    Back in the day, Perl didn't have its own sort and just used the qsort() provided by the local C run-time library. I seem to recall that things like sort { (-1,1)[rand 2] } could cause the algorithm to take forever to finish or even dump core.

    Now a truly efficient sort should not be able to notice because it wouldn't do any more comparisons than it needed to. So perhaps the quality of Perl's sort is much higher than many of the qsort()s from back then.

            - tye (but my friends call me "Tye")
      That doesn't make much sense, Tye. Qsort is, after all, an implementation of Quicksort. Quicksort is a pretty smart algorithm for sorting an array by loading the whole thing into memory. (In fact, it has also been proven to be the most efficient algorithm for sorting a large array. You can't do better than O(n log n).)

      We break from this post to give a brief description of quicksort for those who don't know.
      Quicksort is a recursive algorithm that picks a pivot element out of the array and then compares all the other elements to it. This results in two arrays and the pivot, where one array is all items "greater than" the pivot and the other is all items "less than" the pivot. Quicksort is then used to sort these two arrays. Since it's done recursively, it ends up with an array of pivot points in order. The code might look something like this:

      sub Qsort    # WRONG! See update below!
      {
          return unless @_;      # nothing left to sort
          my $pivot = shift;     # picking the pivot is usually more
                                 # involved. After all, you don't want
                                 # an extreme, you want both arrays to
                                 # be roughly the same size.
          my( @lt, @gt );
          push @{ $_ gt $pivot ? \@gt : \@lt }, $_ for @_;
          return( Qsort( @lt ), $pivot, Qsort( @gt ) );
      }
      (Disclaimer: I didn't test that and I haven't looked any of this up, so I could be completely wrong and that code might not work)
      We now return to the post already in progress

      So you see, the amount of time @a = sort {(-1,1)[rand 2]} @a will take is finite and well defined, regardless of the function used to determine which side of the pivot to place each element. So the only reason it would dump core would be that it ran out of memory, but it should never become hung (unless, as I said, it ran out of memory... those sneaky recursive algorithms). The only thing that varies in a quicksort is how well the pivot is chosen. If you pick an extreme (where everything ends up on one side of the pivot) then Qsort will take much longer than if you pick the middle where everything is balanced. So the point of all this is that I doubt the implementation has gotten better, but rather the hardware (128 MB ram sure beats 2 or 4 MB).

      Update:
      So I did a little research on Quicksort and realized that it usually operates on the array in place, which mine clearly doesn't. And that it does even less work than mine did. How about:

      sub Qsort {
          my( $arrayRef, $lowIndex, $highIndex ) = @_;
          return if $highIndex <= $lowIndex;    # numeric, not string, comparison
          my $pivotIndex = partition( $arrayRef, $lowIndex, $highIndex );
          Qsort( $arrayRef, $lowIndex, $pivotIndex - 1 );
          Qsort( $arrayRef, $pivotIndex + 1, $highIndex );
      }

      # And then, of course, I need to provide &partition
      sub partition {
          my( $arrayRef, $lowIndex, $highIndex ) = @_;
          my $pivot = $highIndex + $lowIndex >> 1;
          my $pivotElement = $$arrayRef[$pivot];
          while( $highIndex > $lowIndex ) {
              ++$lowIndex while $$arrayRef[$lowIndex] <= $pivotElement and $_[2] > $lowIndex;
              --$highIndex while $pivotElement <= $$arrayRef[$highIndex] and $highIndex > $_[1];
              Swap( $arrayRef, $lowIndex, $highIndex ) if $highIndex > $lowIndex;
          }
          # Move the pivot element between the two halves if it isn't there already.
          if( $highIndex > $pivot ) {
              Swap( $arrayRef, $pivot, $highIndex );
              $pivot = $highIndex;
          } elsif( $pivot > $lowIndex ) {
              Swap( $arrayRef, $pivot, $lowIndex );
              $pivot = $lowIndex;
          }
          return $pivot;
      }

      # Exchange two elements of the referenced array.
      sub Swap { @{$_[0]}[$_[1],$_[2]] = @{$_[0]}[$_[2],$_[1]] }
      # Warning, untested code...
      For more about sorting, check out these great documents I found: Here and here.
      Not to forget Mastering Algorithms with Perl from O'Reilly, which has a whole chapter devoted to sorting algorithms.

      BTW: After at least an hour of working on this post I have realized that a true quicksort would, in fact, go crazy given { (-1,1)[rand 2] } and I am now somewhat mystified as to why it converges at all. I think it's time to go eat something and I will return to this later.

      Update:
      So I slept on it, and I am now re-convinced that Qsort will be finite regardless of the random nature of its partition. So I go back to the original point of this post, Tye, that Qsort should only fail if it runs out of memory for all those recursive calls, and that it should end in a finite amount of time not to exceed some constant times N squared. (N being the number of elements to be sorted.)

      My apologies to anyone that I have confused with this post. Maybe it will inspire you to learn more about sort.

        OK, I just had to go searching for some old documentation and you can see it laid out there.

        Clearly what is called qsort really need not be. For many possible reasons.

        Now more details. You are right that the average time for qsort to work is n*log(n). However qsort actually has a worst case of n^2. In fact the worst case with a naive qsort is hit on an already sorted list. Perl currently uses a qsort with pivots chosen carefully to move the worst case to something unlikely.
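        The sorted-input worst case is easy to reproduce with a toy quicksort that always takes the first element as its pivot. This is an illustration under that assumption, not a model of Perl's built-in sort; naive_qsort and the comparison counter are hypothetical names.

```perl
use strict;
use warnings;

my $comparisons = 0;

# Naive list-based quicksort: the first element is always the pivot.
sub naive_qsort {
    return @_ if @_ < 2;
    my ( $pivot, @rest ) = @_;
    my ( @lt, @ge );
    for my $x (@rest) {
        $comparisons++;
        if ( $x < $pivot ) { push @lt, $x } else { push @ge, $x }
    }
    return ( naive_qsort(@lt), $pivot, naive_qsort(@ge) );
}

my @sorted = naive_qsort( 1 .. 50 );
printf "sorted input of 50: %d comparisons (n*log2(n) is about %.0f)\n",
    $comparisons, 50 * log(50) / log(2);    # 1225 = 50*49/2 comparisons
```

        On already sorted input every partition is as lopsided as it can be, so the count is n(n-1)/2 rather than something near n*log(n).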

        You are also right that no sorting algorithm can possibly beat having an average case of O(n*log(n)). Why not? For the simple reason that there are n! possible permutations you have to deal with. In m comparisons you can only distinguish at most 2^m of them. So the number of comparisons you need will be at least log_2(n!). Well up to a constant factor that is log(n!) which is

        log(n!) = log(1*2*...*n)
                = log(1) + log(2) + ... + log(n)
        
        which is approximately the integral from 1 to n of log(x). Which in turn is n*log(n)-n+1 plus error terms from the approximation. (After a full derivation you get Stirling's Approximation.)

        Right now all that concerns us is that n*log(n) term out front. You cannot get rid of that.
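        The bound can be evaluated numerically. The sketch below sums logarithms instead of computing n! directly, so it does not overflow; min_comparisons is an illustrative name.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Smallest number of yes/no comparisons that can tell apart all n!
# orderings: ceil( log2(n!) ), computed as a sum of logs.
sub min_comparisons {
    my ($n) = @_;
    my $log2_fact = 0;
    $log2_fact += log($_) / log(2) for 2 .. $n;
    return ceil($log2_fact);
}

printf "n = %4d: at least %5d comparisons (n*log2(n) = %.0f)\n",
    $_, min_comparisons($_), $_ * log($_) / log(2)
    for 4, 52, 1000;
```

        For four elements the bound is 5, because 4! = 24 orderings do not fit into 2^4 = 16 outcomes but do fit into 2^5 = 32.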

        Now that said there are many sorting algorithms out there. qsort is simple and has excellent average performance, but there are others that have guaranteed performance and are order n on sorted datasets. Incidentally there is right now an interesting discussion of this on p5p, including a brand new discovery of a memory leak...

Re: Randomize an array
by PipTigger (Friar) on Sep 11, 2000 at 19:13 UTC
    Hey... I may be a silly goose but I'd linearly cast it into an associative array and then use the hash to provide O(n) performance at the expense of memory and relatively poor randomness (if there is such a thing as good randomness) for nearly sorted data (based on the internal hashing mechanism) ... like this:
    #!/usr/bin/perl -w
    my @unrandom = ("red", "orange", "yellow", "green", "blue", "indigo", "violet");
    my @ununrand = ();
    my %temphash = ();
    my $indX = 0;
    for ($indX=0;$indX<@unrandom;$indX++) {
        $temphash{$unrandom[$indX]} = $indX;
    }
    @ununrand = keys(%temphash);
    print "Unn:@unrandom\nRnd:@ununrand\n";
    Which prints:
    Unn:red orange yellow green blue indigo violet
    Rnd:blue orange green violet yellow red indigo
    Not great but fast anyway. Hope this is cute and werks for someone. TTFN.
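    Two properties of this trick are worth checking before relying on it. The sketch uses a made-up list with a repeated value; everything else follows the code above.

```perl
use strict;
use warnings;

my @cards = qw(ace king king queen);    # note the repeated value

# Same idea as above: the elements become hash keys.
my %temphash;
$temphash{ $cards[$_] } = $_ for 0 .. $#cards;

my @first  = keys %temphash;
my @second = keys %temphash;

print scalar(@first), " of ", scalar(@cards), " elements survive\n";
print "same order both times\n" if "@first" eq "@second";
```

    Repeated values collapse into a single key, and asking an unmodified hash for its keys again returns the same order, so this cannot reshuffle a list within one run.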

    -PipTigger

    p.s. Initiate Nail Removal Immediately!

      That gave me an idea...

      values %{ +{ map { ( rand() => $_ ) } @list } };

      And for those of you who haven't applied Tilly's Patch® by hand and rebuilt your copy of perl and for whom performance is important:

      sub { my %hash; for(@list){ $hash{rand()}= $_ }; return values %hash }
              - tye (but my friends call me "Tye")
        Sick, slick...and imperfect. :-)

        When hashing random elements you expect to see a certain number of collisions in buckets. The second one in the bucket will be ordered after the first element. Therefore your algorithm is not a perfect shuffle, and the rising sequences test that I described in RE (tilly) 2: Randomize an array should be able to detect that.

        But even so, I like it. :-)
