PerlMonks  

Is this a fair shuffle?

by saintmike (Vicar)
on May 02, 2004 at 02:03 UTC ( #349722=perlquestion )
saintmike has asked for the wisdom of the Perl Monks concerning the following question:

Hey monks,

is this a fair way to shuffle an array:

my @a = (1..10); my @b; push @b, splice @a, rand @a, 1 while @a;
Algorithm::Numerical::Shuffle's or List::Util's shuffle() are OK, but if it can be done in a one-liner shorter than the one shown in "How do I shuffle an array" or in the FAQ, that'd be preferable.

And do any of you perlgolfer-monks have an idea how to transform the snippet above into one that works in place?

Re: Is this a fair shuffle?
by Gunth (Scribe) on May 02, 2004 at 02:36 UTC
    It's okay. I suggest you use List::Util though. Here is the code in List::Util:
    sub shuffle (@) {
      my @a=\(@_);
      my $n;
      my $i=@_;
      map {
        $n = rand($i--);
        (${$a[$n]}, $a[$n] = $a[$i])[0];
      } @_;
    }
    -Will
      Here is the code in List::Util

      Just as a point of information List::Util will use an XS version if possible, which is considerably faster than the Perl one listed above.

      Really there isn't any reason not to use List::Util if it's available, and it's been core since 5.007003.
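For anyone skimming, a minimal usage sketch of the List::Util route recommended above. The variable names are only illustrative; shuffle() itself is the function exported by List::Util.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

my @deck     = (1 .. 10);
my @shuffled = shuffle(@deck);   # returns a new list; @deck is left untouched

print "@shuffled\n";
```

Since shuffle() returns a copy, an "in-place" effect is just a matter of assigning the result back: @deck = shuffle(@deck);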

Re: Is this a fair shuffle?
by sgifford (Prior) on May 02, 2004 at 02:52 UTC

    It looks basically OK for most purposes (although if I were using it for security purposes I'd use one of the published and widely recognized shuffling algorithms).

    Here's a one-liner that does the equivalent thing in place. It walks through the array, and for the current element randomly picks one of the remaining elements and swaps it with the current element. It's not actually shorter than yours, so I'm not sure there's much use for it.

    for (0..$#a) { my $r = $_ + rand(@a - $_); @a[$_,$r] = @a[$r,$_]; }
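Spelled out over several lines, the in-place swap described above might look like the sketch below. The sub name shuffle_in_place is made up for illustration and is not from any module; the one thing to get right is that the partner index is drawn only from the current position up to the end of the array.

```perl
use strict;
use warnings;

# Hypothetical helper: shuffles the array behind $aref in place.
sub shuffle_in_place {
    my ($aref) = @_;
    for my $i (0 .. $#$aref - 1) {
        # Pick a partner among positions $i .. $#$aref (the "remaining" elements,
        # current one included).
        my $j = $i + int rand(@$aref - $i);
        @$aref[$i, $j] = @$aref[$j, $i];
    }
    return;
}

my @a = (1 .. 10);
shuffle_in_place(\@a);
print "@a\n";
```

Drawing the partner from the whole array on every step, rather than from the remaining part, is the classic way to end up with a biased shuffle.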
Re: Is this a fair shuffle?
by BrowserUk (Pope) on May 02, 2004 at 05:29 UTC

    It is fair*, and pretty slick for a pure perl implementation too, but the XS version of List::Util shuffle is 6x faster.

    Even if you do an in-place version,

    sub sm{ my $n = @{ $_[ 0 ] }; push @{ $_[ 0 ] }, splice @{ $_[ 0 ] }, rand $n--, 1 while $n; }

    you gain very little.

    * That is to say, it is a correct implementation of the Fisher-Yates shuffle, and is therefore fair.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
      The algorithm is fair, but it's not an implementation of the Fisher-Yates shuffle. In fact, the solution has a pretty poor asymptotic running time: Ω(n²). This is due to the splicing of the array. Splicing out a single element of an array takes, on average, time linear in the length of the array. The algorithm presented by the OP splices out elements of successively smaller arrays, but it still adds up to quadratic time.

      I've been pushing the Fisher-Yates shuffle instead of the splicing shuffle since 1995 or so. Since then, it has made its way into the FAQ, we have List::Util::shuffle, but despite the FAQ spelling out what's wrong with the splicing algorithm, that one just doesn't want to die.

      Abigail
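A rough way to see the quadratic growth argued above without timing anything is to count element moves under a simplified cost model. The model is an assumption, not a description of perl's internals: removing index i from an array of m elements is taken to shift the m - 1 - i elements behind it. The sub name splice_shuffle_moves is made up for this sketch.

```perl
use strict;
use warnings;

# Simplified cost model (an assumption, not a measurement of perl's internals):
# removing index $i from an array of $m elements moves the $m - 1 - $i
# elements that follow it.
sub splice_shuffle_moves {
    my ($n) = @_;
    my $moves = 0;
    for (my $m = $n; $m > 0; $m--) {
        my $i = int rand $m;       # index chosen by "splice @a, rand @a, 1"
        $moves += $m - 1 - $i;     # elements shifted down to close the gap
    }
    return $moves;
}

for my $n (1_000, 2_000, 4_000) {
    printf "%5d elements: %8d moves\n", $n, splice_shuffle_moves($n);
}
```

Under this model the expected total is n(n-1)/4, so doubling n roughly quadruples the work, whereas a Fisher-Yates pass over n elements needs at most n - 1 swaps.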

        Okay Abigail, I agree with you on the O(n²) thing with regard to the performance of the implementation of the splice versions, though it didn't seem "slow" in my original tests.

        I ran tests with 10, 100, & 1000 elements, and as well as beating the pure Perl implementations of F_Y comfortably, nothing in the numbers actually screamed "quadratic" at me.

        P:\test>200083
        Shuffling 10 elements
                  Rate PPcpy PPipl SPcpy SPipl XS_FY
        PPcpy  38428/s    --  -22%  -45%  -56%  -78%
        PPipl  49550/s   29%    --  -28%  -43%  -72%
        SPcpy  69263/s   80%   40%    --  -20%  -60%
        SPipl  86713/s  126%   75%   25%    --  -50%
        XS_FY 174192/s  353%  252%  151%  101%    --
        Shuffling 100 elements
                 Rate PPcpy PPipl SPcpy SPipl XS_FY
        PPcpy  4346/s    --  -30%  -42%  -53%  -78%
        PPipl  6179/s   42%    --  -17%  -34%  -68%
        SPcpy  7465/s   72%   21%    --  -20%  -61%
        SPipl  9343/s  115%   51%   25%    --  -52%
        XS_FY 19323/s  345%  213%  159%  107%    --
        Shuffling 1000 elements
                Rate PPcpy SPcpy PPipl SPipl XS_FY
        PPcpy  442/s    --  -29%  -30%  -39%  -77%
        SPcpy  625/s   41%    --   -1%  -14%  -68%
        PPipl  632/s   43%    1%    --  -13%  -68%
        SPipl  726/s   64%   16%   15%    --  -63%
        XS_FY 1957/s  342%  213%  210%  169%    --

        However, since reading your post, I did runs of 10_000 and 100_000 and only now the difference begins to show up.

        Shuffling 10000 elements
                Rate SPipl SPcpy PPcpy PPipl XS_FY
        SPipl 21.2/s    --  -26%  -51%  -66%  -89%
        SPcpy 28.8/s   36%    --  -33%  -55%  -85%
        PPcpy 43.0/s  103%   49%    --  -32%  -77%
        PPipl 63.3/s  198%  120%   47%    --  -66%
        XS_FY  187/s  782%  550%  335%  196%    --
        Shuffling 100000 elements
        (warning: too few iterations for a reliable count)
        (warning: too few iterations for a reliable count)
                 Rate SPipl SPcpy PPcpy PPipl XS_FY
        SPipl 0.262/s    --  -43%  -93%  -95%  -98%
        SPcpy 0.464/s   77%    --  -88%  -92%  -97%
        PPcpy  3.90/s 1386%  741%    --  -29%  -74%
        PPipl  5.52/s 2005% 1090%   42%    --  -63%
        XS_FY  14.8/s 5530% 3084%  279%  167%    --

        And that shows the transition quite dramatically. In my defense, I was only really checking for its fairness, which I did using the code you'll recognise from an old post of yours.

        permutation |  XS_FY |  PPcpy |  PPipl |  SPcpy |  SPipl
        --------------------------------------------------------
        A B C D:    |   4104 |   4195 |   4098 |   4218 |   4229
        A B D C:    |   4148 |   4212 |   4198 |   4170 |   4052
        A C B D:    |   4116 |   4112 |   4195 |   4164 |   4240
        A C D B:    |   4194 |   4151 |   4212 |   4052 |   4219
        A D B C:    |   4181 |   4221 |   4223 |   4227 |   4295
        A D C B:    |   4238 |   4140 |   4053 |   4195 |   4202
        B A C D:    |   4224 |   4195 |   4182 |   4176 |   4319
        B A D C:    |   4172 |   4073 |   4075 |   4196 |   4128
        B C A D:    |   4233 |   4169 |   4201 |   4148 |   4220
        B C D A:    |   4173 |   4220 |   4174 |   4204 |   4123
        B D A C:    |   4112 |   4127 |   4109 |   4197 |   4167
        B D C A:    |   4080 |   4189 |   4139 |   4148 |   4116
        C A B D:    |   4220 |   4159 |   4167 |   4222 |   4107
        C A D B:    |   4091 |   4212 |   4264 |   4126 |   4128
        C B A D:    |   4191 |   4169 |   4039 |   4144 |   4150
        C B D A:    |   4178 |   4173 |   4236 |   4181 |   4118
        C D A B:    |   4171 |   4247 |   4134 |   4231 |   4212
        C D B A:    |   4229 |   4142 |   4283 |   4205 |   4251
        D A B C:    |   4092 |   4165 |   4107 |   4157 |   4120
        D A C B:    |   4089 |   4083 |   4278 |   4117 |   4026
        D B A C:    |   4049 |   4127 |   4142 |   4061 |   4204
        D B C A:    |   4155 |   4144 |   4152 |   4085 |   4160
        D C A B:    |   4272 |   4217 |   4149 |   4195 |   4040
        D C B A:    |   4288 |   4158 |   4190 |   4181 |   4174
        --------------------------------------------------------
        Std. Dev.   | 64.518 | 44.318 | 66.406 | 49.629 | 75.544
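The script that produced the table above isn't shown in the thread. As a hedged sketch of the same idea, the code below shuffles a four-element list many times and tallies how often each ordering comes out. The sub names splice_shuffle and tally_permutations and the trial count are made up for this example; splice_shuffle just wraps the one-liner from the original question.

```perl
use strict;
use warnings;

# The splice-based shuffle from the original question, wrapped in a sub.
sub splice_shuffle {
    my @in = @_;
    my @out;
    push @out, splice @in, rand @in, 1 while @in;
    return @out;
}

# Tally how often each ordering of the items shows up over $trials runs.
sub tally_permutations {
    my ($shuffler, $trials, @items) = @_;
    my %count;
    $count{ join ' ', $shuffler->(@items) }++ for 1 .. $trials;
    return \%count;
}

my $count = tally_permutations(\&splice_shuffle, 24_000, qw(A B C D));
printf "%s: %d\n", $_, $count->{$_} for sort keys %$count;
```

A fair shuffle should hit all 24 orderings with counts clustered around trials/24; passing a different code ref as the first argument lets other shufflers be compared the same way. Roughly even counts are evidence of fairness, not a proof.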

        The only performance issue I considered was relative to the List::Util XS implementation. It was, as expected, considerably slower and that was the main point.

        Now the bit where I got confused. I thought about the difference between, say, the pure-perl/copying and the splicing/copying versions, and the main difference is that the former swaps contents of elements whereas the latter swaps linked elements. I concluded that the difference between the two was an "implementation detail", in the same way as the difference between the pure-perl/copying and the XS version is an implementation detail, and therefore didn't change the nature of the basic algorithm being used, hence the addendum of "it's a Fisher-Yates". I was wrong!

        I'm still a little bemused by why swapping pointers on the linked list, rather than swapping the contents of the elements the linked list points at, becomes quadratic, but the (newer) numbers demonstrate your point. I will have to sit down with a pen and paper and the source code of splice to understand why the costs grow that way.

        So, thanks for setting me straight.


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "Think for yourself!" - Abigail

Node Type: perlquestion [id://349722]
Approved by b10m