Is this a fair shuffle?

saintmike has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Is this a fair shuffle? by BrowserUk (Patriarch) on May 02, 2004 at 05:29 UTC
It is fair, and pretty slick for a pure Perl implementation too, but the XS version of List::Util shuffle is 6x faster. Even if you do an in-place version, `sub sm{ my $n = @{ $_[ 0 ] }; push @{ $_[ 0 ] }, splice @{ $_[ 0 ] }, rand $n--, 1 while $n; }` you gain very little. That is to say, it is a correct implementation of the Fisher-Yates shuffle, and is therefore fair.
Examine what is said, not who speaks. "Efficiency is intelligent laziness." -David Dunham "Think for yourself!" - Abigail
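One way to check the fairness claim empirically is to shuffle a four-element list many times and tally how often each of the 24 orderings turns up. The sketch below reuses the `sm` sub from the post; the trial count and the variable names are arbitrary choices made for this illustration, not part of the original test.

```perl
use strict;
use warnings;

# In-place splice shuffle, as posted above.
sub sm {
    my $n = @{ $_[0] };
    push @{ $_[0] }, splice @{ $_[0] }, rand $n--, 1 while $n;
}

# Tally how often each ordering of four items appears.
my $trials = 24_000;    # arbitrary trial count chosen for this sketch
my %count;
for ( 1 .. $trials ) {
    my @items = qw(A B C D);
    sm( \@items );
    $count{"@items"}++;
}

# A fair shuffle should give each of the 24 orderings about $trials / 24 hits.
printf "%s: %d\n", $_, $count{$_} for sort keys %count;
```

Roughly equal counts are consistent with a fair shuffle, but they do not prove it; a chi-squared test on the tallies would be a stricter check.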
Re: Is this a fair shuffle? by Abigail-II (Bishop) on May 02, 2004 at 23:20 UTC
The algorithm is fair, but it's not an implementation of the Fisher-Yates shuffle. In fact, the solution has a pretty poor asymptotic running time: `Ω(n²)`. This is due to the splicing of the array. Splicing out a single element of an array takes, on average, time linear in the length of the array. The algorithm presented by the OP splices out elements of successively smaller arrays, but it still adds up to quadratic time.
I've been pushing the Fisher-Yates shuffle instead of the splicing shuffle since 1995 or so. Since then, it has made its way into the FAQ and we have List::Util::shuffle, but despite the FAQ spelling out what's wrong with the splicing algorithm, that one just doesn't want to die.
Abigail
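For comparison, here is a minimal sketch of an in-place Fisher-Yates shuffle in pure Perl. The sub name `fisher_yates_shuffle` and the sample data are chosen for this illustration; it is not the List::Util source. Each step does one constant-time swap, so a full pass is linear in the array length.

```perl
use strict;
use warnings;

# In-place Fisher-Yates shuffle of the array referenced by $deck.
# Walk from the last index down to 1, swapping each element with a
# randomly chosen element at or before it.
sub fisher_yates_shuffle {
    my ($deck) = @_;
    for ( my $i = $#$deck; $i > 0; $i-- ) {
        my $j = int rand( $i + 1 );           # 0 <= $j <= $i
        @$deck[ $i, $j ] = @$deck[ $j, $i ];  # swap (a no-op when $i == $j)
    }
    return;
}

my @cards = ( 1 .. 10 );
fisher_yates_shuffle( \@cards );
print "@cards\n";
```

The loop bound also makes the sub safe to call on an empty or one-element array, since the body never runs in those cases.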
Re: Re: Is this a fair shuffle? by BrowserUk (Patriarch) on May 03, 2004 at 06:08 UTC
Okay Abigail, I agree with you on the O(n²) thing with regard to the performance of the implementation of the splice versions, though it didn't seem "slow" in my original tests. I ran tests with 10, 100, & 1000 elements, and as well as beating the pure Perl implementations of F_Y comfortably, nothing in the numbers actually screamed "quadratic" at me.

    P:\test>200083
    Shuffling 10 elements
              Rate PPcpy PPipl SPcpy SPipl XS_FY
    PPcpy  38428/s    --  -22%  -45%  -56%  -78%
    PPipl  49550/s   29%    --  -28%  -43%  -72%
    SPcpy  69263/s   80%   40%    --  -20%  -60%
    SPipl  86713/s  126%   75%   25%    --  -50%
    XS_FY 174192/s  353%  252%  151%  101%    --
    Shuffling 100 elements
             Rate PPcpy PPipl SPcpy SPipl XS_FY
    PPcpy  4346/s    --  -30%  -42%  -53%  -78%
    PPipl  6179/s   42%    --  -17%  -34%  -68%
    SPcpy  7465/s   72%   21%    --  -20%  -61%
    SPipl  9343/s  115%   51%   25%    --  -52%
    XS_FY 19323/s  345%  213%  159%  107%    --
    Shuffling 1000 elements
            Rate PPcpy SPcpy PPipl SPipl XS_FY
    PPcpy  442/s    --  -29%  -30%  -39%  -77%
    SPcpy  625/s   41%    --   -1%  -14%  -68%
    PPipl  632/s   43%    1%    --  -13%  -68%
    SPipl  726/s   64%   16%   15%    --  -63%
    XS_FY 1957/s  342%  213%  210%  169%    --

However, since reading your post, I did runs of 10_000 and 100_000 and only now the difference begins to show up.

    Shuffling 10000 elements
            Rate SPipl SPcpy PPcpy PPipl XS_FY
    SPipl 21.2/s    --  -26%  -51%  -66%  -89%
    SPcpy 28.8/s   36%    --  -33%  -55%  -85%
    PPcpy 43.0/s  103%   49%    --  -32%  -77%
    PPipl 63.3/s  198%  120%   47%    --  -66%
    XS_FY  187/s  782%  550%  335%  196%    --
    Shuffling 100000 elements
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
             Rate SPipl SPcpy PPcpy PPipl XS_FY
    SPipl 0.262/s    --  -43%  -93%  -95%  -98%
    SPcpy 0.464/s   77%    --  -88%  -92%  -97%
    PPcpy  3.90/s 1386%  741%    --  -29%  -74%
    PPipl  5.52/s 2005% 1090%   42%    --  -63%
    XS_FY  14.8/s 5530% 3084%  279%  167%    --

And that shows the transition quite dramatically. In my defense, I was only really checking for its fairness, which I did using the code you'll recognise from an old post of yours.

    permutation |  XS_FY |  PPcpy |  PPipl |  SPcpy |  SPipl
    --------------------------------------------------------
    A B C D:    |   4104 |   4195 |   4098 |   4218 |   4229
    A B D C:    |   4148 |   4212 |   4198 |   4170 |   4052
    A C B D:    |   4116 |   4112 |   4195 |   4164 |   4240
    A C D B:    |   4194 |   4151 |   4212 |   4052 |   4219
    A D B C:    |   4181 |   4221 |   4223 |   4227 |   4295
    A D C B:    |   4238 |   4140 |   4053 |   4195 |   4202
    B A C D:    |   4224 |   4195 |   4182 |   4176 |   4319
    B A D C:    |   4172 |   4073 |   4075 |   4196 |   4128
    B C A D:    |   4233 |   4169 |   4201 |   4148 |   4220
    B C D A:    |   4173 |   4220 |   4174 |   4204 |   4123
    B D A C:    |   4112 |   4127 |   4109 |   4197 |   4167
    B D C A:    |   4080 |   4189 |   4139 |   4148 |   4116
    C A B D:    |   4220 |   4159 |   4167 |   4222 |   4107
    C A D B:    |   4091 |   4212 |   4264 |   4126 |   4128
    C B A D:    |   4191 |   4169 |   4039 |   4144 |   4150
    C B D A:    |   4178 |   4173 |   4236 |   4181 |   4118
    C D A B:    |   4171 |   4247 |   4134 |   4231 |   4212
    C D B A:    |   4229 |   4142 |   4283 |   4205 |   4251
    D A B C:    |   4092 |   4165 |   4107 |   4157 |   4120
    D A C B:    |   4089 |   4083 |   4278 |   4117 |   4026
    D B A C:    |   4049 |   4127 |   4142 |   4061 |   4204
    D B C A:    |   4155 |   4144 |   4152 |   4085 |   4160
    D C A B:    |   4272 |   4217 |   4149 |   4195 |   4040
    D C B A:    |   4288 |   4158 |   4190 |   4181 |   4174
    --------------------------------------------------------
    Std. Dev.   | 64.518 | 44.318 | 66.406 | 49.629 | 75.544

The only performance issue I considered was relative to the List::Util XS implementation. It was, as expected, considerably slower and that was the main point.
Now the bit where I got confused. I thought about the difference between, say, the pure-perl/copying and the splicing/copying versions, and the main difference is that the former swaps contents of elements whereas the latter swaps linked elements.
I concluded that the difference between the two was an "implementation detail", in the same way as the difference between the pure-perl/copying and the XS version is an implementation detail, and therefore didn't change the nature of the basic algorithm being used, hence the addendum of "it's a Fisher-Yates". I was wrong!
I'm still a little bemused by why swapping pointers on the linked list, rather than swapping the contents of the elements the linked list points at, becomes quadratic, but the (newer) numbers demonstrate your point. I will have to sit down with a pen and paper and the source code of splice to understand why the costs grow that way. So, thanks for setting me straight.
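On the question of why splicing goes quadratic: Perl arrays are stored as contiguous blocks of pointers, not as linked lists, so removing an element from the middle means sliding the neighbouring pointers over to close the gap. The sketch below counts those moves under a deliberately simplified model in which removing index j from m live elements shifts the m - 1 - j elements after it. That cost model and the sub name `modeled_shifts` are assumptions made for illustration. Perl's real splice may be smarter about which side it shifts, which would change the constant but not the quadratic growth.

```perl
use strict;
use warnings;

# Total number of element moves a splice-based shuffle would make under the
# simplified model described above. (Assumption: a hypothetical cost model,
# not a measurement of Perl's splice.)
sub modeled_shifts {
    my ($n) = @_;
    my $total = 0;
    for ( my $m = $n; $m > 0; $m-- ) {
        my $j = int rand $m;       # index picked among the $m remaining elements
        $total += $m - 1 - $j;     # elements after $j each move down one slot
    }
    return $total;
}

# The expected total is n*(n-1)/4, so ten times the elements costs about a
# hundred times the moves.
for my $n ( 1_000, 10_000, 100_000 ) {
    printf "%7d elements: about %.0f moves (n*(n-1)/4 = %.0f)\n",
        $n, modeled_shifts($n), $n * ( $n - 1 ) / 4;
}
```

A swap-based shuffle, by contrast, does one swap per element, which is why the gap only becomes obvious at 10_000 elements and beyond.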
Re: Is this a fair shuffle? by Abigail-II (Bishop) on May 03, 2004 at 09:36 UTC
Re: Is this a fair shuffle? by sgifford (Prior) on May 02, 2004 at 02:52 UTC
It looks basically OK for most purposes (although if I were using it for security purposes I'd use one of the published and widely recognized shuffling algorithms). Here's a one-liner that does the equivalent thing in place. It walks through the array, and for the current element randomly picks one of the remaining elements and swaps it with the current element. It's not actually shorter than yours, so I'm not sure there's much use for it. `for (0..$#a) { my $r = $_ + rand(@a - $_); @a[$_,$r]=@a[$r,$_]; };`
Re: Is this a fair shuffle? by Gunth (Scribe) on May 02, 2004 at 02:36 UTC
It's okay. I suggest you use List::Util though. Here is the code in List::Util: `sub shuffle (@) { my @a=\(@_); my $n; my $i=@_; map { $n = rand($i--); (${$a[$n]}, $a[$n] = $a[$i])[0]; } @_; }`
-Will
Re^2: Is this a fair shuffle? by adrianh (Chancellor) on May 02, 2004 at 13:47 UTC
"Here is the code in List::Util"
Just as a point of information, List::Util will use an XS version if possible, which is considerably faster than the Perl one listed above. Really there isn't any reason not to use List::Util if it's available, and it's been core since 5.007003.
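For completeness, a minimal usage sketch of `List::Util::shuffle`; the deck contents and variable names are made up for the example.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

my @deck     = ( 1 .. 52 );
my @shuffled = shuffle(@deck);    # returns a new list; @deck is left untouched
print "@shuffled\n";
```

Because `shuffle` returns a shuffled copy rather than working in place, assign the result back to the same array if the original order is not needed.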

