Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

Re^2: Random Derangement Of An Array

by dcturner (Initiate)
on Dec 04, 2008 at 19:54 UTC ( #728077=note: print w/ replies, xml ) Need Help??


in reply to Re: Random Derangement Of An Array
in thread Random Derangement Of An Array

Here's the approach I wanted to code up, but I've been sufficiently distracted.

After much brain-racking I have been unable to come up with a less silly approach than simply encoding the combinatorial proof that d_(n+1) = n (d_n + d_(n-1)) in Perl, which was essentially your idea.

All of the other ideas in this thread so far produce a biased distribution; the 'obvious' modification to the Fisher-Yates shuffle algorithm looked promising but misses some derangements entirely.

It might be straightforward to do it purely iteratively too, but my brain works recursively so my programs do too!

#!/usr/bin/perl use strict; sub d_l_rec { # Calculates the pair (d_n, d_{n-1}) # # Why calculate the pair? It's O(n) instead of O(2^n) # for the 'obvious' recursive algorithm. # # Why recursive? No need to do it iteratively. my ($n) = @_; return 1 if $n < 1; return (0, 1) if $n == 1; my ($d1, $d2) = d_l_rec($n-1); my $d = ($n-1) * ($d1 + $d2); return ($d, $d1); } sub random_local_derangement { # Returns a randomly-chosen local derangement of # (0..($n-1)). A local derangement is a derangement # *except* that the last place may be a fixed point. # # It's 'local' in the sense that, given $n people and a # hat of $n tickets you can generate a local derangement # by each person in turn pulling tickets out of the hat # until they get one that isn't theirs, ensuring that # (local to their draw) the permutation looks like it's # going to be a derangement. This is how 'Secret Santa' # derangements often seem to be organised, and this # process sometimes leaves the last person with their # own ticket. # # Unfortunately the 'Secret Santa' approach definitely # does not give uniform probabilities to each outcome. # This function, on the other hand, does. [At least, # it's definitely uniform for $n <= 12 and merely a very # good approximation for larger $n.] my ($n) = @_; if ($n == 0) { return []; } my ($i, $threshold); # A local $n-derangement is either a full 'total' # $n-derangement or else it is a ($n-1)-derangement with # a fixed point added at the end. We must choose between # these options with appropriate probability weighting. # Note that this means that l_k = d_k + d_{k-1} where # l_k is the number of local k-derangements and d_k is # the number of total k-derangements. # # Note that d_{12} < 2^31 < d_{13} so we have to be # careful of overflow for $n > 13. Fortunately there's a # good approximation that can be used. if ($n <= 12) { # Calculate d_{$n} and d_{$n-1} and a random value # $i in [0, d_{$n} + d_{$n-1}) to decide which of # the two options to use. my ($dn, $dn1) = d_l_rec($n); $i = int(rand($dn+$dn1)); $threshold = $dn1; } else { # If $n is large then d_$n/d_{$n-1} is very close to # $n. Therefore it'll do to pick a random $i in the # range [0, $n] and see if it is 0 or not. $i = int(rand($n+1)); $threshold = 1; # I think this is ok, but it just might contain an # off-by-one error. The upshot of such an error # would be a degree of bias in the results that is # going to be hard to detect - you may have to run # it literally trillions of times to pick up a # statistically significant result. } if ($i < $threshold) { # Case 1 - pick a properly local derangement my $d = random_derangement($n-1); push @$d, $n-1; return $d; } else { # Case 2 - pick a total derangement my $d = random_derangement($n); return $d; } } sub random_derangement { # Returns a randomly-chosen (total) derangement of # (0..($n-1)), uniformly-chosen amongst all possible # derangements. my ($n) = @_; if ($n == 0) { return []; } # There are (n-1) l_{n-1} of them, so pick a (uniformly) # random local ($n-1)-derangement and a random $m in the # range [0, $n-1). my $ld = random_local_derangement($n-1); my $m = int(rand($n-1)); # If L_k is the set of all local k-derangements and D_k # is the set of all total k-derangements then the code # below encodes the proof that (n-1) l_{n-1} = d_n in a # bijection between [0, $n-1) x L_{n-1} and D_{n}. # # Since the pair ($m, $ld) are chosen uniformly, this # shows that the resulting derangement is also uniformly # chosen. if ($n-2 == $ld->[$n-2]) { # $ld is properly local. Therefore the desired # derangement swaps the $m'th and last places and # uses $ld to derange the other places. my $j = $n-1; while ($j--) { my $k = $j < $m ? $j : $j-1; $ld->[$j] = $ld->[$k] < $m ? $ld->[$k] : $ld->[$k]+1; } $ld->[$n-1] = $m; $ld->[$m] = $n-1; return $ld; } else { # $ld is total. Therefore put the $m'th entry at the # end and put $n-1 in the $m'th place. $ld->[$n-1] = $ld->[$m]; $ld->[$m] = $n-1; return $ld; } } sub check_derangement { # Check that we have really generated a derangement my ($n, $d) = @_; my $s = join ', ', @$d; die "Wrong length: $s ($n)" unless ($n == @$d); for (my $i = 0; $i < $n; $i++) { die "Not a derangement: $s" if ($i == $d->[$i]); die "Illegal value: $d->[$i] in $s" if ($d->[$i] < 0 || $d->[$i] >= $n); } eval { my @check_unique = sort { $a <=> $b || undef } @$d; }; die "Uniqueness check failed: $s" if $@; } my $n = $ARGV[0]; my %f = (); my $c = 0; for (my $i = 0; $i < 1e6; $i++) { my $d = random_derangement($n); check_derangement($n, $d); my $s = join ',', @$d; $f{$s} += 1; $c += 1; } for my $key (sort {$f{$a} <=> $f{$b}} keys %f) { printf "%s: %0.2f%% (expected %+0.2f%%)\n", $key, 100.0*($f{$key}/ +$c), 100.0*(($f{$key}/$c)-(1.0/(scalar keys %f))); } printf "Total %d (%d runs)\n", (scalar keys %f), $c;

This produces output like the following

$ ./derangements.pl 4 2,3,0,1: 11.08% (expected -0.03%) 1,0,3,2: 11.09% (expected -0.02%) 2,3,1,0: 11.10% (expected -0.01%) 1,3,0,2: 11.11% (expected -0.01%) 2,0,3,1: 11.11% (expected -0.00%) 1,2,3,0: 11.12% (expected +0.01%) 3,0,1,2: 11.12% (expected +0.01%) 3,2,1,0: 11.13% (expected +0.02%) 3,2,0,1: 11.14% (expected +0.03%) Total 9 (1000000 runs)

Clearly that's not far off uniform!


Comment on Re^2: Random Derangement Of An Array
Select or Download Code
Re^3: Random Derangement Of An Array
by blokhead (Monsignor) on Dec 07, 2008 at 20:38 UTC
    Here is a recent article describing a simpler algorithm for sampling derangements. I also found slides for the presentation. Since the paper is so recent, I guess this means that a small modification of Fisher-Yates is unlikely to generate derangements, since someone would have already come up with it by now. Still, their algorithm is in-place and has better expected running time than retrying Fisher-Yates until you get a derangement.

    Here is a Perl implementation I whipped up. It is slightly odd because I followed their lead and used array indexing from 1.

    sub rand_derangement { my $n = shift; return if $n == 1; ## no derangements of size 1 ## precompute $D[n] == number of derangements of size n my @D = (1,0); push @D, $#D * ($D[-1] + $D[-2]) while $#D < $n; my @A = (undef, 1 .. $n); my @mark = (1, (0) x $n); my ($i, $u) = ($n, $n); while ($u > 1) { if (! $mark[$i]) { my $j = 0; $j = 1 + int rand($i-2) while $mark[$j]; @A[$i,$j] = @A[$j,$i]; if ( rand(1) < ($u-1) * $D[$u-2] / $D[$u] ) { $mark[$j] = 1; $u--; } $u--; } $i--; } return @A[1..$n]; }

    blokhead

      Here is a recent article describing a simpler algorithm for sampling derangements...

      Ah yes, that's nice. I think there is a slight buglet in the given code:

      $j = 1 + int rand($i-2) while $mark[$j];

      should read

      $j = 1 + int rand($i-1) while $mark[$j];

      should it not? Also it could hit an overflow bug for $n > 12, because d_n gets quite large quite quickly. Even with 64-bit ints you overflow for $n > 20. The same approximation works for your method as for mine: for $n > 12 the value (n-1)d_{n-2} / d_n is very close to 1/n.

      The iterative version of the recursive program that I wrote above is as follows. It's a bit convoluted because the recursion d_n = (n-1) (d_{n-1} + d_{n-2}) is second-order, so you have to work a bit harder than for a first-order recursion. It's also in-place and requires only a constant amount of auxiliary storage. Plus it certainly terminates, whereas the rejection method merely almost certainly terminates :) On the minus side, it is a bit quadratic (although with low probability). Side-by-side, they run pretty similarly it seems.

      sub random_derangement { my ($n) = @_; my @d = (1, 0, 1, 2, 9, 44, 265, 1854, 14833, 133496, 1334961, 14684570, 176214841); my @t; my $i = $n; while ($i--) { my $m = int(rand($i)); $t[$i] = $m; if ($i <= 12) { $i -= 1 if (int(rand($d[$i]+$d[$i-1])) < $d[$i-1]); } else { $i -= 1 if (int(rand($i+1)) == 0); } } for ($i = 0; $i < $n; $i++) { if (defined($t[$i])) { my $m = $t[$i]; $t[$i] = $t[$m]; $t[$m] = $i; } else { my $j = $i+1; my $m = $t[$i+1]; while ($j--) { my $k = $j < $m ? $j : $j-1; $t[$j] = $t[$k] < $m ? $t[$k] : $t[$k]+1; } $t[$i+1] = $m; $t[$m] = $i+1; $i += 1; } } return \@t; }

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://728077]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others imbibing at the Monastery: (5)
As of 2014-09-23 06:46 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    How do you remember the number of days in each month?











    Results (210 votes), past polls