Beefy Boxes and Bandwidth Generously Provided by pair Networks
Do you know where your variables are?
 
PerlMonks  

Re^2: Random Derangement Of An Array

by dcturner (Initiate)
on Dec 04, 2008 at 19:54 UTC ( #728077=note: print w/ replies, xml ) Need Help??


in reply to Re: Random Derangement Of An Array
in thread Random Derangement Of An Array

Here's the approach I wanted to code up, but I've been sufficiently distracted.

After much brain-racking I have been unable to come up with a less silly approach than simply encoding the combinatorial proof that d_(n+1) = n (d_n + d_(n-1)) in Perl, which was essentially your idea.

All of the other ideas in this thread so far produce a biased distribution; the 'obvious' modification to the Fisher-Yates shuffle algorithm looked promising but misses some derangements entirely.

It might be straightforward to do it purely iteratively too, but my brain works recursively so my programs do too!

#!/usr/bin/perl use strict; sub d_l_rec { # Calculates the pair (d_n, d_{n-1}) # # Why calculate the pair? It's O(n) instead of O(2^n) # for the 'obvious' recursive algorithm. # # Why recursive? No need to do it iteratively. my ($n) = @_; return 1 if $n < 1; return (0, 1) if $n == 1; my ($d1, $d2) = d_l_rec($n-1); my $d = ($n-1) * ($d1 + $d2); return ($d, $d1); } sub random_local_derangement { # Returns a randomly-chosen local derangement of # (0..($n-1)). A local derangement is a derangement # *except* that the last place may be a fixed point. # # It's 'local' in the sense that, given $n people and a # hat of $n tickets you can generate a local derangement # by each person in turn pulling tickets out of the hat # until they get one that isn't theirs, ensuring that # (local to their draw) the permutation looks like it's # going to be a derangement. This is how 'Secret Santa' # derangements often seem to be organised, and this # process sometimes leaves the last person with their # own ticket. # # Unfortunately the 'Secret Santa' approach definitely # does not give uniform probabilities to each outcome. # This function, on the other hand, does. [At least, # it's definitely uniform for $n <= 12 and merely a very # good approximation for larger $n.] my ($n) = @_; if ($n == 0) { return []; } my ($i, $threshold); # A local $n-derangement is either a full 'total' # $n-derangement or else it is a ($n-1)-derangement with # a fixed point added at the end. We must choose between # these options with appropriate probability weighting. # Note that this means that l_k = d_k + d_{k-1} where # l_k is the number of local k-derangements and d_k is # the number of total k-derangements. # # Note that d_{12} < 2^31 < d_{13} so we have to be # careful of overflow for $n > 13. Fortunately there's a # good approximation that can be used. if ($n <= 12) { # Calculate d_{$n} and d_{$n-1} and a random value # $i in [0, d_{$n} + d_{$n-1}) to decide which of # the two options to use. my ($dn, $dn1) = d_l_rec($n); $i = int(rand($dn+$dn1)); $threshold = $dn1; } else { # If $n is large then d_$n/d_{$n-1} is very close to # $n. Therefore it'll do to pick a random $i in the # range [0, $n] and see if it is 0 or not. $i = int(rand($n+1)); $threshold = 1; # I think this is ok, but it just might contain an # off-by-one error. The upshot of such an error # would be a degree of bias in the results that is # going to be hard to detect - you may have to run # it literally trillions of times to pick up a # statistically significant result. } if ($i < $threshold) { # Case 1 - pick a properly local derangement my $d = random_derangement($n-1); push @$d, $n-1; return $d; } else { # Case 2 - pick a total derangement my $d = random_derangement($n); return $d; } } sub random_derangement { # Returns a randomly-chosen (total) derangement of # (0..($n-1)), uniformly-chosen amongst all possible # derangements. my ($n) = @_; if ($n == 0) { return []; } # There are (n-1) l_{n-1} of them, so pick a (uniformly) # random local ($n-1)-derangement and a random $m in the # range [0, $n-1). my $ld = random_local_derangement($n-1); my $m = int(rand($n-1)); # If L_k is the set of all local k-derangements and D_k # is the set of all total k-derangements then the code # below encodes the proof that (n-1) l_{n-1} = d_n in a # bijection between [0, $n-1) x L_{n-1} and D_{n}. # # Since the pair ($m, $ld) are chosen uniformly, this # shows that the resulting derangement is also uniformly # chosen. if ($n-2 == $ld->[$n-2]) { # $ld is properly local. Therefore the desired # derangement swaps the $m'th and last places and # uses $ld to derange the other places. my $j = $n-1; while ($j--) { my $k = $j < $m ? $j : $j-1; $ld->[$j] = $ld->[$k] < $m ? $ld->[$k] : $ld->[$k]+1; } $ld->[$n-1] = $m; $ld->[$m] = $n-1; return $ld; } else { # $ld is total. Therefore put the $m'th entry at the # end and put $n-1 in the $m'th place. $ld->[$n-1] = $ld->[$m]; $ld->[$m] = $n-1; return $ld; } } sub check_derangement { # Check that we have really generated a derangement my ($n, $d) = @_; my $s = join ', ', @$d; die "Wrong length: $s ($n)" unless ($n == @$d); for (my $i = 0; $i < $n; $i++) { die "Not a derangement: $s" if ($i == $d->[$i]); die "Illegal value: $d->[$i] in $s" if ($d->[$i] < 0 || $d->[$i] >= $n); } eval { my @check_unique = sort { $a <=> $b || undef } @$d; }; die "Uniqueness check failed: $s" if $@; } my $n = $ARGV[0]; my %f = (); my $c = 0; for (my $i = 0; $i < 1e6; $i++) { my $d = random_derangement($n); check_derangement($n, $d); my $s = join ',', @$d; $f{$s} += 1; $c += 1; } for my $key (sort {$f{$a} <=> $f{$b}} keys %f) { printf "%s: %0.2f%% (expected %+0.2f%%)\n", $key, 100.0*($f{$key}/ +$c), 100.0*(($f{$key}/$c)-(1.0/(scalar keys %f))); } printf "Total %d (%d runs)\n", (scalar keys %f), $c;

This produces output like the following

$ ./derangements.pl 4 2,3,0,1: 11.08% (expected -0.03%) 1,0,3,2: 11.09% (expected -0.02%) 2,3,1,0: 11.10% (expected -0.01%) 1,3,0,2: 11.11% (expected -0.01%) 2,0,3,1: 11.11% (expected -0.00%) 1,2,3,0: 11.12% (expected +0.01%) 3,0,1,2: 11.12% (expected +0.01%) 3,2,1,0: 11.13% (expected +0.02%) 3,2,0,1: 11.14% (expected +0.03%) Total 9 (1000000 runs)

Clearly that's not far off uniform!


Comment on Re^2: Random Derangement Of An Array
Select or Download Code
Re^3: Random Derangement Of An Array
by blokhead (Monsignor) on Dec 07, 2008 at 20:38 UTC
    Here is a recent article describing a simpler algorithm for sampling derangements. I also found slides for the presentation. Since the paper is so recent, I guess this means that a small modification of Fisher-Yates is unlikely to generate derangements, since someone would have already come up with it by now. Still, their algorithm is in-place and has better expected running time than retrying Fisher-Yates until you get a derangement.

    Here is a Perl implementation I whipped up. It is slightly odd because I followed their lead and used array indexing from 1.

    sub rand_derangement { my $n = shift; return if $n == 1; ## no derangements of size 1 ## precompute $D[n] == number of derangements of size n my @D = (1,0); push @D, $#D * ($D[-1] + $D[-2]) while $#D < $n; my @A = (undef, 1 .. $n); my @mark = (1, (0) x $n); my ($i, $u) = ($n, $n); while ($u > 1) { if (! $mark[$i]) { my $j = 0; $j = 1 + int rand($i-2) while $mark[$j]; @A[$i,$j] = @A[$j,$i]; if ( rand(1) < ($u-1) * $D[$u-2] / $D[$u] ) { $mark[$j] = 1; $u--; } $u--; } $i--; } return @A[1..$n]; }

    blokhead

      Here is a recent article describing a simpler algorithm for sampling derangements...

      Ah yes, that's nice. I think there is a slight buglet in the given code:

      $j = 1 + int rand($i-2) while $mark[$j];

      should read

      $j = 1 + int rand($i-1) while $mark[$j];

      should it not? Also it could hit an overflow bug for $n > 12, because d_n gets quite large quite quickly. Even with 64-bit ints you overflow for $n > 20. The same approximation works for your method as for mine: for $n > 12 the value (n-1)d_{n-2} / d_n is very close to 1/n.

      The iterative version of the recursive program that I wrote above is as follows. It's a bit convoluted because the recursion d_n = (n-1) (d_{n-1} + d_{n-2}) is second-order, so you have to work a bit harder than for a first-order recursion. It's also in-place and requires only a constant amount of auxiliary storage. Plus it certainly terminates, whereas the rejection method merely almost certainly terminates :) On the minus side, it is a bit quadratic (although with low probability). Side-by-side, they run pretty similarly it seems.

      sub random_derangement { my ($n) = @_; my @d = (1, 0, 1, 2, 9, 44, 265, 1854, 14833, 133496, 1334961, 14684570, 176214841); my @t; my $i = $n; while ($i--) { my $m = int(rand($i)); $t[$i] = $m; if ($i <= 12) { $i -= 1 if (int(rand($d[$i]+$d[$i-1])) < $d[$i-1]); } else { $i -= 1 if (int(rand($i+1)) == 0); } } for ($i = 0; $i < $n; $i++) { if (defined($t[$i])) { my $m = $t[$i]; $t[$i] = $t[$m]; $t[$m] = $i; } else { my $j = $i+1; my $m = $t[$i+1]; while ($j--) { my $k = $j < $m ? $j : $j-1; $t[$j] = $t[$k] < $m ? $t[$k] : $t[$k]+1; } $t[$i+1] = $m; $t[$m] = $i+1; $i += 1; } } return \@t; }

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://728077]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others chanting in the Monastery: (11)
As of 2015-07-06 11:35 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (73 votes), past polls