Here is a recent article describing a simpler algorithm for sampling derangements. I also found slides for the presentation. Since the paper is so recent, I guess this means that a small modification of Fisher-Yates is unlikely to generate derangements, since someone would have already come up with it by now. Still, their algorithm is in-place and has better expected running time than retrying Fisher-Yates until you get a derangement.

Here is a Perl implementation I whipped up. It is slightly odd because I followed their lead and used array indexing from 1.

sub rand_derangement {
my $n = shift;
return if $n == 1; ## no derangements of size 1
## precompute $D[n] == number of derangements of size n
my @D = (1,0);
push @D, $#D * ($D[-1] + $D[-2]) while $#D < $n;
my @A = (undef, 1 .. $n);
my @mark = (1, (0) x $n);
my ($i, $u) = ($n, $n);
while ($u > 1) {
if (! $mark[$i]) {
my $j = 0;
$j = 1 + int rand($i-2) while $mark[$j];
@A[$i,$j] = @A[$j,$i];
if ( rand(1) < ($u-1) * $D[$u-2] / $D[$u] ) {
$mark[$j] = 1;
$u--;
}
$u--;
}
$i--;
}
return @A[1..$n];
}