in reply to Re^2: Shuffling CODONS in thread Shuffling CODONS
But when that biased PRNG is used within a standard Fisher-Yates shuffle, the shuffles are fair, no matter how many times you run it
I am not convinced.
A statistical test for assessing fairness is chi-square: throw a die a million times and feed chi-square the counts for each outcome: '1' => ..., '2' => ..., ... '6' => .... It will tell you whether it thinks the die is fair or not.
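For instance, here is a minimal sketch of that die test, using the same Statistics::ChiSquare module I use below (the fair-die simulation is mine, for illustration only):
#!/usr/bin/env perl
use strict;
use warnings;
use Statistics::ChiSquare;

# a million throws of a (presumed fair) six-sided die
my %counts;
++$counts{ 1 + int rand 6 } for 1 .. 1_000_000;

# feed the six outcome counts to chi-square; it answers with a verdict string
print "$_ => $counts{$_}\n" for sort keys %counts;
print Statistics::ChiSquare::chisquare( values %counts ), "\n";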
So I ran your test code with 3 rand-subs:
use Math::Random::MT;
my $mt = Math::Random::MT->new();
sub badRand($) { rand() < 0.5 ? 0 : rand( $_[0] ) }
sub goodRand($) { rand($_[0]) }
sub bestRand($) { $mt->rand($_[0]) }
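As an aside, it is easy to see how lopsided badRand() is on its own, before any shuffle is involved (a quick sketch of mine; with a range of 4, index 0 comes up roughly 50% + 50%/4 = 62.5% of the time instead of 25%):
#!/usr/bin/env perl
use strict;
use warnings;

sub badRand($) { rand() < 0.5 ? 0 : rand( $_[0] ) }

# histogram of the integer part of 100,000 badRand(4) draws
my %hist;
++$hist{ int badRand(4) } for 1 .. 100_000;
printf "%d => %6d\n", $_, $hist{$_} // 0 for 0 .. 3;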
Then I ran the chi-square test (Statistics::ChiSquare) on the shuffle counts (i.e. the output of your test code).
This is the result:
bad_shuffle : There's a <1% chance that this data is random.
good_shuffle : There's a >25% chance, and a <50% chance, that this data is random.
best_shuffle : There's a >25% chance, and a <50% chance, that this data is random.
or
bad_shuffle : There's a <1% chance that this data is random.
best_shuffle : There's a >50% chance, and a <75% chance, that this data is random.
good_shuffle : There's a >10% chance, and a <25% chance, that this data is random.
or
bad_shuffle : There's a <1% chance that this data is random.
best_shuffle : There's a >25% chance, and a <50% chance, that this data is random.
good_shuffle : There's a >50% chance, and a <75% chance, that this data is random.
The above test consistently rates "bad_shuffle", based on badRand(), as producing data with less than a 1% chance of being random. It is undecided, let's say, which is best for this particular shuffle: Mersenne or Perl.
The results your test code produced may look fair, but according to this particular test they aren't - it very much depends on the use-case (see end of post). You may be right with your hunch that Mersenne offers nothing more than Perl's standard rand() for a Fisher-Yates shuffle, but let's not exaggerate with badRand()! :)
This is my test code:
#!/usr/bin/env perl
use strict;
use warnings;
use Statistics::ChiSquare;
use Math::Random::MT;

my $mt = Math::Random::MT->new();
my $NUMTESTS = 100000;

sub badRand($)  { rand() < 0.5 ? 0 : rand( $_[0] ) }
sub goodRand($) { rand( $_[0] ) }
sub bestRand($) { $mt->rand( $_[0] ) }

my %tests = (
    'bad_shuffle'  => { 'randsub' => \&badRand,  'rands' => undef },
    'good_shuffle' => { 'randsub' => \&goodRand, 'rands' => undef },
    'best_shuffle' => { 'randsub' => \&bestRand, 'rands' => undef },
);

my @vals = ( 1 .. 4 );
foreach (keys %tests){
    my $asub = $tests{$_}->{'randsub'};
    $tests{$_}{'results'} = do_a_shuffle( $asub, @vals );
    $tests{$_}{'chi'} = Statistics::ChiSquare::chisquare( values %{ $tests{$_}{'results'} } );
    print $_ . " : " . $tests{$_}{'chi'} . "\n";
}

# &randsub, @choices
sub shuffle {
    my $randsub = shift;
    # rest of params are the values/choices to shuffle;
    # swap each position with one chosen from the remaining tail
    $a = $_ + $randsub->( @_ - $_ ),
    $b = $_[$_],
    $_[$_] = $_[$a],
    $_[$a] = $b
        for 0 .. $#_;
    return @_;
}

# &randsub, @choices
sub do_a_shuffle {
    my %tests;
    for (1 .. $NUMTESTS) {
        ++$tests{ join '', shuffle(@_) };
    }
    return \%tests;
}
__END__
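For readers squinting at the statement-modifier form of shuffle() above, here is the same swap written out longhand (my own restatement, not part of the test code; note that Perl truncates the fractional value returned by the rand-subs when it is used as an array index):
# &randsub, @choices -- equivalent, more conventional Fisher-Yates
sub shuffle_longhand {
    my ( $randsub, @vals ) = @_;
    for my $i ( 0 .. $#vals ) {
        # pick a position from the not-yet-fixed tail [$i .. $#vals]
        my $j = $i + int $randsub->( @vals - $i );
        @vals[ $i, $j ] = @vals[ $j, $i ];   # swap
    }
    return @vals;
}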
sidenote:
A particular shuffle may be appropriate for a particular situation.
Such as?
- poker and the like, where patterns in the shuffled cards may influence the game. The shuffled cards must be tested for the number of patterns they contain.
- medical trials: a list of "same-category" (e.g. all healthy) patients must be shuffled in order to then separate them into different groups for trial treatment.
- related to the medical-trials point: I have a group with certain average properties. If I shuffle and then split them into groups at random, does each group have, on average, the same properties? Or have I put the tall boys in one group and the short boys in another, because my shuffle algorithm was not suitable? (A sketch of this check follows below.)
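A rough sketch of that last check (my own illustration; it uses List::Util's shuffle rather than the test code's shuffle(), and fake data):
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw( shuffle sum );

# a fake cohort of 1000 "patients" with heights between 150 and 199
my @heights = map { 150 + int rand 50 } 1 .. 1000;
my $overall = sum(@heights) / @heights;

# shuffle, then split into two trial groups of 500
my @mixed   = shuffle @heights;
my @group_a = @mixed[   0 .. 499 ];
my @group_b = @mixed[ 500 .. 999 ];

# with a decent shuffle, both group averages should sit close to the overall average
printf "overall %.1f   group A %.1f   group B %.1f\n",
    $overall, sum(@group_a) / @group_a, sum(@group_b) / @group_b;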
Re^4: Shuffling CODONS
by BrowserUk (Patriarch) on Jun 10, 2018 at 08:48 UTC
Okay. I ran your code again and got:
C:\test>junk123
best_shuffle : There's a <1% chance that this data is random.
good_shuffle : There's a >50% chance, and a <75% chance, that this data is random.
bad_shuffle : There's a >5% chance, and a <10% chance, that this data is random.
That's not right!
So, I thought about my example code and looked at what it was intended to demonstrate.
That despite the use of a completely bogus rand() function, a Fisher-Yates shuffle would still operate and produce results:
- That all possible shuffles of the data were being produced.
I chose to shuffle 4 values because the 24 possible results fit on a screen and are simple to verify manually.
- That they were produced with (approximately) the same frequency.
I.e. the number of times each possible shuffle was produced was approximately equal, and approximately 1/24th of the total runs.
In that respect, it served its purpose.
But, if you are going to formally test a shuffle, using only 4-value arrays and 1e6 iterations probably isn't the ideal scenario.
To that end I tweaked your code to allow me to adjust both parameters from the command line:
our $NUMTESTS //= 1e6;
our $ASIZE //= 4;
...
my @vals = ( 1..$ASIZE );
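(For anyone wondering where $ASIZE and $N come from: my scripts run under #! perl -slw, and perl's -s switch turns -ASIZE=40 -N=1e5 on the command line into package variables before the body runs; the //= only supplies a default when the switch is absent. A minimal illustration of my own:)
#! perl -slw
use strict;
our $N     //= 1e6;
our $ASIZE //= 4;
# invoked as: junk123 -ASIZE=40 -N=1e5
printf "N=%g  ASIZE=%d\n", $N, $ASIZE;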
And, given that Chi2 is generally used to determine whether a (smallish) sample is representative of a large and thus unknown population,
I tried using a (moderately) larger array:
C:\test>junk123 -ASIZE=40 -N=1e5
best_shuffle : I can't handle 100000 choices without a better table.
good_shuffle : I can't handle 100000 choices without a better table.
bad_shuffle : I can't handle 100000 choices without a better table.
C:\test>junk123 -ASIZE=40 -N=1e4
best_shuffle : I can't handle 10000 choices without a better table.
good_shuffle : I can't handle 10000 choices without a better table.
C:\test>junk123 -ASIZE=40 -N=1e3
best_shuffle : I can't handle 1000 choices without a better table.
good_shuffle : I can't handle 1000 choices without a better table.
bad_shuffle : I can't handle 1000 choices without a better table.
C:\test>junk123 -ASIZE=40 -N=1e2
best_shuffle : There's a >99.5% chance, and a <100% chance, that this data is random.
good_shuffle : There's a >99.5% chance, and a <100% chance, that this data is random.
bad_shuffle : There's a >99.5% chance, and a <100% chance, that this data is random.
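(An aside of my own on why the module balks at the larger runs: with ASIZE=40 there are 40! possible orderings, so at N=1e5 virtually every observed shuffle is unique, the counts hash ends up with ~N keys, and the module's built-in table has no row for that many degrees of freedom. A back-of-envelope check:)
#!/usr/bin/env perl
use strict;
use warnings;

for my $size ( 4, 11, 40 ) {
    my $perms = 1;
    $perms *= $_ for 2 .. $size;
    printf "ASIZE=%2d -> %2d! ~ %.3g possible shuffles\n", $size, $size, $perms;
}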
And once I found a sample size of that larger array that the module could handle, I did a few "identical" runs:
C:\test>junk123 -ASIZE=40 -N=1e2
best_shuffle : There's a <1% chance that this data is random.
good_shuffle : There's a >50% chance, and a <75% chance, that this data is random.
bad_shuffle : There's a >5% chance, and a <10% chance, that this data is random.
C:\test>junk123 -ASIZE=40 -N=1e2
best_shuffle : There's a >50% chance, and a <75% chance, that this data is random.
good_shuffle : There's a >50% chance, and a <75% chance, that this data is random.
bad_shuffle : There's a <1% chance that this data is random.
C:\test>junk123 -ASIZE=40 -N=1e2
best_shuffle : There's a >50% chance, and a <75% chance, that this data is random.
good_shuffle : There's a >50% chance, and a <75% chance, that this data is random.
bad_shuffle : There's a >10% chance, and a <25% chance, that this data is random.
Hm. Not exactly confidence inspiring.
Let's try some middle ground:
C:\test>junk123 -ASIZE=11 -N=1e5
best_shuffle : There's a >75% chance, and a <90% chance, that this data is random.
good_shuffle : There's a >75% chance, and a <90% chance, that this data is random.
bad_shuffle : There's a <1% chance that this data is random.
C:\test>junk123 -ASIZE=11 -N=1e5
best_shuffle : There's a >25% chance, and a <50% chance, that this data is random.
good_shuffle : There's a >1% chance, and a <5% chance, that this data is random.
bad_shuffle : There's a <1% chance that this data is random.
C:\test>junk123 -ASIZE=11 -N=1e4
best_shuffle : There's a >75% chance, and a <90% chance, that this data is random.
good_shuffle : There's a >1% chance, and a <5% chance, that this data is random.
bad_shuffle : There's a >75% chance, and a <90% chance, that this data is random.
C:\test>junk123 -ASIZE=11 -N=1e4
best_shuffle : There's a >50% chance, and a <75% chance, that this data is random.
good_shuffle : There's a >50% chance, and a <75% chance, that this data is random.
bad_shuffle : There's a <1% chance that this data is random.
C:\test>junk123 -ASIZE=11 -N=1e4
best_shuffle : There's a >10% chance, and a <25% chance, that this data is random.
good_shuffle : There's a >5% chance, and a <10% chance, that this data is random.
bad_shuffle : There's a <1% chance that this data is random.
C:\test>junk123 -ASIZE=11 -N=1e2
best_shuffle : There's a >10% chance, and a <25% chance, that this data is random.
good_shuffle : There's a >75% chance, and a <90% chance, that this data is random.
bad_shuffle : There's a <1% chance that this data is random.
C:\test>junk123 -ASIZE=11 -N=1e2
best_shuffle : There's a >1% chance, and a <5% chance, that this data is random.
good_shuffle : There's a >90% chance, and a <95% chance, that this data is random.
bad_shuffle : There's a >1% chance, and a <5% chance, that this data is random.
Sure, it's guessing that the known, deliberately really bad rand is producing poor results most of the time,
but it's also making the same guess about the known good rand with surprisingly high frequency.
Two possibilities:
- the test is implemented badly;
- it is the wrong test for this kind of data.
I suspect a little of both is at work here.
I'm far from an expert on stats, but I don't believe that Chi2 is the right test for the kind of samples this produces;
and I cannot see any reference to Yates' correction in the module.
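For reference, the textbook form of the statistic and of Yates' continuity correction (my addition; the correction is usually quoted for 2x2 contingency tables and small expected counts):

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \qquad \chi^2_{\text{Yates}} = \sum_{i=1}^{k} \frac{\bigl(\lvert O_i - E_i \rvert - 0.5\bigr)^2}{E_i},$$

where $O_i$ is the observed count in cell $i$ and $E_i = N/k$ is the expected count in each of the $k$ cells under fairness.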
More later.
I'm far from an expert on stats, but I don't believe that Chi2 is the right test for the kind of samples this produces; and I cannot see any reference to Yates correction in the module.
Me neither; far from an expert. Beyond chi-square there are tests for finding patterns in the shuffled data (I once tried zipping a file and treating the compression ratio achieved as proportional to how random the sequence is - sketched below), and Monte-Carlo splitting of the shuffled array into groups to see whether each group's average approaches the whole array's average.
I will report back with chi-square results using R.
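For what it's worth, a rough sketch of that compression-ratio idea (my own reconstruction; the choice of IO::Compress::Gzip is mine): a patterned sequence squeezes well, a well-shuffled one barely at all, so compressed-size/original-size gives a crude randomness score.
#!/usr/bin/env perl
use strict;
use warnings;
use IO::Compress::Gzip qw( gzip $GzipError );

sub compression_ratio {
    my ($string) = @_;
    gzip \$string => \my $squeezed or die "gzip failed: $GzipError";
    return length($squeezed) / length($string);
}

my $patterned = 'ACGT' x 25_000;                                          # very regular
my $scrambled = join '', map { (qw( A C G T ))[ rand 4 ] } 1 .. 100_000;  # "shuffled"

printf "patterned: %.3f   scrambled: %.3f\n",
    compression_ratio($patterned), compression_ratio($scrambled);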
That despite the use of a completely bogus rand() function, a Fisher-Yates shuffle would still operate and produce results:
That all possible shuffles of the data were being produced. I chose to shuffle 4 values because the 24 possible results fit on a screen and are simple to verify manually.
That they were produced with (approximately) the same frequency. I.e. the number of times each possible shuffle was produced was approximately equal, and approximately 1/24th of the total runs.
In that respect, it served its purpose.
But, if you are going to formally test a shuffle, using only 4-value arrays and 1e6 iterations probably isn't the ideal scenario.
ok
# Added to (my copy of) Statistics::ChiSquare: the same calculation as the
# module's chisquare(), but returning the raw chi-square value rather than
# a potted interpretation.
sub chisquaredVal {
    my @data = @_;
    @data = @{ $data[0] } if( @data == 1 and ref( $data[0] ) );
    return "There's no data!" if( !@data );
    return "Not enough data!" if( @data == 1 );
#   return "Malformed data!"  if( grep { /\D/ } @data );
    my $degrees_of_freedom = scalar(@data) - 1;
    my ($chisquare, $num_samples, $expected, $i) = (0, 0, 0, 0);
    if (! ref( $chitable[$degrees_of_freedom] )) {
        return "I can't handle ".scalar(@data)." choices without a better table.";
    }
    foreach (@data) { $num_samples += $_ }
    $expected = $num_samples / scalar(@data);
#   return "There's no data!" unless $expected;
    foreach (@data) {
        $chisquare += (($_ - $expected) ** 2) / $expected;
    }
    return $chisquare;
}
(I also had to disable the "Malformed data" test, as it rejects floating-point numbers!)
Then I wrote a script that repeated the test a hundred times on the F-Y shuffle using MT rand(), and gathered the numerical values it produced into an array:
#! perl -slw
use strict;
use Statistics::ChiSquare qw[ chisquare chisquaredVal ];
use Math::Random::MT;
use Data::Dump qw[ pp ];

my $mt = Math::Random::MT->new();
our $N     //= 1e6;
our $ASIZE //= 4;
our $T     //= 4;

sub shuffle {
    $a = $_ + $mt->rand( @_ - $_ ),
    $b = $_[$_],
    $_[$_] = $_[$a],
    $_[$a] = $b
        for 0 .. $#_;
    return @_;
}

my @data = ( 1 .. $ASIZE );
my @chi;
for( 1 .. $T ) {
    my %tests;
    ++$tests{ join '', shuffle( @data ) } for 1 .. $N;
    push @chi, chisquaredVal( values %tests );
}
And finally, ran the original chisquare() function on its own output for assessment:
print chisquare( @chi );
__END__
C:\test>chiSquareChiSquare -ASIZE=7 -N=2e2 -T=1e2
There's a >99.5% chance, and a <100% chance, that this data is random.
C:\test>chiSquareChiSquare -ASIZE=7 -N=2e2 -T=1e2
There's a >90% chance, and a <95% chance, that this data is random.
C:\test>chiSquareChiSquare -ASIZE=7 -N=2e2 -T=1e2
There's a >99.5% chance, and a <100% chance, that this data is random.
Finally, I think it got something right!
Its own output is truly random :)
(Sorry. I was bored :) )
I will report back chi-squared results using R
It will be interesting to see the results from a known good source.
Because I think that S::CS is (fatally) flawed. To get some feel for the accuracy of the test it performs, I decided to run it on the shuffle, using the known-good MT PRNG and a small dataset (1..4), a good number of times, to see how consistent the results from S::CS were; and the answer is not just "not very", but actually just "not":
#! perl -slw
use strict;
use Statistics::ChiSquare qw[ chisquare ];
use Math::Random::MT;
use Data::Dump qw[ pp ];

my $mt = Math::Random::MT->new();
our $N     //= 1e6;
our $ASIZE //= 4;
our $T     //= 4;

sub shuffle {
    $a = $_ + $mt->rand( @_ - $_ ),
    $b = $_[$_],
    $_[$_] = $_[$a],
    $_[$a] = $b
        for 0 .. $#_;
    return @_;
}

my @data = ( 1 .. $ASIZE );
my @chi;
for( 1 .. $T ) {
    my %tests;
    ++$tests{ join '', shuffle( @data ) } for 1 .. $N;
    print chisquare( values %tests );
}
__END__
C:\test>chiSquareChiSquare -ASIZE=4 -N=1e4 -T=100
There's a >25% chance, and a <50% chance, that this data is random.
There's a >50% chance, and a <75% chance, that this data is random.
There's a >10% chance, and a <25% chance, that this data is random.
There's a >10% chance, and a <25% chance, that this data is random.
There's a >50% chance, and a <75% chance, that this data is random.
There's a >10% chance, and a <25% chance, that this data is random.
There's a >25% chance, and a <50% chance, that this data is random.
There's a >75% chance, and a <90% chance, that this data is random.
There's a >25% chance, and a <50% chance, that this data is random.
There's a >50% chance, and a <75% chance, that this data is random.
There's a >5% chance, and a <10% chance, that this data is random.
There's a >75% chance, and a <90% chance, that this data is random.
There's a >10% chance, and a <25% chance, that this data is random.
There's a >50% chance, and a <75% chance, that this data is random.
There's a >75% chance, and a <90% chance, that this data is random.
There's a >75% chance, and a <90% chance, that this data is random.
There's a >50% chance, and a <75% chance, that this data is random.
There's a >50% chance, and a <75% chance, that this data is random.
There's a >5% chance, and a <10% chance, that this data is random.
There's a >95% chance, and a <99% chance, that this data is random.
There's a >1% chance, and a <5% chance, that this data is random.
(79 more utterly inconsistent results not shown.)
Given this is a known-good algorithm using a known-good PRNG, all in all, and as I said earlier, I think the module's results are as good a definition of random as I've seen a module produce.
Re^4: Shuffling CODONS
by BrowserUk (Patriarch) on Jun 10, 2018 at 07:39 UTC
I am not convinced. A statistical test for assessing statistical fairness is chi-square:
Well yes. Chi2 is the "standard test" in certain circles; but it needs
- to be implemented properly;
- to include Yates' (of Fisher-Yates fame) correction;
- to be used correctly;
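On the "used correctly" point, the usual rule of thumb (standard textbook guidance, not something the module checks) is that the chi-square approximation only holds when every expected cell count is reasonably large, say

$$E_i = \frac{N}{k} \ge 5,$$

so a test over the $k = n!$ possible orderings of an $n$-element array wants on the order of $5 \cdot n!$ samples: easy for $n = 4$ (24 cells), hopeless for $n = 11$ or $n = 40$.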
I haven't investigated the particular implementation you used, but I did install it so I could run your code. A few times...
C:\test>junk123
best_shuffle : There's a >25% chance, and a <50% chance, that this data is random.
good_shuffle : There's a >1% chance, and a <5% chance, that this data is random.
bad_shuffle : There's a <1% chance that this data is random.
C:\test>junk123
best_shuffle : There's a >75% chance, and a <90% chance, that this data is random.
good_shuffle : There's a >25% chance, and a <50% chance, that this data is random.
bad_shuffle : There's a >50% chance, and a <75% chance, that this data is random.
C:\test>junk123
best_shuffle : There's a >10% chance, and a <25% chance, that this data is random.
good_shuffle : There's a >5% chance, and a <10% chance, that this data is random.
bad_shuffle : There's a >50% chance, and a <75% chance, that this data is random.
In 3 runs, the probability it assigns to the results from (one of) the best PRNGs available actually being random varies between 10% and 90%, whilst for a deliberately buggered PRNG it sits between 1% and 75%.
Honestly, those 3 sets of results look like as good an approximation of random as I can think of.
I'll investigate later, once I've woken properly, whether it is the way you are using it, the implementation, or the potted interpretations it is producing; but at first glance, I'd have to say "something's not right".
(Maybe we could feed S::CS its own results and see if it judges itself random :) )