
Assessing a statistical argument on the fraudulence of the Iranian elections

by whakka (Hermit)
on Jun 23, 2009 at 02:32 UTC (#773831, CUFP)

Update: Revised to address the test the article actually used, which I originally misread. What they calculate turns out to be numerically accurate.

A lot of attention has been garnered by the Washington Post article with a statistical argument that the elections in Iran were a fraud. I replicate part of it below with my critique of what it all means.

Their argument goes like this: a random draw from the digits 0-9 yields a 10% probability of picking any single digit. In the election results, the digit 5 occurred as the last digit 4% of the time, while the digit 7 occurred as the last digit 17% of the time. (Apparently this also has some psychological significance.)

"Fewer than four in a hundred non-fraudulent elections would produce such numbers."

A testable assertion! On to the Perl (note that the election results had 116 observations):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Statistics::Descriptive;

    # 1. Simulate 10,000 draws of 116 obs from a random distribution between 0 and 9.
    # 2. Calculate:
    #    - the odds one digit occurs 5 or less times (4% of 116)
    #    - the odds one digit occurs 20 or more times (17% of 116)
    #    - the mean and sd -> test 5 and 20 are outside the 95% CI
    #    - the odds both occur

    my $RUNS = 10_000;
    my ($FIVES,$TWENTIES,$BOTH) = (0,0,0);
    my @SAMPLE;
    my $stat = Statistics::Descriptive::Full->new();

    # Collect
    for my $i ( 1..$RUNS ) {
        my %h;
        $h{int(rand(10))}++ for (1..116);    # tally the 116 random digits
        my ($old5,$old20) = ($FIVES,$TWENTIES);
        for ( values %h ) {
            # $stat->add_data($_);
            push @SAMPLE, $_;
            $FIVES++ if $_ <= 5;       # counts digits, not runs
            $TWENTIES++ if $_ >= 20;
        }
        # Both counters moved in this run: a low and a high digit occurred together
        $BOTH++ if $old5!=$FIVES and $old20!=$TWENTIES;
    }
    $stat->add_data(@SAMPLE);

    # Analyze
    printf "Mean:\t\t\t%.2f\nSD:\t\t\t%.3f\n",$stat->mean,$stat->standard_deviation;
    printf "Odds of 5 or less:\t%.3f\n",$FIVES/$RUNS;
    printf "Odds of 20 or higher:\t%.3f\n",$TWENTIES/$RUNS;
    # The multiplier 2.96 is kept because it produced the output shown below;
    # a two-sided 95% normal interval conventionally uses 1.96.
    printf "95 percent CI:\t\t%.3f --- %.3f\n",
        $stat->mean - 2.96 * $stat->standard_deviation,
        $stat->mean + 2.96 * $stat->standard_deviation;
    printf "Odds of both:\t\t%.3f\n",$BOTH/$RUNS;

Typical Output:

    Mean:                   11.60
    SD:                     3.231
    Odds of 5 or less:      0.204
    Odds of 20 or higher:   0.112
    95 percent CI:          2.037 --- 21.163
    Odds of both:           0.037

So, in 116 random draws from a uniform distribution over 0-9, you would expect to find some digit occurring 4% of the time or less in over 20% of cases. A digit occurring 17% of the time or more is about half as frequent, and both counts (5 and 20) fall inside the interval printed above. Note, though, that the printed interval uses a multiplier of 2.96; with the conventional 1.96 it would be roughly 5.27 to 17.93, leaving both counts just outside. Going by the simulated frequencies, we fail to reject the null hypothesis for each individual test at the 5% level, therefore "disproving" the "proof". However, the odds of both happening simultaneously are 3.7%, which does reject the null hypothesis of a random draw at the 5% level. Throwing in their last test of adjacent numbers (not coded) moves the frequency to 0.5%.
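As a cross-check on the simulation, the single-digit tail probabilities can be computed exactly. This is a minimal sketch, assuming each of the 116 last digits is an independent draw with probability 0.1 of being any given digit, so that one digit's count follows a binomial distribution. The helper names `binomial_pmf` and `sum_range` are my own, not from the original script. Multiplying each tail probability by 10 gives the expected number of digits per run past the threshold, which is what `$FIVES/$RUNS` and `$TWENTIES/$RUNS` estimate, so the results should land near the simulated figures.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: one digit's count in N draws is Binomial(N, P).
my $N = 116;
my $P = 0.1;

# Return the pmf values P(X = k) for k = 0..$n, built with the
# recurrence pmf(k+1) = pmf(k) * (n-k)/(k+1) * p/(1-p).
sub binomial_pmf {
    my ($n, $p) = @_;
    my @pmf = ( (1 - $p) ** $n );
    for my $k ( 0 .. $n - 1 ) {
        push @pmf, $pmf[$k] * ($n - $k) / ($k + 1) * $p / (1 - $p);
    }
    return @pmf;
}

# Sum the entries of an array reference over the inclusive index range.
sub sum_range {
    my ($values, $from, $to) = @_;
    my $total = 0;
    $total += $values->[$_] for $from .. $to;
    return $total;
}

my @pmf  = binomial_pmf($N, $P);
my $low  = sum_range(\@pmf, 0, 5);      # P(count <= 5) for one digit
my $high = sum_range(\@pmf, 20, $N);    # P(count >= 20) for one digit

printf "P(one digit <= 5):\t%.4f\n", $low;
printf "P(one digit >= 20):\t%.4f\n", $high;
# Expected number of digits (out of 10) past each threshold in one run
printf "Expected digits <= 5:\t%.3f\n", 10 * $low;
printf "Expected digits >= 20:\t%.3f\n", 10 * $high;
```

The recurrence avoids computing large factorials directly, which keeps the arithmetic within ordinary floating-point range for 116 draws.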

The fact remains they used arbitrary tests to arrive at this number - you would have to believe each psychological justification to say it bears any significance. It also reeks of data mining - they omit to tell us if they tested other bits of psychological trivia that happened to turn out non-significant. If they did then their final likelihood assessment - 1 in 200 - is invalid, and they should have instead pooled all of their tests, significant or not.
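To put a rough number on the data-mining concern, here is a minimal sketch of how the chance of at least one spurious "significant" result grows with the number of tests tried. It assumes the tests are independent and that the null hypothesis holds for all of them. The test counts in the loop are hypothetical, since the article does not say how many patterns were examined, and the sub names are my own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Chance that at least one of $k independent tests comes out "significant"
# at level $alpha when the null hypothesis is true for all of them.
sub familywise_rate {
    my ($alpha, $k) = @_;
    return 1 - (1 - $alpha) ** $k;
}

# Bonferroni-corrected per-test threshold that keeps the overall
# false-positive rate at or below $alpha across $k tests.
sub bonferroni_threshold {
    my ($alpha, $k) = @_;
    return $alpha / $k;
}

my $ALPHA = 0.05;
for my $k ( 1, 2, 5, 10, 20 ) {    # hypothetical numbers of tests tried
    printf "%2d tests: P(at least one false positive) = %.3f, Bonferroni threshold = %.4f\n",
        $k, familywise_rate($ALPHA, $k), bonferroni_threshold($ALPHA, $k);
}
```

Bonferroni is only one (conservative) way to account for every test that was run; the point is that the reported odds only mean something once the number of tests tried is known.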

Election fraud is a serious charge and one that should be made with stronger evidence than a few minor statistical anomalies based on flimsy ad-hoc reasoning. Analyses based on exit polling data, for example, are much more sound - if systematic anomalies are observed you either have to reject the polling methodology (sample bias, e.g.) or question the election results.

Replies are listed 'Best First'.
Re: "Disproving" a statistical argument on the Iranian elections
by wfsp (Abbot) on Jun 23, 2009 at 03:31 UTC
    Reminds me of this: "1 the most popular number!"
    ...the first digit of all numbers is a 1 about 30% of the time, whereas it is 9 just 4% of time.
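The leading-digit pattern quoted here matches what is usually called Benford's law, under which the probability of a first digit d is log10(1 + 1/d). A minimal sketch of that formula, assuming Benford's law is indeed what the linked node describes (the sub name is my own):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Benford's law: P(first digit = d) = log10(1 + 1/d) for d = 1..9.
# Perl's log() is the natural logarithm, so divide by log(10).
sub benford_probability {
    my ($digit) = @_;
    return log(1 + 1 / $digit) / log(10);
}

# Print the expected share of each leading digit as a percentage.
printf "%d: %.1f%%\n", $_, 100 * benford_probability($_) for 1 .. 9;
```

Note that this concerns first digits, whereas the election argument is about last digits, where a roughly uniform distribution is the usual expectation.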
    "Random" is a tricky concept. :-)
Re: Assessing a statistical argument on the fraudulence of the Iranian elections
by tilly (Archbishop) on Jun 23, 2009 at 18:52 UTC
    Their "arbitrary tests" are based on well-known psychological trends. For example, if asked to pick a random number from 1 to 20, people choose 17 and 7 far more often than any other number. It is also well known that people tend to avoid numbers with recognizable visual patterns. Hence the avoidance of 00, 11, 22, 33, 44, 55, 66, 77, 88, and 99 also has a known psychological basis.

    That said, you're right that election fraud is a serious charge. We'd like to have as much evidence as possible before leveling it. Unfortunately Iran chose to keep people from having evidence that would allow a good judgment either way. For example, there were no external election monitors. Going into the election, the consensus of what polling was available said that the election was close. Then they announced a landslide that was sufficiently absurd that their own population has been engaged in widespread protests.

    We are then left with a situation with imperfect data. While we'd like to set a high standard for declaring the fairness or unfairness of the election, we simply lack sufficient data to do so. However if we lower the bar to look at what the data suggests, all lines of data that we have point to an unfair election. Those lines include their unwillingness to allow external election monitors in, large discrepancies between (admittedly limited) pre-election polling and results, large popular protests, and the statistical arguments that the Washington Post makes. What is interesting about the Washington Post article is that it provides reasonably strong evidence that the method of cheating was assigning numbers rather than something more subtle such as, for instance, stuffing the ballot boxes.

    Let me repeat that and make it clear. Even without the Post's argument, the evidence pretty strongly suggested that the election was rigged. Along that line, the Post only offers another line of evidence confirming what we already had reason to believe. But what it offers that no other line of evidence does is evidence about what method was used to rig the election.

      I will admit the psychological findings are interesting and no doubt have merit. I would have preferred to see a discussion of all such findings and tests of all of them, not just the ones that turned out to be important. My complaint is that the ultimate, dwindling "odds" of a non-fraudulent election are bogus if this isn't done. Cherry-picking tests based on their statistical significance in the data is simply dishonest analysis, regardless of how valid the psychological patterns are.

      I should also clarify that I too think it's likely the vote totals are fraudulent based on the evidence you mention, especially disparities in polling data. However, ammunition is given to the other side of the argument if these allegations aren't made with more care.

Re: Assessing a statistical argument on the fraudulence of the Iranian elections
by John M. Dlugosz (Monsignor) on Jun 23, 2009 at 20:38 UTC
    Your finding of 3.7% is consistent with the article's statement of "fewer than four in a hundred".
      I understand - originally I had misread the statement and didn't test the "both" condition. I wanted to replicate it nevertheless.
