<?xml version="1.0" encoding="windows-1252"?>
<node id="390969" title="Re: (contest) Help analyze PM reputation statistics" created="2004-09-14 16:01:45" updated="2005-08-02 16:45:55">
<type id="11">
note</type>
<author id="137386">
blokhead</author>
<data>
<field name="doctext">
If we treat a node's reputation as a random variable, we can perform some interesting analyses. The [http://en.wikipedia.org/wiki/Entropy_%28information_theory%29|entropy] of the reputation random variable is 5.27 bits. By Shannon's source coding theorem, that means no lossless encoding can store the node reputations in fewer than 5.27 bits per post on average. So when someone says "node reputation isn't worth 2 bits," they're wrong: it is actually worth at least 5.27 bits. ;)

&lt;code&gt;
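use List::Util 'sum';

# %rep_stats maps each reputation value to the number of nodes having it
my $sum     = sum values %rep_stats;    # total number of nodes

# H = -sum(p * log2 p), where p is the fraction of nodes at each reputation
my $entropy = sum map { -($_/$sum) * log($_/$sum) / log(2) }
                  values %rep_stats;

printf "Entropy: %.5f bits per post\n", $entropy;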
&lt;/code&gt;
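
To put the 5.27-bit figure in context, here is a minimal sketch that compares the entropy with the cost of a naive fixed-length code, which spends ceil(log2 k) bits on every post when there are k distinct reputation values. The counts in %toy and the helper entropy_bits are made up for illustration only; running it on the real %rep_stats would give the comparison for the actual data.

&lt;code&gt;
use strict;
use warnings;
use List::Util qw(sum);

# Shannon entropy, in bits, of a list of counts (hypothetical helper)
sub entropy_bits {
    my $total = sum @_;
    return sum map { my $p = $_ / $total; -$p * log($p) / log(2) } @_;
}

# Hypothetical toy data: reputation value, number of nodes with that reputation
my %toy = (0, 4, 1, 2, 2, 1, 3, 1);

my $h = entropy_bits(values %toy);

# A fixed-length code needs ceil(log2 k) bits for k distinct values; the bit
# length of k - 1 gives the same number without floating point (k of 2 or more)
my $k     = keys %toy;
my $fixed = length sprintf '%b', $k - 1;

printf "Entropy: %.3f bits/post; fixed-length code: %d bits/post\n", $h, $fixed;
&lt;/code&gt;

For these toy counts the sketch reports 1.750 bits against 2 bits for the fixed-length code, because the common reputation values are much more frequent than the rare ones. A variable-length code such as a Huffman code can get within one bit per post of the entropy, but never below it.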

It would be interesting to know whether the entropy of the reputation random variable is rising or falling over time. That would tell us whether node reputations are becoming more or less meaningful, in the sense of carrying more or less information per post.
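
One way to measure that trend is to bucket the reputation counts by time period and compute the entropy of each bucket separately. This is a minimal sketch assuming the counts could be grouped by the year each node was posted; the per-year grouping is an assumption, and %rep_by_year, its numbers, and entropy_bits are all hypothetical.

&lt;code&gt;
use strict;
use warnings;
use List::Util qw(sum);

# Shannon entropy, in bits, of a list of counts (hypothetical helper)
sub entropy_bits {
    my $total = sum @_;
    return sum map { my $p = $_ / $total; -$p * log($p) / log(2) } @_;
}

# Hypothetical data: for each year, reputation value and number of nodes
my %rep_by_year = (
    2003, { 0, 5, 1, 3, 2, 2 },
    2004, { 0, 2, 1, 2, 2, 2, 3, 2 },
);

# Entropy of each year's reputation distribution, in bits per post
my %entropy_for;
for my $year (sort keys %rep_by_year) {
    $entropy_for{$year} = entropy_bits( values %{ $rep_by_year{$year} } );
    printf "%s: %.3f bits/post\n", $year, $entropy_for{$year};
}
&lt;/code&gt;

In this made-up example the 2004 counts are spread evenly over more reputation values, so their entropy is higher (2.000 bits versus 1.485 bits for 2003). On real data, a consistently rising sequence would suggest reputations are carrying more information per post.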

&lt;div class="pmsig"&gt;&lt;div class="pmsig-137386"&gt;
&lt;p&gt;
blokhead
&lt;/div&gt;&lt;/div&gt;</field>
<field name="root_node">
390930</field>
<field name="parent_node">
390930</field>
</data>
</node>
