http://www.perlmonks.org?node_id=390969


in reply to (contest) Help analyze PM reputation statistics

If we consider the node reputation as a random variable, we can perform some interesting analyses. The entropy of the reputation random variable is 5.27, meaning that the theoretical lower bound for storing the information contained in the node reputations is 5.27 bits per post. So when someone says "node reputation isn't worth 2 bits," they're wrong -- it is actually worth at least 5.27 bits. ;)
    use List::Util 'sum';

    # %rep_stats maps each reputation value to the number of nodes having it
    my $sum     = sum values %rep_stats;
    my $entropy = sum map { -($_/$sum) * log($_/$sum) / log(2) } values %rep_stats;
    printf "Total entropy: %.05f\n", $entropy;
An interesting statistic would be whether the entropy of the reputation random variable is going up or down over time. Then we could say whether node reputation was becoming more or less meaningful.
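A per-period version of the same calculation is a straightforward extension. This is only a sketch with made-up histograms (a real analysis would bucket nodes by creation date); the data here is chosen so the entropy visibly falls from one period to the next:

```perl
use strict;
use warnings;
use List::Util 'sum';

# Hypothetical data: for each period, a histogram of reputation => node count.
my %rep_by_period = (
    '2003' => { 5 => 10, 10 => 10, 15 => 10, 20 => 10 },  # 4 equally likely values
    '2004' => { 10 => 20, 11 => 20 },                      # only 2 equally likely values
);

# Shannon entropy (in bits) of a reputation histogram.
sub entropy {
    my ($hist) = @_;
    my $sum = sum values %$hist;
    return sum map { -($_/$sum) * log($_/$sum) / log(2) } values %$hist;
}

for my $period (sort keys %rep_by_period) {
    printf "%s: %.3f bits\n", $period, entropy($rep_by_period{$period});
}
# 2003: 2.000 bits
# 2004: 1.000 bits  -- entropy falling: reputation becoming less informative
```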

blokhead

Replies are listed 'Best First'.
Re^2: (contest) Help analyze PM reputation statistics
by sintadil (Pilgrim) on Sep 15, 2004 at 00:15 UTC

    Given that $NORM is also a measurement of the trend of node reputation, how is $entropy useful as an ancillary view of the same trend?

      $NORM measures the average reputation of recent nodes. It answers the question, "Are nodes rated high or low?"

      Entropy measures the information content of node reputation. It answers the questions, "How much does the node's reputation tell us? How meaningful is the assignment of reputation?"

      Say $NORM is 11. Well, this can happen if all recent nodes have reputation 11. If this is the case, the entropy is 0 because knowing that a node has reputation 11 tells us nothing about the node.

      On the other hand, maybe among all recent nodes, an equal number of them have reputation 1, 2, 3, .. up to 22. This situation also gives us $NORM = 11. But here, knowing the reputation of a node gives us much more information. Reputation in itself is more meaningful in this scenario because it can tell us something. The something it is telling us is information in the theoretical sense.

      $NORM tells us whether nodes are given high or low reputations on average (although the variance might be useful to know as well). It is an analysis of the values of a random variable. Entropy is completely orthogonal; independent of how high or low the nodes are ranked, it tells us how informative node reputation really is. It is an analysis of the uncertainty of a random variable. You can have any combination of low or high average with low or high entropy.
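      The two scenarios above can be checked numerically. This sketch uses reputations 1 through 21 for the spread-out case, so the mean is exactly 11; the counts themselves are arbitrary:

```perl
use strict;
use warnings;
use List::Util 'sum';

# Shannon entropy (in bits) of a reputation histogram: rep => node count.
sub entropy {
    my ($hist) = @_;
    my $sum = sum values %$hist;
    return sum map { -($_/$sum) * log($_/$sum) / log(2) } values %$hist;
}

# Scenario 1: every recent node has reputation 11 ($NORM = 11).
my %all_eleven = ( 11 => 100 );

# Scenario 2: equal numbers of nodes at each reputation 1..21 ($NORM = 11).
my %uniform = map { $_ => 10 } 1 .. 21;

printf "all 11s: %.3f bits\n", entropy(\%all_eleven);  # 0.000 -- rep tells us nothing
printf "uniform: %.3f bits\n", entropy(\%uniform);     # 4.392 -- i.e. log2(21)
```

      Same average, completely different information content.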

      blokhead

        Okay, that makes sense now (after several rereadings with my tired brain). Thanks! :)

        On the other hand, maybe among all recent nodes, an equal number of them have reputation 1, 2, 3, .. up to 22. This situation also gives us $NORM = 11.
        I think you mean "up to 21"; as stated, you would get $NORM = 11.5.