
### Re: (contest) Help analyze PM reputation statistics

on Sep 14, 2004 at 20:01 UTC (#390969)

If we consider the node reputation as a random variable, we can perform some interesting analyses. The entropy of the reputation random variable is 5.27 bits, meaning that the theoretical lower bound on the average space needed to store a node's reputation is 5.27 bits per post. So when someone says "node reputation isn't worth 2 bits," they're wrong -- it is actually worth at least 5.27 bits. ;)
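For reference, the quantity being computed is the Shannon entropy of the reputation distribution. The symbols here are introduced for illustration and do not appear in the original post: $n_r$ is the number of nodes with reputation $r$, and $N$ is the total number of nodes.

$$
H = -\sum_{r} \frac{n_r}{N} \log_2 \frac{n_r}{N}, \qquad N = \sum_{r} n_r
$$

Each term weights the "surprise" of seeing reputation $r$ by how often it occurs, so $H$ is measured in bits per node.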
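```
use List::Util 'sum';

# %rep_stats is assumed to map each reputation value to the number of
# nodes that have it (it is not defined in this post).
my $sum     = sum values %rep_stats;
# Shannon entropy in bits: -sum p * log2(p), with p = count / total.
# Zero counts are skipped because log(0) is undefined.
my $entropy = sum map { -($_/$sum) * log($_/$sum) / log(2) }
              grep { $_ > 0 } values %rep_stats;

printf "Total entropy: %.05f\n", $entropy;
```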
An interesting statistic would be whether the entropy of the reputation random variable is going up or down over time. Then we could say whether node reputation was becoming more or less meaningful.
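One way to look at the trend would be to bucket nodes by period and compute the entropy of each bucket separately. This is a minimal sketch only: the `%rep_by_month` hash, its month labels, and its counts are hypothetical and made up for illustration, and `entropy_bits` is a helper name introduced here.

```perl
use strict;
use warnings;
use List::Util 'sum';

# Shannon entropy (in bits) of a histogram passed as a hashref of
# value => count. Uses p * log2(1/p), which equals -p * log2(p).
sub entropy_bits {
    my ($hist) = @_;
    my @counts = grep { $_ > 0 } values %$hist;
    my $total  = sum @counts;
    return 0 unless $total;    # empty histogram carries no information
    return sum map { ($_/$total) * log($total/$_) / log(2) } @counts;
}

# Hypothetical data: month => { reputation => number of nodes }.
my %rep_by_month = (
    '2004-07' => { 5 => 10, 10 => 10 },
    '2004-08' => { 5 => 5, 10 => 5, 15 => 5, 20 => 5 },
);

for my $month (sort keys %rep_by_month) {
    printf "%s: %.05f bits\n", $month, entropy_bits($rep_by_month{$month});
}
```

In this made-up data the second month spreads nodes evenly over four reputation values instead of two, so its entropy is 2 bits rather than 1. A rising series like that would suggest reputation is becoming more informative.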

### Re^2: (contest) Help analyze PM reputation statistics

by sintadil (Pilgrim) on Sep 15, 2004 at 00:15 UTC

Being that $NORM is also a measurement of the trend of node reputation, how is $entropy useful as an ancillary view of the same trend?

$NORM measures the average reputation of recent nodes. It answers the question, "Are nodes rated high or low?"

Entropy measures the information content of node reputation. It answers the questions, "How much does the node's reputation tell us? How meaningful is the assignment of reputation?"

Say $NORM is 11. Well, this can happen if all recent nodes have reputation 11. If this is the case, the entropy is 0 because knowing that a node has reputation 11 tells us nothing about the node.

On the other hand, maybe among all recent nodes, an equal number of them have reputation 1, 2, 3, .. up to 22. This situation also gives us $NORM = 11. But here, knowing the reputation of a node gives us much more information. Reputation in itself is more meaningful in this scenario because it can tell us something. The something it is telling us is information in the theoretical sense.

$NORM tells us whether nodes are given high or low reputations on average (although the variance might be useful to know as well). It is an analysis of the values of a random variable. Entropy is completely orthogonal; independent of how high or low the nodes are rated, it tells us how informative node reputation really is. It is an analysis of the uncertainty of a random variable. You can have any combination of low or high average with low or high entropy.
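The two scenarios can be checked numerically. This is a rough sketch, not code from the thread: the helper names `mean_rep` and `entropy_bits` and both histograms are assumptions for illustration, and the uniform case runs from 1 to 21 so that its mean comes out to exactly 11.

```perl
use strict;
use warnings;
use List::Util 'sum';

# Both subs take a histogram hashref: reputation => number of nodes.
sub mean_rep {
    my ($hist) = @_;
    my $total = sum values %$hist;
    return sum(map { $_ * $hist->{$_} } keys %$hist) / $total;
}

# Shannon entropy in bits, written as p * log2(1/p) = -p * log2(p).
sub entropy_bits {
    my ($hist) = @_;
    my @counts = grep { $_ > 0 } values %$hist;
    my $total  = sum @counts;
    return sum map { ($_/$total) * log($total/$_) / log(2) } @counts;
}

my %all_eleven = (11 => 21);                 # every node has reputation 11
my %uniform    = map { $_ => 1 } 1 .. 21;    # one node at each of 1..21

for my $case ([ 'all 11' => \%all_eleven ], [ 'uniform 1..21' => \%uniform ]) {
    my ($label, $hist) = @$case;
    printf "%-14s mean = %.2f, entropy = %.3f bits\n",
        $label, mean_rep($hist), entropy_bits($hist);
}
```

Both histograms have a mean of 11, but the first has zero entropy while the second has log2(21), about 4.39 bits. Same average, very different information content.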

Okay, that makes sense now (after several rereadings with my tired brain). Thanks! :)

On the other hand, maybe among all recent nodes, an equal number of them have reputation 1, 2, 3, .. up to 22. This situation also gives us $NORM = 11.
I think you mean "up to 21"; as stated, you would get $NORM = 11.5.
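The arithmetic behind that correction, assuming an equal number of nodes at each reputation from 1 to $n$ (the symbol $n$ is introduced here for illustration):

$$
\frac{1}{n}\sum_{k=1}^{n} k = \frac{n+1}{2}, \qquad n = 22 \Rightarrow 11.5, \qquad n = 21 \Rightarrow 11
$$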
