PerlMonks  

Re^3: More PM stats analysis on new levels

by demerphq (Chancellor)
on Dec 05, 2005 at 13:05 UTC (#514114)


in reply to Re^2: More PM stats analysis on new levels
in thread More PM stats analysis on new levels

What do you mean "zombie" initiates? Inactive?

Sorry, I should have been more clear. Zombies are users that never posted, never voted, and never really used their accounts.

I think we could look into providing you a batch of more specific data. I'd have to think a bit on how to present the info so that it doesn't tell you each node's rep exactly, but does allow you to do your stats. If you can suggest forms of the info that would be sufficiently useful to you but sufficiently anonymous that I can give them to you, I'd be happy to do so.

---
$world=~s/war/peace/g


Re^4: More PM stats analysis on new levels
by xdg (Monsignor) on Dec 05, 2005 at 14:05 UTC

    The data set I'd love to get is the number of nodes and sum of node reputations for initial posts and replies in each category of Perlmonks. If I had that by user, plus user XP and maybe even date user joined, that would be a fantastic data set.

    The reason that "by user" helps is that it easily allows clearing out outliers like the nodereaper and zombies. For anonymity, the data set doesn't even need to have user name/home-node id -- though that doesn't really protect the anonymity of the Saints in our book. If by user (even masked) isn't sufficiently anonymous, then those same stats summarized by monk level would be sufficient, as long as vroom/antivroom/nodereaper/zombie accounts were stripped out first.
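A minimal sketch of the "summarized by monk level" fallback described above, written in Python. The row layout, user names, and numbers are all made up for illustration; nothing here reflects the actual PerlMonks schema. Excluded accounts are dropped before any totals are computed, and zombie accounts could be added to the same exclusion set (or would simply have no rows).

```python
from collections import defaultdict

# Hypothetical per-user rows: (user, level, section, kind, node_count, rep_sum).
# Every name and number below is invented for illustration.
ROWS = [
    ("alice", "Monk", "SOPW", "reply", 10, 120),
    ("bob", "Monk", "SOPW", "reply", 5, 30),
    ("bob", "Monk", "Meditations", "root", 2, 50),
    ("NodeReaper", "Monk", "SOPW", "reply", 900, -400),
    ("carol", "Friar", "SOPW", "reply", 20, 400),
]

# Accounts to strip before summarizing (an assumed list, compared case-insensitively).
EXCLUDED = {"vroom", "antivroom", "nodereaper"}


def summarize_by_level(rows, excluded=EXCLUDED):
    """Return {(level, section, kind): (node_count, rep_sum)} without excluded users."""
    totals = defaultdict(lambda: [0, 0])
    for user, level, section, kind, nodes, rep in rows:
        if user.lower() in excluded:
            continue  # outlier account: contributes nothing to the summary
        bucket = totals[(level, section, kind)]
        bucket[0] += nodes
        bucket[1] += rep
    return {key: tuple(value) for key, value in totals.items()}


summary = summarize_by_level(ROWS)
for key in sorted(summary):
    nodes, rep = summary[key]
    print(key, nodes, rep, round(rep / nodes, 2))
```

Because only counts and sums per level are released, no individual node's reputation appears in the output, though a level with very few members could still leak information.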

    Does that address the anonymity concern?

    -xdg

    Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

      The only bit I don't get is what you mean by category. Do you mean nodetype?

      ---
      $world=~s/war/peace/g

        Yes. I was curious whether more reputation accumulates in SOPW replies than in Meditations original posts, and things of that sort, for the various levels.

        -xdg


Re^4: More PM stats analysis on new levels
by QM (Vicar) on Dec 05, 2005 at 14:45 UTC
    I'd have to think a bit on how to present the info so that it doesn't tell you each node's rep exactly, but does allow you to do your stats.
    How about adding random noise to the XP of each post? Use some rather large uniform distribution (say +/-100?), but don't report the size of the distribution. As long as the mean remains relatively unchanged, the stats should too. Or choose a different distribution. The weakness is that someone could roughly guess the size of the distribution from the largest negative value, and from that guess some of the lowest-scoring nodes.
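A rough sketch of the uniform-noise idea, in Python. The reputations are randomly generated stand-ins, the +/-100 width is just the figure floated above, and the function name is hypothetical. It shows that the mean barely moves over a large set even though any single value can be off by up to 100.

```python
import random


def add_uniform_noise(reps, width, rng):
    """Return a copy of reps with an integer from [-width, +width] added to each."""
    return [rep + rng.randint(-width, width) for rep in reps]


rng = random.Random(2005)  # fixed seed so the sketch is repeatable
true_reps = [rng.randint(-5, 60) for _ in range(10_000)]  # made-up reputations
noisy_reps = add_uniform_noise(true_reps, 100, rng)

true_mean = sum(true_reps) / len(true_reps)
noisy_mean = sum(noisy_reps) / len(noisy_reps)
print(f"true mean  {true_mean:.2f}")
print(f"noisy mean {noisy_mean:.2f}")
print(f"min after noise {min(noisy_reps)}")
```

The last line illustrates the leak mentioned above: the most negative published value sits close to the true minimum minus the noise width, which gives a reader a rough handle on the width.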

    Another idea is to take nodes in pairs at random, and shuffle their XP up a little. If two nodes have 17 and 48 XP, change them randomly by +/-5, so that the sum is still the same.

    Do this randomly across many pairs (not necessarily all), such that most nodes have changed only slightly. Then each slice of the XP distribution should be stable, and guessing XP is much harder for low scoring nodes.
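The pairing scheme could look something like the following Python sketch. The function name, the `fraction` parameter, and the sample values are assumptions made for illustration. Each node lands in at most one pair, a random shift of at most 5 is added to one member and subtracted from the other, and some pairs are skipped entirely.

```python
import random


def shuffle_pairs(reps, max_shift, fraction, rng):
    """Move up to max_shift points between randomly paired entries.

    Each pair's sum is preserved, so the overall total is unchanged.
    """
    out = list(reps)
    indices = list(range(len(out)))
    rng.shuffle(indices)
    # Walk the shuffled indices two at a time, so no node is in more than one pair.
    for a, b in zip(indices[0::2], indices[1::2]):
        if rng.random() >= fraction:
            continue  # leave this pair untouched
        shift = rng.randint(-max_shift, max_shift)
        out[a] += shift
        out[b] -= shift
    return out


rng = random.Random(514114)  # fixed seed so the sketch is repeatable
reps = [17, 48, 3, -2, 25, 9, 0, 61]  # made-up values
masked = shuffle_pairs(reps, max_shift=5, fraction=0.75, rng=rng)
print(reps)
print(masked)
print(sum(reps), sum(masked))
```

Since the total is preserved exactly, the overall mean is unchanged; means within a slice are only approximately preserved, because the two members of a pair may fall into different slices.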

    If xdg is going to use post order, or distinguish between different "grades" of XP, then the distribution must be chosen more carefully. After all, a Max or Min XP stat would be meaningless, and a plot of XP by post order, or XP by calendar date might be bogus.

    Update: You can only give this out a few times. After the 5th or 10th set, a node's average XP tends to settle down. Unless you can come up with wildly differing distributions every time.
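To illustrate the point in the update, here is a small Python simulation of one node whose value is republished many times with fresh noise. The true value of 23, the +/-100 width, and the function names are all assumptions for the sketch.

```python
import random


def noisy_release(true_rep, width, rng):
    """One published value: the true value plus fresh uniform integer noise."""
    return true_rep + rng.randint(-width, width)


def running_averages(true_rep, width, releases, rng):
    """Average of the first 1, 2, ..., `releases` published values for one node."""
    total = 0
    averages = []
    for n in range(1, releases + 1):
        total += noisy_release(true_rep, width, rng)
        averages.append(total / n)
    return averages


rng = random.Random(42)  # fixed seed so the sketch is repeatable
averages = running_averages(true_rep=23, width=100, releases=1000, rng=rng)
for n in (1, 5, 10, 100, 1000):
    print(n, round(averages[n - 1], 1))
```

With independent noise the error of the average shrinks roughly as one over the square root of the number of releases, so how many sets are "too many" depends on the noise width relative to the precision an attacker needs.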

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of

      How about adding random noise to the XP of each post

      Careful with terminology here. Users have XP. Posts have reputation.

      I really wouldn't need per-post reputation for what I was thinking of doing if I can get the aggregate statistics I mentioned.

      -xdg

