Re^2: More PM stats analysis on new levels

by xdg (Monsignor)
on Dec 05, 2005 at 12:52 UTC (#514112=note)


in reply to Re: More PM stats analysis on new levels
in thread More PM stats analysis on new levels

What do you mean "zombie" initiates? Inactive?

Another question -- is jcwren getting a direct dump/feed from the database or pulling via the XML feeds? I was considering using the XML feeds to pull down a summary of all the nodes so I could examine reputation, not just XP. For example, reputation from initial posts vs from replies, or in different categories, or the ratio of total reputation to total XP.

Is that possible -- is reputation available for nodes other than my own? My quick scan of the XML generators didn't reveal it.

-xdg

Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.


Re^3: More PM stats analysis on new levels
by demerphq (Chancellor) on Dec 05, 2005 at 13:05 UTC

    What do you mean "zombie" initiates? Inactive?

    Sorry, I should have been more clear. Zombies are users that never posted, never voted, and never really used their accounts.

    I think we could look into providing you a batch of more specific data. I'd have to think a bit on how to present the info so that it doesn't tell you each node's rep exactly, but does allow you to do your stats. If you can suggest forms of the info that would be sufficiently useful to you but sufficiently anonymous that I can give them to you, I'd be happy to do so.

    ---
    $world=~s/war/peace/g

      The data set I'd love to get is the number of nodes and the sum of node reputations for initial posts and for replies in each category of PerlMonks. If I had that by user, plus user XP and maybe even the date the user joined, that would be a fantastic data set.

      The reason that "by user" helps is that it easily allows clearing out outliers like the nodereaper and zombies. For anonymity, the data set doesn't even need to have user name/home-node id -- though that doesn't really protect the anonymity of the Saints in our book. If by user (even masked) isn't sufficiently anonymous, then those same stats summarized by monk level would be sufficient, as long as vroom/antivroom/nodereaper/zombie accounts were stripped out first.
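A minimal sketch of the per-user summary described above, assuming each post is available as a `(user, category, is_reply, reputation)` tuple. That record layout, the function name `summarize` and the `excluded_users` parameter are hypothetical and only for illustration; they are not the PerlMonks schema.

```python
from collections import defaultdict

def summarize(posts, excluded_users=()):
    """Count nodes and sum reputation per (user, category, kind).

    posts: iterable of (user, category, is_reply, reputation) tuples
           (a hypothetical layout, not the real database schema).
    excluded_users: accounts to strip out first, e.g. outliers.
    """
    totals = defaultdict(lambda: {"nodes": 0, "reputation": 0})
    for user, category, is_reply, reputation in posts:
        if user in excluded_users:
            continue  # drop outlier accounts before aggregating
        kind = "reply" if is_reply else "initial"
        entry = totals[(user, category, kind)]
        entry["nodes"] += 1
        entry["reputation"] += reputation
    return dict(totals)

# Made-up sample data; user names can be masked ids.
sample = [
    ("u1", "Seekers of Perl Wisdom", False, 12),
    ("u1", "Seekers of Perl Wisdom", True, 5),
    ("u1", "Seekers of Perl Wisdom", True, 9),
    ("u2", "Meditations", False, 30),
    ("reaper", "Meditations", True, -4),
]
summary = summarize(sample, excluded_users={"reaper"})
```

Summarizing by monk level instead of by user would only change the first element of the key.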

      Does that address the anonymity concern?

      -xdg


        The only bit I don't get is what you mean by category. Do you mean nodetype?

        ---
        $world=~s/war/peace/g

      I'd have to think a bit on how to present the info so that it doesn't tell you each node's rep exactly, but does allow you to do your stats.
      How about adding random noise to the XP of each post? Use some rather large uniform distribution (say +/-100?), but don't report the size of the distribution. As long as the mean remains relatively unchanged, the stats should too. Or choose a different distribution. The weakness is that the size of the distribution could be roughly guessed from the largest negative value, and then some of the lowest scoring nodes could be guessed.
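A minimal sketch of the uniform-noise idea, assuming the per-node values are plain integers in a list. The function name `add_uniform_noise` and its parameters are made up for illustration, and the +/-100 default is just the figure suggested above.

```python
import random

def add_uniform_noise(values, half_width=100, seed=None):
    """Return a copy of values with integer noise added to each one.

    The noise is drawn uniformly from [-half_width, +half_width], so it
    has mean zero and the overall mean should stay roughly unchanged
    when there are many values.
    """
    rng = random.Random(seed)
    return [v + rng.randint(-half_width, half_width) for v in values]

# Made-up values; half_width would be kept secret in practice.
original = [17, 48, 3, -2, 25]
masked = add_uniform_noise(original, half_width=100, seed=42)
```

With only a handful of values the mean can drift noticeably; the noise averages out only over a large number of nodes.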

      Another idea is to take nodes in pairs at random, and shuffle their XP up a little. If two nodes have 17 and 48 XP, change them randomly by +/-5, so that the sum is still the same.

      Do this randomly across many pairs (not necessarily all), such that most nodes have changed only slightly. Then each slice of the XP distribution should be stable, and guessing XP is much harder for low scoring nodes.
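The pairwise idea could be sketched as follows, assuming each node takes part in at most one pair so that no value moves by more than the chosen shift. The names `shuffle_pairs`, `max_shift` and `fraction` are hypothetical.

```python
import random

def shuffle_pairs(values, max_shift=5, fraction=0.8, seed=None):
    """Perturb values in random pairs while keeping the total unchanged.

    A random subset of disjoint pairs is chosen; within each pair one
    value goes up by some shift in [-max_shift, +max_shift] and the
    other goes down by the same amount, so each pair's sum is preserved.
    """
    rng = random.Random(seed)
    result = list(values)
    indices = list(range(len(result)))
    rng.shuffle(indices)                      # random pairing order
    n_pairs = int((len(indices) // 2) * fraction)
    for k in range(n_pairs):
        i, j = indices[2 * k], indices[2 * k + 1]
        shift = rng.randint(-max_shift, max_shift)
        result[i] += shift
        result[j] -= shift
    return result

# Made-up values, including the 17 and 48 from the example above.
original = [17, 48, 3, -2, 25, 60, 9]
masked = shuffle_pairs(original, max_shift=5, fraction=1.0, seed=7)
```

Because every pair keeps its sum, the grand total and the mean are exactly preserved, unlike with independent noise.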

      If xdg is going to use post order, or distinguish between different "grades" of XP, then the distribution must be chosen more carefully. After all, a Max or Min XP stat would be meaningless, and a plot of XP by post order, or XP by calendar date might be bogus.

      Update: You can only give this out a few times. After the 5th or 10th set, the average of a node's reported XP tends to settle down near the true value, unless you can come up with wildly differing distributions every time.
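A small sketch of why repeated releases leak information, assuming each release adds fresh, independent uniform noise to the same true value. The function names and the seed are made up for illustration.

```python
import random
import statistics

def noisy_release(true_value, half_width, rng):
    """One published value: the true value plus fresh uniform noise."""
    return true_value + rng.randint(-half_width, half_width)

def estimate_from_releases(true_value, n_releases, half_width=100, seed=0):
    """Average n independent noisy releases of the same true value."""
    rng = random.Random(seed)
    samples = [noisy_release(true_value, half_width, rng)
               for _ in range(n_releases)]
    return statistics.mean(samples)

# Error of the averaged estimate for a growing number of releases.
true_value = 17
errors = {n: abs(estimate_from_releases(true_value, n) - true_value)
          for n in (1, 10, 100, 1000)}
```

For independent noise the standard error of the average shrinks roughly as 1/sqrt(n), so every additional data set narrows the guess.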

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of

        How about adding random noise to the XP of each post

        Careful with terminology here. Users have XP. Posts have reputation.

        I really wouldn't need per-post reputation for what I was thinking of doing if I can get the aggregate statistics I mentioned.

        -xdg


Re^3: More PM stats analysis on new levels (large query result)
by demerphq (Chancellor) on Jan 14, 2006 at 17:42 UTC

    Well, I put together the following query for you. I don't think it's exactly what you had in mind, but it's more than nothing. It's a breakdown of posts by type by level of author. Of course it's by level of author _now_, not when originally posted. It does not include reaped nodes.

    And this is the breakdown of the notes by the type of the root node of the thread.

    ---
    $world=~s/war/peace/g

Re^3: More PM stats analysis on new levels (large query result)
by demerphq (Chancellor) on Jan 14, 2006 at 19:04 UTC

    I also put this one together for you. It's a breakdown of posts by type, level of poster and (bucketized) node reputation.


    select t.title typetitle, lb.level, CEIL(n.reputation/10)*10 noderep, count(n.node_id) nodecount
    from node n, node a, user u, node t, level_buckets lb
    where n.author_user = a.node_id
    and   n.type_nodetype = t.node_id
    and   a.node_id = u.user_id
    and   CEIL(u.experience/10)*10 = lb.experience
    and   n.author_user != 52855
    and   n.type_nodetype in (31670, 1042, 31663, 1036, 11, 935, 1588, 173295, 121, 120, 23614, 23615, 115,
                              956, 389544, 1584, 337433, 1440, 7487, 7488, 1980, 1981, 1748, 1749)
    group by t.title, lb.level, noderep
    order by t.title, lb.level, noderep
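To illustrate what the `CEIL(n.reputation/10)*10` bucketing in the query does, here is a rough Python equivalent. It assumes `/` in the SQL is true (non-integer) division, as in MySQL, and the `(typetitle, level, reputation)` row layout is a simplification of the query's columns for illustration only.

```python
import math
from collections import Counter

def bucket(value, width=10):
    """Round value up to the next multiple of width, like CEIL(v/width)*width."""
    return math.ceil(value / width) * width

def count_by_bucket(rows, width=10):
    """Count rows per (typetitle, level, reputation bucket).

    rows: iterable of (typetitle, level, reputation) tuples, a
          simplified stand-in for the joined query result.
    """
    counts = Counter()
    for typetitle, level, reputation in rows:
        counts[(typetitle, level, bucket(reputation, width))] += 1
    return counts

# Made-up rows for illustration.
rows = [
    ("note", 5, 17),
    ("note", 5, 11),
    ("note", 5, 20),
    ("note", 5, 21),
    ("perlquestion", 2, -5),
]
counts = count_by_bucket(rows)
```

With integer reputations, the bucket labelled 20 therefore covers 11 through 20, and small negative values such as -5 land in the 0 bucket.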
