PerlMonks
Renovating Best Nodes

by demerphq (Chancellor)
on Feb 13, 2004 at 00:33 UTC

You may have noticed changes to the best-nodes pages: Best Nodes and Worst Nodes and their related nodelets all got a facelift. Recent discussions on the subject have led to some renovation of how they are generated, hopefully reducing server load, along with the disappearance of the "Best Nodes of All Time" page and the "All-Time Best" nodelet, which have more or less been replaced by new categories. Fear not for Camel Code and friends of the old top ten; they may still pop up occasionally in the all-new Selected Best Nodes. That node shows a randomly selected 50 of the top 2000 nodes (by reputation, anyway) in the Monastery. It's refreshed every 6 hours and has a pretty good spread of material in it.

As I mentioned, part of the objective of the change was to reduce server load by caching the best-node results. Each category is cached independently, and for differing periods. The time of last update is shown under each, but for convoluted reasons the nodelets show it in GMT (the page versions use the user's local time). The plan is to also show the time remaining until the next refresh, and possibly other information, but as of this writing that hasn't been applied. There are also plans for finer control over the number of rows in the nodelets, and possibly a few other ideas.
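The per-category caching described above is simple to picture: each category keeps its own expiry time, and the expensive database query runs only when that category's period has elapsed. A minimal sketch (in Python, purely illustrative; the actual PerlMonks code is Perl and is not shown here, and the category name and TTL below are made up for the example):

```python
import time

class CategoryCache:
    """Cache each best-nodes category independently, each with its own TTL."""

    def __init__(self):
        self._store = {}  # category name -> (expires_at, rows)

    def get(self, name, ttl_seconds, compute):
        """Return cached rows for a category, recomputing only after expiry."""
        now = time.time()
        entry = self._store.get(name)
        if entry is None or now >= entry[0]:
            rows = compute()  # the expensive DB query runs only here
            entry = (now + ttl_seconds, rows)
            self._store[name] = entry
        return entry[1]

cache = CategoryCache()
# e.g. Selected Best Nodes, refreshed every 6 hours per the write-up:
best = cache.get("selected_best", 6 * 3600, lambda: ["node list here"])
# A second call within the TTL returns the cached copy without recomputing:
again = cache.get("selected_best", 6 * 3600, lambda: ["never computed"])
```

Every page view reads from the cache; only the first request in each period pays for the database query.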

Anyway, I hope everybody enjoys. :-)

I'd just like to mention theorbtwo for his efforts in supporting me, but more importantly for advancing pmdev as a whole. Thanks to tye for applying the patches and the other stuff he does around here, and to castaway for all the groovy things she's been doing. (All the rest of the PM brethren too. :-) I've learned a lot here, and it's nice to be able to give something back.

Cheers.


---
demerphq

    First they ignore you, then they laugh at you, then they fight you, then you win.
    -- Gandhi


Re: Renovating Best Nodes
by bassplayer (Monsignor) on Feb 13, 2004 at 02:40 UTC
    What a wonderful new pile of reading material. And the gift keeps on giving. No problem finding places to spend votes today. A decrease in server load to boot. demerphq++

    bassplayer

Re: Renovating Best Nodes
by Anonymous Monk on Feb 13, 2004 at 07:10 UTC

    ++ demerphq.

    Excellent implementation of your proposal.

    Just one minor addition would be needed, i.e. a reciprocal link between Selected Best Nodes and Best Nodes, and perhaps a help-like link to this page, so people won't go around wondering what happened to Camel Code & Co.

Re: Renovating Best Nodes
by halley (Prior) on Feb 13, 2004 at 15:27 UTC
    There doesn't appear to be a 'Selected Best' nodelet. I'd suggest 10~12 instead of 50, but I miss the 'all time best' nodelet already. For the links I used often, I'll just add those to my 'Personal Nodelet' instead.

    --
    [ e d @ h a l l e y . c c ]

      Yep. I was planning to wait on that one until the number of nodes could be user controlled.

      Sorry about All-Time Best, but hopefully it will be for the best in the end. :-)


      ---
      demerphq



        I see two issues: (1) user arguments to control a widget are not well-established here, so you're "waiting" on a lot to be implemented; and (2) wouldn't user-controlled best-node counts be harder on the server? At minimum, you'd have to calculate the maximum number of "selected" nodes for a given refresh time, then save multiple versions of the nodelet content.


        I'm pretty sure you'll not get user preferences in the nodelets for quite a while (if ever). Nodelets are rendered on every page load and so are best when they can be cached. And cached nodelets can't be user-specific.

        Your caching for best/worst nodes is quite nice but it still reads quite a few records out of the database. Doing that 4 times for every page load would probably end up with more DB load than we had before your work.

        - tye        

Re: Renovating Best Nodes
by tilly (Archbishop) on Feb 15, 2004 at 14:31 UTC
    I like it, but I think that it would be nice to weight Selected Best Nodes more heavily towards the top few nodes.

    As things stand, most times this is run, we don't have a single node in the top 20. More than a quarter of the time we won't get anything in the top 50.
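    Those odds for the current scheme (50 nodes drawn uniformly from the top 2000) can be checked exactly with a hypergeometric calculation. A back-of-the-envelope sketch in Python, not site code:

    ```python
    from math import comb

    def p_none_from_top(k, total=2000, picked=50):
        """Probability that a uniform draw of `picked` nodes from `total`
        misses all of the top `k` nodes."""
        return comb(total - k, picked) / comb(total, picked)

    p_top20 = p_none_from_top(20)
    p_top50 = p_none_from_top(50)
    print(f"no top-20 node: {p_top20:.3f}")  # roughly 0.60
    print(f"no top-50 node: {p_top50:.3f}")  # roughly 0.28
    ```

    So about 60% of runs have no top-20 node at all, and over a quarter have nothing from the top 50.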

    Of course getting a broad selection is good as well.

    The following snippet shows how you can balance the two fairly flexibly:

    # I'm assuming that sth returns a long list of nodes ordered
    # from lowest rep to highest and then newest to oldest.
    my @selected;
    for (1..50) {
        push @selected, $sth->fetchrow_hashref();
    }
    while (my $row = $sth->fetchrow_hashref()) {
        if (rand(1) < 0.1) {
            $selected[rand(@selected)] = $row;
        }
    }
    The resulting distribution has the following properties (back of the envelope calculation):
    1. Most of the time we have some node in the top 10.
    2. Our odds of not getting something in the top 50 are about 0.5%.
    3. A node's chance of getting in is better than under the current scheme if it is in the top 692, or outside the top 2000.
    4. About 60% of the time we get a node out of the top 2000 included.
    I think that something like this would do a better job of showcasing the top nodes, while giving even more nodes a chance to be seen.

    UPDATE: Here is a changed code sample that does the same as the above, only it reads from the highest reputation node to the lowest because I've been told that this is better. (A fact that complicates it, but oh well.)

    # I'm assuming that sth returns a long list of nodes ordered
    # from highest rep to lowest and then oldest to newest.
    my @selected;
    my @filler;
    my $limit = 50;
    while (my $row = $sth->fetchrow_hashref()) {
        if (rand(1) < 0.1) {
            $selected[rand($limit)] ||= $row;
        }
        elsif (@filler < $limit) {
            push @filler, $row;
        }
    }
    for (0 .. $#filler) {
        $selected[$_] ||= $filler[$_];
    }
    This does the same thing as the snippet above except that I am filling in "nothing got chosen by chance" with top nodes rather than bottom nodes. If you fetch 4000 nodes, then you will only fill in from the filler 1.68% of the time. Alternately you can change the 0.1 to 0.2, and leave the number of nodes that you fetch at 2000. Or you can fetch 2000, leave the parameter at 0.1, and say that people don't mind seeing an extra one of the top 55 nodes or so get spotlighted 60% of the time.
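    The "1.68% of the time" figure above can be checked directly: each fetched row fills any particular slot with probability 0.1/50, so over 4000 rows the chance that at least one of the 50 slots is still empty (and needs a filler node) comes out around 1.7%. A quick check in Python, not site code:

    ```python
    # Chance that one particular slot is never chosen across 4000 fetched rows:
    p_slot_empty = (1 - 0.1 / 50) ** 4000
    # Chance that at least one of the 50 slots needs a filler node:
    p_any_empty = 1 - (1 - p_slot_empty) ** 50
    print(f"{p_any_empty:.4f}")  # roughly 0.017, i.e. about 1.7%
    ```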

    Many other ways to tweak this exist.

Re: Renovating Best Nodes
by ysth (Canon) on Feb 15, 2004 at 17:15 UTC
    Very nice work.

    I was curious to know what the reputation cut-off for the top 2000 turned out to be; if others are also curious, perhaps this could be added at the top.

    I suspect I will only read through parts of selected best when I have a burst of free time. It would be nice if additional pages of 50 were available in case my time exceeds the available material. Would it be hard to put all 2000 into 40 pregenerated pages linked in sequence? That might lead to people scrolling through until they found one they are looking for, though; not a good thing for server performance.

    A few nits: the as of line should have a semicolon instead of a comma and a lowercase n for "next refresh" (or a period and then a capital Next). On selected best nodes, "The days catch" should say "day's".

Re: Renovating Best Nodes
by ysth (Canon) on Feb 16, 2004 at 04:12 UTC

      Heh, yeah, right now actually. :-)


      ---
      demerphq




Node Type: monkdiscuss [id://328699]
Approved by blokhead
Front-paged by blokhead