PerlMonks  

Collaborative filtering

by artist (Parson)
on Aug 18, 2004 at 20:33 UTC ( #384099=monkdiscuss )

Based on the nodes I like, is it possible to have collaborative filtering of PerlMonks nodes? I could tell the system which nodes I liked, and it could suggest other nodes that might help me increase my knowledge, based on what like-minded monks liked. Making it dynamic would be even better, so that I can drop certain low-level nodes as I progress. I think that such a system could be beneficial to all.

Re: Collaborative filtering
by phydeauxarff (Priest) on Aug 18, 2004 at 21:27 UTC
    Sounds like you want something like an AI::Categorizer plugin for Perlmonks.

    While I suppose that it would be technically possible to implement, I suspect that the overhead on the database would prevent this from ever being deployed in a workable manner.

    Very intriguing idea though.

      It would be based on whether a node was liked, so it wouldn't be necessary to use AI::Categorizer or any similar tool. It could also be built externally to avoid load on the database.
        I had assumed that you were proposing being able to "mark" a node as something you like, sort of like I can mark shows I like in TiVo and it decides to record suggestions based on my preferences and habits.

        Marking a node that you like would be pretty easy. You could even just use the current voting system, as voting up a node assumes you like it (although I hear that some folks use votes to gain XP or affect others' XP <grin>).

        The challenge, as I understand it, is then parsing new nodes to offer users suggestions on nodes they might be interested in. This is where I would imagine the most significant amount of overhead to the system.

Re: Collaborative filtering
by kvale (Monsignor) on Aug 19, 2004 at 00:15 UTC
    I don't know of any natural language understanding system that could read the nodes you look at, extract the semantic content, and then direct you to other semantically similar web pages. That is outside the boundary of current technology.

    But simpler systems are possible and in fact already exist at Perlmonks.

    One method of extracting relevant nodes is to extract keyword distributions of well-liked nodes and compare them with keyword distributions of other nodes for similarity. The SMART information retrieval system at Cornell uses an inner product metric for the similarity measure, for instance. Perlmonks has a simple version of this: Super Search. Just pick keywords from nodes you like and Super Search for nodes with the desired keywords.
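
    A rough sketch of the keyword-comparison idea, in Python for brevity. The tokenizer, the sample texts, and the node names are made up for illustration, and the length normalisation (cosine similarity) is an addition on top of the plain inner product mentioned above. This is not how SMART or Super Search is implemented.

```python
import math
import re
from collections import Counter

def keyword_counts(text):
    """Lowercase the text and count its word-like tokens (a naive tokenizer)."""
    return Counter(re.findall(r"[a-z0-9_:]+", text.lower()))

def inner_product(a, b):
    """Sum of products of counts over the keywords both nodes share."""
    return sum(count * b[word] for word, count in a.items() if word in b)

def cosine_similarity(a, b):
    """Inner product divided by the vector lengths, so long nodes don't dominate."""
    norm = math.sqrt(inner_product(a, a) * inner_product(b, b))
    return inner_product(a, b) / norm if norm else 0.0

# Hypothetical data: one node the reader liked and two candidate nodes.
liked = keyword_counts("sorting a hash by value with sort and keys")
candidates = {
    "node_a": keyword_counts("how to sort a hash by value"),
    "node_b": keyword_counts("installing modules from CPAN"),
}

# Rank candidates from most to least similar to the liked node.
ranked = sorted(
    candidates,
    key=lambda name: cosine_similarity(liked, candidates[name]),
    reverse=True,
)
print(ranked)
```

    In practice you would want stop-word removal and some term weighting, since common words like "a" and "by" count as matches here.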

    Another method of retrieving relevant nodes is to take an approach used by the semantic web people: create ontologies through the use of meta information added to the nodes. Perlmonks has this too! The meta information comes in the form of categorization. In the code catacombs, Q and A, and tutorial sections, nodes are organized by category and it is very easy to find nodes on a desired subject. Other sources of meta information on Perlmonks are the author of the node, children nodes of that node, Best/Worst nodes of a time period, reputation, etc. Perlmonks is quite rich in meta information.

    There is work by Naftali Tishby's group on the automatic classification of newspaper articles using an information-theoretic clustering algorithm. The algorithm came up with surprisingly sensible clusters. Many clusters could be identified with a particular subject; others with the reporter who wrote the article. It would be fun to apply such a scheme to the Perlmonks universe. Could such a nonparametric algorithm distinguish a meditation from a tutorial? Positive reputation nodes from negative?

    -Mark

Re: Collaborative filtering
by davido (Archbishop) on Aug 19, 2004 at 02:46 UTC

    Lurking deep within the code of the Monastery there is a keywords feature, only half-implemented. It's pretty much just awaiting further conceptualization, brainstorming, and coding. It's not really functional yet, and nobody seems to have a good idea of what to do with it.

    But when/if it ever gets completed, it will probably facilitate the type of thing you're suggesting (again, with additional code). The keyword feature allows people to assign keywords to nodes, and those keywords can (in theory) be used to search for nodes of like content. As I said, it's pretty rough right now, and there really isn't any search facility attached to it from what I can tell. But maybe someday it'll come to fruition.


    Dave

Re: Collaborative filtering
by chanio (Priest) on Aug 19, 2004 at 04:16 UTC
    As a PopFile fan, let me remind you that the Bayesian filtering it uses to classify spam and various personal topics in received emails works in a very similar way to PM's system. Instead of rating every answer, it asks you to assign a category to every received email.

    In a few months the system has learned a lot. It can make very accurate guesses about how you would classify a message, and you don't have to correct it much any more.

    I can't imagine what it would learn with several levels of classification inside PM. Say, a general level, fed by all the members, and then a personal level for each separate member... It doesn't have to classify spam, but your interests.

    If you are curious about it, you should visit their forum for alternative uses that people have created from the nice open-source PopFile!
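
    To make the idea concrete, here is a minimal naive Bayes text classifier in Python. It is only a sketch of the general technique, not PopFile's actual code; the class name, the whitespace tokenizer, and the training texts are all invented for illustration, and it shows a single level of classification rather than the general-plus-personal scheme suggested above.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative sketch)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of training texts
        self.vocab = set()                       # every word seen during training

    def train(self, text, category):
        words = text.lower().split()
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocab.update(words)

    def classify(self, text):
        """Return the most probable category, or None if nothing was trained."""
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for category, docs in self.doc_counts.items():
            # Start from the log prior, then add the log likelihood of each word.
            score = math.log(docs / total_docs)
            total_words = sum(self.word_counts[category].values())
            for word in text.lower().split():
                seen = self.word_counts[category][word]
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log((seen + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best, best_score = category, score
        return best

# Hypothetical training data: node text tagged with a reader's interests.
nb = NaiveBayes()
nb.train("regex match capture group", "regex")
nb.train("regex substitution greedy match", "regex")
nb.train("dbi database connect query", "database")
nb.train("sql query database handle", "database")

print(nb.classify("greedy regex match"))
print(nb.classify("database query"))
```

    Each correction a user makes would simply be another call to train, which is how the classifier improves over time.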

    .{\('v')/}
    _`(___)' __________________________
Re: Collaborative filtering
by eric256 (Parson) on Aug 21, 2004 at 23:58 UTC

    Late to the discussion but:

    You could use the current voting system. To find nodes you would like, find all the people who plused nodes that you did (score them based on how many nodes both of you ++'ed). Doing this would give you a group of people who liked the same stuff you did. Now take this group and get all the nodes that they +'ed and that you haven't -'ed. Now you have a list of nodes that people with similar likes liked. Score and order by score and you have your list of nodes :). Add in a link to the newest nodes list and you would have a list of new nodes that you would probably like.
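
    The steps above can be sketched as follows, in Python for brevity. This assumes per-user vote data is available as plain dictionaries, which the site does not currently expose; the function name, the user names, and the node ids are hypothetical. One small addition to the description: nodes you have already ++'ed are also left out of the suggestions.

```python
from collections import Counter

def recommend(me, upvotes, downvotes, top_n=5):
    """Suggest nodes liked by users whose upvotes overlap with mine."""
    my_ups = upvotes.get(me, set())
    my_downs = downvotes.get(me, set())

    # Step 1: score every other user by how many nodes we both ++'ed.
    similarity = Counter()
    for user, ups in upvotes.items():
        if user == me:
            continue
        overlap = len(my_ups & ups)
        if overlap:
            similarity[user] = overlap

    # Step 2: collect the nodes those users ++'ed that I have not voted on,
    # weighting each node by the similarity of the users who liked it.
    scores = Counter()
    for user, weight in similarity.items():
        for node in upvotes[user] - my_ups - my_downs:
            scores[node] += weight

    # Step 3: order by score (ties broken by node id for a stable result).
    return sorted(scores, key=lambda node: (-scores[node], node))[:top_n]

# Hypothetical vote data: user -> set of node ids.
upvotes = {
    "me": {101, 102, 103},
    "alice": {101, 102, 201, 202},
    "bob": {103, 202, 203},
    "carol": {301},
}
downvotes = {"me": {203}}

print(recommend("me", upvotes, downvotes))
```

    Here alice shares two upvotes with "me" and bob shares one, so node 202 (liked by both) outranks node 201, and node 203 is dropped because "me" voted it down.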


    ___________
    Eric Hodges
      find all the people who plused nodes that you did
      How does one do that? I didn't know there was a way to find out who voted on a node.

        I don't think there is any way currently. It was just an idea for how that could be done with the existing info.


        ___________
        Eric Hodges
      Good Direction:
      A person could have voted on thousands of nodes just because he/she had extra votes remaining for the day, or just liked a node in general rather than for anything specific. IMHO, for practical purposes we cannot use the existing voting system. The existing system has no capability to vote negative after you have voted positive on a node, or vice versa.

        Well, it wouldn't be perfect, as the voting system now is not. With enough people voting, though, you would be looking at rough averages much more than at the actual individuals. The idea is to spread the factors out enough that a node is suggested only if most of the people who agree with you most of the time like it. That way individual ++/-- votes mean less and less; it's the overall trend of your group of "friends" that decides which nodes you might like.


        ___________
        Eric Hodges

Node Type: monkdiscuss [id://384099]
Approved by ysth