Keyword Nodelet / Tagging documentation

by castaway (Parson)
on Sep 09, 2005 at 17:51 UTC ( #490679=pmdevtopic )
castaway has raised the following topic:

Hi folks,

I've just ported the Keyword Search superdoc (which will only be visible for pmdev, currently) over from the test site. For those that can't see it, it's very similar to the Perl Monks User Search, with a dropdown list of all currently used keywords to select from.

Anyway, the purpose of this discussion is to outline a FAQ which will suggest how people should go about tagging nodes, such that useful searching is possible. I've been doing some tagging of my own already, but for global use this needs to be documented.

My suggestion for this would be something like:

  • If a node mentions a module or its question can be solved with a module, tag with the module name, eg "IO::Socket".
  • Always use common capitali(s|z)ations/spellings of nouns, eg "Perl6", "Perl/Tk", "SOAP" - (Though I hope the search is/will be case insensitive anyway)
  • .. ?

The doc also needs to mention that keyword deletion can be done by editors^Wjanitors, and a consideration/editor request/msg can be used to accomplish such.

Also I'm wondering if we shouldn't have a "tagging group" who will put official tags on things, so the search can cover either "official" tags or "everything else". (This can be accomplished by adding a column to the keyword table for the user_id, where the id==tagging group for members of that group.)
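A rough sketch of how that column could separate the two kinds of search. Everything here is an assumption for illustration: the `keyword` table layout, the `TAGGING_GROUP_ID` value, and the node ids are made up, and an in-memory SQLite database stands in for the real one.

```python
import sqlite3

# Hypothetical id stored in user_id for tags added by tagging-group members.
TAGGING_GROUP_ID = 0


def make_demo_db():
    """Build an in-memory keyword table with the extra user_id column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE keyword (node_id INTEGER, word TEXT, user_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO keyword VALUES (?, ?, ?)",
        [
            (101, "Perl/Tk", TAGGING_GROUP_ID),  # official tag
            (102, "perl/tk", 4242),              # ordinary user's tag
            (103, "SOAP", 4242),
        ],
    )
    return conn


def search(conn, word, official):
    """Return node ids tagged with word, official tags or everything else."""
    # The operator comes from two fixed strings, never from user input.
    op = "=" if official else "!="
    rows = conn.execute(
        "SELECT DISTINCT node_id FROM keyword "
        "WHERE word = ? COLLATE NOCASE AND user_id " + op + " ? "
        "ORDER BY node_id",
        (word, TAGGING_GROUP_ID),
    )
    return [row[0] for row in rows]
```

`COLLATE NOCASE` makes the match case insensitive for ASCII keywords, which is the behaviour hoped for in the second bullet above.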

Please add your ideas about the documentation to this discussion, thanks.


Re: Keyword Nodelet / Tagging documentation
by Tanktalus (Canon) on Sep 09, 2005 at 21:38 UTC

    I'm very curious as to how the tags were done, are being done, will be done. Because when I select a tag and go to the found node, I can't figure out where that tag is supposed to be. ;-)

    I assume a node can have more than one tag. What is the planned interface for this? A select box where one can use the ctrl-click interface to add additional seen-before tags, and a label box for typing arbitrary (new) tags?

    Also, will there be a publicly viewable "Keywords: a, b, c" that everyone will (eventually) be able to see for a given node?

    I'm not trying to change the design, just trying to find out what it was/is :-) If these questions change the design, that was by accident not by, um, er, design. :-)


      They're already being done, though somewhat haphazardly. Yes, a node can have multiple Keywords. That Keywords cannot contain spaces is the only current limit I know of.

      (Look for the "Keyword Nodelet")


Re: Keyword Nodelet / Tagging documentation (vote > privilege)
by tye (Cardinal) on Sep 10, 2005 at 04:10 UTC

    My experience with PM leads me to believe that voting has a much better chance of resulting in a useful categorization system than privilege (the tag adders, tag deleters, and considerers) does.

    My bet is that privilege won't result in a very useful tag system and, even if it starts to, the effect won't last.

    I think you need to allow anyone (or nearly so, maybe about level 3) to add keywords and nearly anyone to vote for/against keywords. But tracking votes is O( nodes * users ) and is the biggest part of our database. Keyword votes could be O( nodes * users * words ). But not tracking keyword votes just affords voting abuse... Maybe an alternative of only tracking recent votes and limiting number of votes per day would not balloon out of control but would strongly discourage voting abuse...
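A minimal sketch of the "track only recent votes, limit votes per day" alternative. The limit of 5 votes per day, the 30-day retention window, and all class and method names are assumptions picked for illustration, not anything the site implements.

```python
from datetime import datetime, timedelta

MAX_VOTES_PER_DAY = 5            # assumed daily limit per user
RETENTION = timedelta(days=30)   # assumed window for remembering votes


class RecentKeywordVotes:
    """Remembers only recent keyword votes and enforces a daily limit."""

    def __init__(self):
        # (user_id, node_id, keyword) -> (time of vote, +1 or -1)
        self._votes = {}

    def prune(self, now):
        """Forget votes older than the retention window."""
        cutoff = now - RETENTION
        self._votes = {
            key: vote for key, vote in self._votes.items() if vote[0] >= cutoff
        }

    def vote(self, user_id, node_id, keyword, delta, now):
        """Record a vote; return False if it is a repeat or over the limit."""
        self.prune(now)
        key = (user_id, node_id, keyword)
        if key in self._votes:
            return False  # same user, node and keyword within the window
        day_ago = now - timedelta(days=1)
        used_today = sum(
            1
            for (voter, _, _), (when, _) in self._votes.items()
            if voter == user_id and when > day_ago
        )
        if used_today >= MAX_VOTES_PER_DAY:
            return False
        self._votes[key] = (now, delta)
        return True

    def score(self, node_id, keyword):
        """Sum of the remembered votes for one keyword on one node."""
        return sum(
            delta
            for (_, nid, kw), (_, delta) in self._votes.items()
            if nid == node_id and kw == keyword
        )
```

Storage is bounded by users * votes per day * retention days rather than nodes * users * words. The cost is that a repeat vote becomes possible again once the earlier one has aged out of the window.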

    But I don't think you've designed a keyword system that will work yet. And I don't think that will be an easy design. But I think such could be very useful.

    - tye        

      I think you have a good point re the usergroup: it may start off well, but would probably not keep up to date much or stay that way.

      Voting as such is a nice idea, though as you say it will produce a lot of data. Which reminds me that I added, and then removed again, the "Rating" field, and you've reminded me what that was - we already keep track of how many times a keyword was added to a particular node, which is sort of like voting, only doesn't store who added which keyword, and allows users to add the same keyword multiple times themselves.

      Having this as a level power sounds like a good idea.

      I wasn't really out to *design* a new keyword system, so much as make the existing one more usable. With a search at least *I* can find stuff I've keyworded the same, and with a documentation, there's a slightly increased chance that others will use the same or similar ones.

      If it's a concern of the size the table will get, how about attaching the keywords to the nodes themselves somehow? (In the node table, since I'd like to be able to tag everything and anything), although that wouldn't solve the "who tagged it" problem (if there is one).

      I'm not entirely sure I understand why there would be any use for anyone abusing the vote-on-keyword thing at all, the search will/should show rating (relevance), but can be sorted by many other criteria. Also since keywords can be removed, all the effort would go to waste fairly quickly. Adding keywords to someone's nodes should *not* IMO, give XP of any kind, either to the adder or the node owner. Limiting votes will just get us fewer keywords..

      Does this mean you don't approve of a documentation at all, currently, or just that you'd like a better system in the future? (This solution will mostly solve my itch, at least, even if I'm the only one using it..)


        The abuse I predict is someone tagging every node by merlyn as "bull..." and other rude, abusive, and obscene tags being thrown in because that's what many children are prone to do when given an anonymous way to scribble on the walls.

        Adding a keyword is so trivially easy, while finding the offense, considering it, and getting a privileged user to remove it is several times more work. So I bet that increased visibility of the keyword system will eventually lead to an annoying amount of abuse.

        Which reminds me that part of the value of voting is abuser correction, not just abuse correction.

        One idea would be a non-XP point system whereby adding keywords that get downvoted cost you points such that you can't add keywords as frequently...

        I'm not saying your patch shouldn't be applied. But I personally wouldn't spend time implementing a privileged keywording group and would be prepared for the keyword system needing to be disabled until a major overhaul happens.

        - tye        

Re: Keyword Nodelet / Tagging documentation
by castaway (Parson) on Sep 10, 2005 at 14:07 UTC
    theorbtwo just suggested a couple of ideas to enforce the documentation, such as having a list of stop words not allowed as keywords ("and" "or" etc). Also a list of aliases such as "Tk" -> "Perl/Tk"..
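To make the stop-word and alias idea concrete, here is a minimal sketch. The two tables hold only the examples mentioned in this thread, and the function name and the choice to reject rather than repair bad input are assumptions.

```python
# Hypothetical tables; real lists would be maintained on the site.
STOP_WORDS = {"and", "or"}
ALIASES = {"tk": "Perl/Tk"}


def normalize_keyword(raw):
    """Return the canonical keyword, or None if the input is rejected."""
    word = raw.strip()
    # Keywords cannot be empty or contain whitespace.
    if not word or any(ch.isspace() for ch in word):
        return None
    # Stop words are refused outright, whatever their capitalisation.
    if word.lower() in STOP_WORDS:
        return None
    # Known aliases map onto the documented spelling; other keywords
    # pass through unchanged.
    return ALIASES.get(word.lower(), word)
```

With these tables, "Tk" and "tk" both come back as "Perl/Tk", "and" is refused, and "IO::Socket" is returned as typed.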


      Rather than aliases, you could make it easy to add existing keywords to new nodes and make it require some confirmation (such as from multiple users) for a new keyword to be added.

      - tye        

Re: Keyword Nodelet / Tagging documentation
by planetscape (Canon) on Sep 12, 2005 at 15:08 UTC

    Here is the list I have been formulating while Keywording nodes to which I have replied (which hopefully equates to "nodes which I know something about" ;-)):

    Updated: 2005-09-16


Re: Keyword Nodelet / Tagging documentation
by eric256 (Parson) on Sep 12, 2005 at 16:02 UTC

    One way to limit abuse would be to require a keyword to be added a certain number of times before it becomes visible. Then you do need to remember who adds what keyword so that they can't do it twice, but that should be fairly trivial. I also think some way of either matching entered keywords to existing ones or listing existing keywords that you can add would be good. That would help with consistency in keywording. I would also think that normalizing keywords as they are entered is important. Removing "and", "a", "with", "the", etc. and lowercasing everything would be a good first step.
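    The threshold idea might look something like this sketch. The threshold of 3, the class name, and the in-memory storage are assumptions for illustration; the lowercasing follows the normalizing step suggested above.

```python
from collections import defaultdict

# Assumed number of distinct users needed before a keyword is shown.
VISIBILITY_THRESHOLD = 3


class KeywordTally:
    """Tracks who added which keyword so repeats by one user do not count."""

    def __init__(self, threshold=VISIBILITY_THRESHOLD):
        self.threshold = threshold
        # (node_id, lowercased keyword) -> set of user ids who added it
        self._adders = defaultdict(set)

    def add(self, node_id, keyword, user_id):
        """Record the tag; return False if this user already added it."""
        adders = self._adders[(node_id, keyword.lower())]
        if user_id in adders:
            return False
        adders.add(user_id)
        return True

    def visible_keywords(self, node_id):
        """Keywords on node_id added by at least `threshold` distinct users."""
        return sorted(
            kw
            for (nid, kw), users in self._adders.items()
            if nid == node_id and len(users) >= self.threshold
        )
```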

    Update: It would also be nice to be able to keyword and vote all in one swipe. So if we are going to use keywords more it would be nice if there was just a field below the entry to enter them in. Then I could keyword and vote on an entire thread all at once. BTW Could keywords use up a *vote*? That should certainly limit the amount of damage an abuser could do.

    Eric Hodges

      One other thing.. Minor really. Could we have the keyword nodelet make the keywords links to searching for that keyword? So you could easily click over there and find other nodes with similar keywords?

      Eric Hodges
Re: Keyword Nodelet / Tagging documentation
by planetscape (Canon) on Sep 20, 2005 at 19:03 UTC

    Per castaway's request:

    General Guidelines for Picking Keywords

    Since I have been starting with my writeups - nodes whose roots I've authored or to which I've replied - I have tried to take into consideration both the root node and its replies when picking keywords. (As opposed to keywording nodes which have no replies yet.) So...

    • Modules mentioned in the root node or in replies (those that are either the problem or a possible solution, such as "XML::Simple", for example); problems with modules (or installing modules) in a more general sense get tagged with "modules" and/or "installing".

    • "Automation" is what MS calls it when you use one app to drive another, so that's what I'm calling using Win32::OLE to drive an app such as Excel from Perl (since people more familiar with MS stuff than Perl are more likely to type that into a keyword search than the module name); plus I mention the module, Win32::OLE, for the more Perly types. In other words, I try to use keywords that people coming from "outside" the Monastery might try first.

    • If I know of a term(s) that is synonymous, I try to get that in, too... Like "Ngram" and "Markov Chain," or "course," "class," and "training"...

    • If there is a common abbreviation for something mentioned in the offered solutions (such as "LCS" for "Longest Common Substring", or "KWIC" for "KeyWord in Context"), then I include that... especially since "Longest Common Substring" won't fit in the input area for the Keyword Nodelet. ;-)

    • When the question (and/or solution) depends heavily on a particular method, option, or hash key (Text::ExtractWords minwordlen and maxwordlen re: "minlen/minwordlen - maxlen/maxwordlen" and Re: XML::Simple "transforming data" re: "NoAttr"), I try to get those in.


Re: Keyword Nodelet / Tagging documentation
by dimar (Curate) on Jan 14, 2006 at 17:45 UTC

    Hmmm ... this thread looks a little 'mature'... I cannot help but ask, however, why not just use a site like for this kind of functionality?

    I am a big fan of tagging and 'folksonomy' and it seems like a lot of the effort required for 'infrastructure' is already in place, and the real difficult part is the actual cognitive effort required to link 'human operational terms' into 'perl equivalent terms'.

    The benefit of is also that folksonomies get associated with a specific *user*, which means tag pollution and tag bigotry would have an automatic 'credibility filter'. If you happen to like a given perlmonks user, and someone has maliciously tagged that user's content as *junk*, you can ignore the person who did the malicious tagging and assume they are a low-credibility source.

    UPDATE: This was added late because I thought these points merited consideration, not to downplay the work already done here on perlmonks. Overall I think the functionality sounds great, and I hope it takes off.

