PerlMonks  

Developing an Expert System/Intelligent Agent for PM?

by Masem (Monsignor)
on Jun 15, 2001 at 17:37 UTC ( [id://88786]=monkdiscuss )

One of the common themes of late in PM Discussion is trying to get new users to use existing help resources, including the FAQs and Search/Super Search on PM, before asking questions that may already be answered in those sources. Sometimes this works, but other times it's harder, particularly when you don't know the term for the technique you want to use. For example, one question this morning was about combining arrays into a hash; the obvious answer is to use hash slices, but this concept is not obvious and may easily be unknown to newer users.
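For anyone who hasn't run across it, the hash-slice idiom in question looks like this:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Combining two parallel arrays into a hash with a hash slice --
# the idiom the morning's question was really asking about.
my @keys   = qw(name rank serial);
my @values = qw(Masem Monsignor 88786);

my %record;
@record{@keys} = @values;    # hash slice assignment

print "$record{rank}\n";     # prints "Monsignor"
```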

So while thinking about this problem, what popped into my head was an intelligent agent that could be developed to direct people to possible answers for their questions, in the spirit (but not the method) of Microsoft's Office 2000 Help or Ask Jeeves. That is, the new user only has to ask a question, and the agent determines which words are important and, as in the examples above, typically presents any entries that contain at least one of the keywords in the question.

Well, that's a rather simplistic approach; a better system would have some intelligence, recognizing that if the user asks about "merging" and "array", there's a good chance the user really wants to know about hash slices, though other answers could be the desired one. This would require some method of emphasizing the relationships between words, as opposed to the mere presence of the words themselves.

The expert system would come into play in that as new users ask questions and get a list of possible answers, they would indicate which answers were what they were looking for; the system would then increase a hypothetical weight for the relationship leading to that answer, such that the next time the same words are asked, that answer has a better chance of being listed first. After a large number of repetitions of this process, the system should become very good at giving answers that match what the user is expecting. The system would also be able to incorporate new questions and answers into its database as they are created.
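A minimal sketch of that feedback loop, assuming a simple additive weighting scheme (the %weight structure, the +1/-0.5 adjustments, and the function names are all illustrative, not an existing design):

```perl
use strict;
use warnings;

# Each (keyword set, answer) relationship carries a weight: bumped
# up when a seeker confirms the answer, decayed when they reject it.
my %weight;    # "keyword1+keyword2" => { answer_id => weight }

sub update_weight {
    my ( $keywords, $answer_id, $helpful ) = @_;
    my $key = join '+', sort @$keywords;
    $weight{$key}{$answer_id} += $helpful ? 1 : -0.5;
}

sub ranked_answers {
    my ($keywords) = @_;
    my $key = join '+', sort @$keywords;
    my $w = $weight{$key} || {};
    return sort { $w->{$b} <=> $w->{$a} } keys %$w;
}

update_weight( [qw(merge array)], 'id://88786', 1 );
update_weight( [qw(merge array)], 'id://12345', 0 );
my @best = ranked_answers( [qw(array merge)] );
print "$best[0]\n";    # the confirmed answer ranks first
```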

Obviously, this would be developed offsite (off PM, that is), and would most likely use the questions from Q&A, and maybe SOPW, sans anything in CODE tags, to develop the initial entries. The problem I don't see an easy solution for is how to weight the relationships. Beyond simply storing the data as the various Q&A nodes, along with the sets of words that match and their weights, I can't think of a nice AI-like algorithm that can be adapted for this.

Note that I haven't done anything toward this; I'm only putting my thoughts down to see 1) if there's interest and 2) if this is even possible. I think it would be a nice bit of functionality to add to PM, but if it's unreasonable to add, then forget it... :-)


Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain

Replies are listed 'Best First'.
(Ovid) Re: Developing an Expert System/Intelligent Agent for PM?
by Ovid (Cardinal) on Jun 15, 2001 at 19:27 UTC

    It's an interesting idea. The main obstacle that I see is trying to infer meaning from posts. If someone happens to mention an array in passing, does that get a hit? Should 'rep' be taken into account? (I think not. I've seen great posts with low reps and vice versa).

    In my view, there are two ways to infer meaning from posts. Have the developers provide the meaning or the Seekers provide the meaning. For the developers to provide the meaning, someone has to take a look at each post and develop a list of keywords for it. Currently, your post is 88786. That's a lot of posts to go through.

    Actually, I don't think it's that daunting of a task, so long as a group of people were splitting up posts to work on (say, getting a list of root nodes and dividing evenly). If standards were laid down, most threads could be skipped. There wouldn't even need to be keywords for many posts (i.e. posts like this one).

    If the keyword idea has merit (and I suspect that it's the only way to keep down the signal to noise ratio), then there are a few issues:

    • We'd need the cooperation of The Everything Development Company.
    • Who volunteers to go through posts? Who trusts the volunteers?
    • Developing standards: no one should 'type in' a keyword (how many ways do you think 'pseudo-hash' would be spelled?). They should be chosen from drop downs. Uh oh! That limits things. Perhaps a text field with an 'other' category?

    That would be a pain. The other (better) way is for the seekers to tell us what the meaning is. The technique would be to develop an algorithm that tries to infer what the questioner asked, but have a 'rate it' poll in each post (so long as the seeker went through the AI routine). Then, the seeker would rate each answer on a scale of 1 to 10 as to how well the post addressed the seeker's question. Over time, if seekers actually used the poll, I think that would produce excellent results. Further, it would be much easier to implement.

    Cheers,
    Ovid

    Join the Perlmonks Setiathome Group or just click on the link and check out our stats.

      Your latter point is closer to what I had in mind. The system should learn meaning as it's used. Assume that I set this up; I would then ask users to help 'train' it (*) by starting the system in a dumb state, having users ask it questions, and having them indicate which answers were correct for the meaning of the question. Once trained, it would be put online for all users to use, and it would still learn from that. It's just that getting it from 'dumb' to 'reasonably intelligent' would need a lot of help from monks and others.

      (*) I keep thinking neural networks for this, but I don't think that's the way to go for the back-end AI.


      Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
Re: Developing an Expert System/Intelligent Agent for PM?
by Masem (Monsignor) on Jun 15, 2001 at 22:31 UTC
    Here's some more thoughts on this idea, at least implementation-wise:

    First, this can be made sufficiently generic that I'd consider developing this as a module rather than specific for PM.

    As for how to do it, there are two possible ways. The first is messy: assume that every question has no more than N possible keywords, so that when the question is asked, you extract the N most important ones (importance as determined by someone else). We can then simply use an N-dimensional table, each entry being a weighted list (e.g., a list of hashes) sorted by importance. When a response satisfies the question, the response gets a bit more weight; when it doesn't, it loses some. While this is 'easy' to do, a list of 1000 keywords (reasonable), with 3 keywords per question, requires 1000^3, or a billion, storage bins. Not impossible, but still messy.

    A better way, but a bit trickier to program, would be to use a tree structure; each node would contain a list of responses that contain at least that node's keyword and the keyword of every parent up to the root. The children would be stored as a sorted list (more details later). The initial tree would be simply one root with child nodes, one for each keyword, each keyword knowing which messages it appears in.

    When a new question is asked, the keywords are extracted into a list sorted from most important to least important. Starting with the children of the root node (in order, remember), the first keyword is looked for; if found, we go to that child and start the process again with the next keyword. If the keyword is not found, we go to the next keyword on the list and try again. If we exhaust the list of keywords from the question, the responses associated with the current node are presented to the user (if still at the root node, we simply say "nothing found"). Now, if we descend to a node with no children but still have keywords left from the question, we present the responses for that node and its parents in order, and then ask the user whether the response was helpful. If yes, we take the next keyword from the question list, add it as a new child node of the current one, and move the response to the new node's list. If no, we take the next keyword, by importance, from the response that is not in the question's keyword list, and do the same. When a new node is created from an existing one, all other responses of the existing node are evaluated as well and moved as appropriate.
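    A rough sketch of the descent step above, assuming a simple hash-of-hashes node layout (the structure and names are mine, not working PM code; the node-creation and feedback steps are omitted):

```perl
use strict;
use warnings;

# Each node: { responses => [...], children => { keyword => node } }.
# Walk the keyword list in importance order; descend when a keyword
# matches a child, skip the keyword otherwise.
sub lookup {
    my ( $node, @keywords ) = @_;
    my $current = $node;
    for my $kw (@keywords) {
        if ( my $child = $current->{children}{$kw} ) {
            $current = $child;    # keyword matched: descend
        }
        # keyword absent at this level: try the next keyword
    }
    return @{ $current->{responses} || [] };
}

my $tree = {
    children => {
        merge => {
            responses => ['id://88786'],
            children  => {
                array => { responses => ['hash-slice answer'] },
            },
        },
    },
};

print join( ', ', lookup( $tree, qw(merge array) ) ), "\n";
```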

    Note that the same keyword can appear many times in the tree, and messages will appear multiple times as well.

    The keyword importance is important here -- it should be inversely related to the number of responses that all nodes of that keyword (and their children) are associated with. That is, less-used keywords are more important than oft-used ones, so that their questions will be answered first. This re-evaluation can be done periodically (once a day, for example).
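    That inverse relationship is essentially the inverse-document-frequency idea from information retrieval. A hedged sketch, with made-up counts and a log-scaled formula chosen for illustration:

```perl
use strict;
use warnings;

# Hypothetical response counts per keyword; a keyword's importance
# falls as the number of responses mentioning it grows.
my %response_count = (
    perl  => 5000,    # appears everywhere, nearly useless
    array => 800,
    slice => 40,      # rare, highly discriminating
);

my $total = 6000;     # hypothetical total number of responses

my %importance =
    map { $_ => log( $total / ( 1 + $response_count{$_} ) ) }
    keys %response_count;

my @by_importance =
    sort { $importance{$b} <=> $importance{$a} } keys %importance;

print "@by_importance\n";    # slice array perl
```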

    New keywords can easily be added by adding the keyword at a very high importance at the top level, with all responses containing that keyword added to the node's response list. As time progresses, the keyword and its responses will be distributed appropriately.

    Adding new responses is a bit trickier. A list of keywords from the new response should be generated, and every branch starting with a keyword from that list should be followed, placing the response in the lowest possible node.

    Obviously storage would be an issue, but a straightforward SQL database would do the job nicely. The requirements should not be as great as for the N-dimensional array system, since it's not expected that every keyword will appear in a question with every other keyword.

    Note that this mimics the "Guess the Animal" game that has persisted since the start of computer programming, where you use a binary tree to distribute knowledge.
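    For reference, a minimal, non-interactive sketch of the "Guess the Animal" structure (yes/no questions at internal nodes, answers at leaves; the animals and questions here are invented):

```perl
use strict;
use warnings;

# A binary decision tree: internal nodes carry a question and
# yes/no branches; leaves carry an answer.
my $tree = {
    q   => 'Does it live in water?',
    yes => { answer => 'fish' },
    no  => {
        q   => 'Does it have four legs?',
        yes => { answer => 'dog' },
        no  => { answer => 'bird' },
    },
};

# Follow a fixed list of 'yes'/'no' replies down to a leaf.
sub classify {
    my ( $node, @answers ) = @_;
    while ( !exists $node->{answer} ) {
        my $reply = shift @answers;
        $node = $node->{$reply};
    }
    return $node->{answer};
}

print classify( $tree, 'no', 'yes' ), "\n";    # dog
```

    In the real game the tree grows: a wrong guess prompts the player for a new distinguishing question, which becomes a new internal node -- the same grow-on-failure idea as the keyword tree above.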

    Most of these are just ideas, and I haven't attempted to put anything to code yet, so I'm just airing them to see if they sound reasonable...


    Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain

      Haven't done AI in a couple months, but here are a couple concerns off the top of my head:

      First of all, you're talking about a tree structure, so now you have a reasonable search space; do you want to prune it? Before you start iterating through the search space for applicable data, are there some simple algorithms that will lower the number of calculations that need to take place? Maybe the keyword tree can be pruned after a dozen keywords, because that's enough information to find an answer in, say, 95% of the queries. That would stop the algorithm from going off the deep end with 20+ keywords when it doesn't need to.

      This is just an example prune, but it's my general opinion that an AI shouldn't come back and say "well, i looked for stuff relevant to what you're looking for and found half the universe, so if you'd just care to browse through and tell me what was appropriate". That's one of the main problems with search engines sometimes, too much information.

      The other major concern is how often it gets updated. You suggested once a day. This is fine if the search AI gets used a lot, because we shouldn't waste time re-analyzing everything while other people are searching; but if it only gets used around 20-50 times a day, it would probably be better to re-analyze more often, so that a query in the morning won't return the same (possibly irrelevant) information as one in the afternoon. That would generally help everyone get information faster.

      The tree structure seems to be the best method off the top of my head, my brain's slightly dead right now from finals or i would actually sit down and try to plan out something else for comparison other than the brute force N-dimensional table approach you already posted. If i get some time later this week i may try, but the tree you suggest seems a good choice for representing the data in manageable means.

      Good idea, seems reasonable, the module would probably be interesting to see whether it gets used on perlmonks or not, but that may just be me :-)

      HTH,
      jynx

Re: Developing an Expert System/Intelligent Agent for PM?
by elwarren (Priest) on Jun 19, 2001 at 01:57 UTC
    Great idea, been thinking about the same thing myself recently, though I don't have the AI/statistical know-how to implement it.

    I think this functionality should be built into the submission logic so that every node that is created has to pass through it. It should go between the "preview" and the "submit" stages. Before actually submitting a node, the user should be presented with a list of possible solution threads.

    Hello Dave. I read your submission. Have you checked these threads for information regarding ActiveState PPM Packages? I'm sorry if this isn't what you want, please continue your submission if I have not helped you.

    Installing packages on ActiveState Perl.
    Installing packages in windows98.
    ActiveState Perl PPM HELP?!?!


    This could be implemented (simply) by building a keyword list from the submission, running it through the existing search engine, and returning the results on the submit page. Advanced users could have a recommendations flag on their home node to turn it off. I don't know how many hoops submissions jump through currently, but this would definitely add some overhead to node creation. Recommendations could be limited to initial thread posts and skipped for replies.
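    A sketch of the keyword-extraction half of that idea, assuming a simple stopword filter (the stopword list is invented, and the hand-off to PM's actual search engine is not modeled here):

```perl
use strict;
use warnings;

# Pull likely search keywords out of a draft post: lowercase the
# words, drop common stopwords, and de-duplicate while keeping order.
my %stopword = map { $_ => 1 }
    qw(a an the i is are how do can my in on of to and or it this);

sub extract_keywords {
    my ($text) = @_;
    my %seen;
    return grep { !$stopword{$_} && !$seen{$_}++ }
        map { lc } $text =~ /(\w+)/g;
}

my $draft = "How do I install PPM packages on ActiveState Perl?";
my @keywords = extract_keywords($draft);
print "@keywords\n";    # install ppm packages activestate perl
```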

    I also think CODE tags should be included, or at least an option in the search. There are many responses that are great answers to questions, but the entire message is a CODE block. Especially if I'm asking a question about usage of a module.

    Bits: It should definitely include SOPW in the search. I know I spend most of my PM time there. I think returning threads instead of nodes would make searching less daunting to users. A multiple-choice keyword selection from a list could help the search; nodes with user-selected keywords would have a higher weight than those without. Nodes written by high-ranking users should be returned before lower-ranked users', and you might as well roll the node ranking in with that too.
Re: Developing an Expert System/Intelligent Agent for PM?
by nysus (Parson) on Jun 16, 2001 at 17:11 UTC
    Not being a computer science or AI expert, I can't really add technical advice on how to approach this problem. But I think the solution we already have in place---an assembly of a bunch of Perl geeks with time to kill---works famously. I'm sure yours would be an interesting project, but based on my other experiences with similar types of search engines, it wouldn't be able to compete with trillions of neurons wired to think Perl.

    $PM = "Perl Monk's";
    $MCF = "Most Clueless Friar Abbot";
    $nysus = $PM . $MCF;

      Well, no, not really. As alluded to by tilly in I think Casey West is right, there's a certain attitude that some of the older monks have (good or bad, that's not the question here), such that many newbish questions get ignored, flamed, or otherwise treated in a bad way. It shouldn't be the job of a volunteer community such as ours to answer questions that can be found easily in the FAQ or other resources (mind you, we should try to teach others how to use these resources as well). If we had to sit there and answer every one of those questions every time it was asked, it wouldn't take long for some to go insane (ask any helpdesk person).

      Mind you, it's yet another tool that we'd have to teach people to learn, but I see it as two benefits: one, it will help PM cut down the FAQish questions, and two, it's a general enough solution such that anyone with a large document database should be able to use it to provide an intelligent agent.


      Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
Re: Developing an Expert System/Intelligent Agent for PM?
by Anonymous Monk on Jun 19, 2001 at 20:01 UTC

    Isn't the base system of purl (the perl IRC bot by Kevin Lenzo, see www.infobot.org) the perfect system for this? You would just need to adjust the input interface accordingly, I think (purl runs as an IRC bot). If I remember correctly, one of the first uses of Purl was to recognize questions for resources, and answer them with a list of URLs. That sounds like the ideal match for questions related to existing nodes.

    There are even existing "factpacks" (see factpacks list) that could be reused (of course, with purl you get a nice factpack for Perl for free (hint)).

      Assuming that the docs for infobot are correct, this isn't quite what I had in mind. Stripping away the parts that have to do with interacting with the bot, what you are mostly left with is a hash lookup (that is, given a term X, the bot returns the value of the hash key X); the expert system is only as smart as what people put into it (the factpacks for example), and doesn't 'learn' from when it returns information incorrectly or correctly.


      Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain

        But infobot works and you can implement a solution based on it very fast. Sure, the "Knowledge Base" is not too intelligent, but the language interface is very well suited to an environment like the CB. Why not start with it and then build a more complex backend?

        My main argument here is that I think the language part is much more difficult than the knowledge part, at least for the "help desk" type of application we have here.

        "the expert system is only as smart as what people put into it (the factpacks for example), and doesn't 'learn' from when it returns information incorrectly or correctly"

        As far as I understand it, the first part of the argument is true for all expert systems, and the second part is mainly about the "elegance" of the feedback mechanism. Infobot learns when people tell it to forget and to learn a new fact instead. This is a simple feedback mechanism, admitted, but given the track record of infobot, it seems to work very well.

        Christian Lemburg
        Brainbench MVP for Perl
        http://www.brainbench.com

Re: Developing an Expert System/Intelligent Agent for PM?
by mattr (Curate) on Jul 08, 2007 at 11:57 UTC
    Hi,

    There are a few other points you may wish to consider.

    1. The site does already include manually structured data you could use:

  • a. cpan module links and other urls
  • b. Thread titles
  • c. Library entries, and Categorized Questions and Answers
  • d. timestamp, author and votes per post

    2. Some preliminary analyses of the site could be useful:

  • a. Identify common noun phrases with NLP tools and create a topic index
  • b. Use social network analysis tools to find relationships between contributors, their XP, their posts and included urls regarding a given topic. Not that XP is that important but it might screen out newbies possibly. This could be used to help weight importance.

    3. Manual tagging of threads or posts by all monks, perhaps weighted by XP (requires modification of PM site).

    4. Most interesting analyses could be done if PM's content (library, threads, XP db) could be saved to an sql dump that interested monks could load locally say in a mysql db, solely for analysis and creation in their own free time of web or gui based apps.

    5. Here is a short thread I initiated 5 years ago proposing creation of an online PM Knowledge Base using an annotated, modifiable tree structure of topics, with some way to consensually tag posts with topics.

    6. It would probably be most interesting for a system that uses cooperation of monks to build it, to host it on the net (on a different server than PM, due to load and security issues). Perhaps if it gets to a milestone it could be featured monthly (state of the knowledge base) on PM with info about what branches need help.
    Also, Ajax could be used for autocompletion or auto-recommendation of related topics (especially if an apache/fastcgi or modperl system is used) to give users the experience of quickly browsing through an interactive encyclopedia. I find the load on PM (I don't think the problem is latency to Tokyo), and perhaps the lack of indexing, makes searching for things very tiring. We need indexing and visualization of the knowledge in PM, and I'd recommend building an online system based on automated analysis/update plus tweaks/annotations by humans; you could then access it from a GUI or download a snapshot at any time. This requires a periodic dump from PM to your database or ftp server.

    Matt
