PerlMonks  

Re: Matching a question in text

by voyager (Friar)
on Jun 26, 2001 at 01:10 UTC ( [id://91440]=note )


in reply to Matching a question in text

Take the text of the email and throw away noise words (what, how, the, etc.). Then take the words that are left and see whether they appear in the FAQs. Keep track of how many words in the email match each FAQ so you can rank the FAQs.
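A minimal sketch of that idea, in Python for illustration. The noise-word list, the FAQ ids, and the function names `keywords` and `rank_faqs` are all hypothetical; a real list of noise words would be tuned to your own domain.

```python
import re

# Hypothetical starter list of noise words; tune it for your own domain.
NOISE_WORDS = {
    "what", "how", "the", "a", "an", "is", "do", "i", "my",
    "to", "of", "in", "on", "and", "can",
}

def keywords(text):
    """Lowercase the text, split it into words, and drop noise words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {w for w in words if w not in NOISE_WORDS}

def rank_faqs(email_text, faqs):
    """Return (faq_id, match_count) pairs, best match first.

    `faqs` maps a FAQ id to its text. FAQs that share no words
    with the email are left out of the result.
    """
    wanted = keywords(email_text)
    scores = []
    for faq_id, faq_text in faqs.items():
        hits = len(wanted & keywords(faq_text))
        if hits:
            scores.append((faq_id, hits))
    # Highest count first; ties are broken by FAQ id for a stable order.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))

# Made-up FAQ entries, purely for demonstration.
faqs = {
    "reset-password": "How do I reset my password?",
    "change-email": "How can I change the email address on my account?",
}
print(rank_faqs("How do I reset the password on my account?", faqs))
```

Counting distinct shared words is the simplest possible score. It ignores word order and word forms, so "passwords" would not match "password" without stemming.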

So when the user clicks "SEND", you can politely say something like "Here is a list of FAQs that might answer your question". Each entry has a link to the FAQ and a blurb, so the user can tell without going to the FAQ whether it might help.

You should have a button/link in a very obvious place that says, in effect, "None of the FAQs answer my question. Submit the question now".

Over time you can tune your list of noise words and perhaps even recognize a list of words that should be given more relevance in sorting the FAQs for display.
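One way the relevance idea could look, again as a Python sketch. The weights, the noise-word list, and the name `weighted_score` are assumptions made up for illustration: words in the weight table count for more than ordinary words when scoring a FAQ against an email.

```python
import re

# Hypothetical lists; both would be tuned over time from real traffic.
NOISE_WORDS = {"what", "how", "the", "do", "i", "my"}
WORD_WEIGHTS = {"password": 3.0, "invoice": 3.0}  # any other word weighs 1.0

def tokenize(text):
    """Return the set of lowercase words in the text, minus noise words."""
    return set(re.findall(r"[a-z0-9']+", text.lower())) - NOISE_WORDS

def weighted_score(email_text, faq_text):
    """Sum the weights of the words the email and the FAQ have in common."""
    shared = tokenize(email_text) & tokenize(faq_text)
    return sum(WORD_WEIGHTS.get(word, 1.0) for word in shared)

print(weighted_score("How do I reset my password", "Reset the password"))
```

Sorting the FAQs by this score instead of a plain match count lets a single important word outrank several unimportant ones.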

Replies are listed 'Best First'.
Re: Re: Matching a question in text
by voyager (Friar) on Jun 26, 2001 at 04:05 UTC
    Noise words, important words, etc. tend to be domain-specific. In my current project, for every search I log:
    • what was typed (e.g., "show me all the foo and bar")
    • what I searched on (e.g., "foo bar") # we use Lingua::Stem and other tricks
    • how many "hits" came back
    This is written to a log file, and a cron job dumps the results into a MySQL database for easy reporting.

    So to finally answer your question, you determine noise words by looking at what your users do. HTH
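A rough sketch of the log-then-report workflow described in this reply, in Python. Several things here are assumptions and not the poster's actual setup: the tab-separated log format, the table and function names, the two reports shown, and the use of an in-memory SQLite database as a stand-in for MySQL. Stemming is left out entirely.

```python
import sqlite3
from collections import Counter

def parse_log_line(line):
    """Split one tab-separated log line into (typed, searched, hits)."""
    typed, searched, hits = line.rstrip("\n").split("\t")
    return typed, searched, int(hits)

def load_log(lines, conn):
    """Load log lines into a table (the periodic job's work)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS searches "
        "(typed TEXT, searched TEXT, hits INTEGER)"
    )
    conn.executemany(
        "INSERT INTO searches VALUES (?, ?, ?)",
        [parse_log_line(line) for line in lines],
    )
    conn.commit()

def zero_hit_queries(conn):
    """What users typed when the search found nothing."""
    rows = conn.execute(
        "SELECT typed FROM searches WHERE hits = 0 ORDER BY typed"
    )
    return [row[0] for row in rows]

def term_frequencies(conn):
    """How often each term survived noise-word removal."""
    counts = Counter()
    for (searched,) in conn.execute("SELECT searched FROM searches"):
        counts.update(searched.lower().split())
    return counts

# Made-up log lines in the assumed format: typed, searched, hits.
log_lines = [
    "show me all the foo and bar\tfoo bar\t12",
    "where is the foo\tfoo\t0",
]
conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
load_log(log_lines, conn)
print(zero_hit_queries(conn))
print(term_frequencies(conn).most_common())
```

Terms that show up in nearly every search are candidates for the noise list, and the zero-hit queries show where the current lists or the FAQs themselves fall short.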

Re: Re: Matching a question in text
by swiftone (Curate) on Jun 26, 2001 at 01:12 UTC
    Hmm. Interesting. Any ideas for a good source of "noise" words, or do I just fake it?
      Search engines do this.
      It was either htdig or swift-e that had a file containing such "noise words". Just use that. (I think swift-e had them in its source code.)

      You can find links to them here:
      http://www.searchtools.com/

      On another note, the source is available for a lot of the search engines on that page. Code examples for things like fuzzy search and context searching might be available.
