PerlMonks  

Natural language processing

by Misha (Acolyte)
on Sep 01, 2007 at 13:35 UTC ( [id://636511]=perlquestion )

Misha has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I have a text which is a free conversation between a seller and a buyer. I need to analyze the conversation by keywords and determine whether the deal was closed or not. I am new to natural language processing and would be glad of any guidance on this. I am looking for Perl modules that could be useful, or a Perl wrapper for an existing text analyzer. Thanks, M

Replies are listed 'Best First'.
Re: Natural language processing
by mmmmtmmmm (Monk) on Sep 01, 2007 at 14:38 UTC
    Misha --

    Very interesting that you are looking for this. I am currently in the process of writing a set of tutorials on NLP with Perl, as I learn more about the subject myself. I have been working on it for a few days, and the first of the series is starting to reach a finished state. I'll send you a /msg when I complete it...

    For now though, you might want to look up information about WordNet and the Link Grammar System on Google. There are Perl modules on CPAN for both of them. For WordNet, check out WordNet::QueryData, Lingua::Wordnet, WordNet::Similarity, and Lingua::Wordnet::Analysis. For the Link Grammar Parser, check out Lingua::LinkParser. There are many other modules - just browse around on CPAN in the Lingua:: group....

    The field of natural language processing is huge, and it is difficult to find documentation out there that is accessible to the non-professional (one of my primary reasons for writing a tutorial). But the two tools I mentioned above are fairly easy to use, and if you can manage to get through the documentation, you should know enough at the end to know where to direct yourself from there. Browse around on Wikipedia as well to introduce yourself to some of the terminology used in the field. This will make reading the documentation much easier.

    As far as the program you want to write -- training a computer program to pull useful information out of free conversation is not a simple thing to do. You'll see that after a few minutes of reading the Introduction To The Link Grammar Parser ;)

    Here's a list of some of the interesting reading I've found lately. Hope this helps, and please let me know if you find anything interesting:



    --mmmmtmmmm
      You may also want to look at some of the Prolog modules.
      AI::Prolog seems to offer the advantages of Prolog's logic programming together with the flexibility of Perl.
      Constructing the various rules to cover all events which will be used to determine that a “deal has been done” will be difficult.

      In conversations, the same sentences are not normally used again and again, nor do words appear in the same order. This will play havoc with predicates (rules and structures): new rules will constantly need to be added to cover changing speech patterns and vocabulary, so getting anything like 100% accuracy is unlikely.

      But maybe that sort of accuracy is not required.

      Perhaps you could post a sample conversation so we can get an idea of what is involved.

        A couple of quick notes. First, AI::Prolog is useful when there's simple pattern matching, but it's very slow. If you have a large search space, it's unusable. Also, I've always wanted to add support for DCGs (definite clause grammars), which would make natural language processing much easier.

        Cheers,
        Ovid

        New address of my CGI Course.

Re: Natural language processing
by dmorgo (Pilgrim) on Sep 02, 2007 at 01:46 UTC
    This will be hard, as in hard AI.

    It reminds me of a data mining problem I heard about some years ago. The problem was with letters sent to the offices of politicians. It was easy to tell the topic of a letter -- say, abortion, or the Iraq war -- but it was hard to tell which side of the issue the writer was on. Even key phrases can be misleading, because they can be used by the writer as examples of wrong thinking, or as quotes that are refuted.

    If your goal is just to get a guess with a probability one way or another according to some metric, that would be doable, but you will not have a very high accuracy rate.
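
    As a rough illustration of that kind of guess, here is a minimal sketch in core Perl (no CPAN modules). It is only one possible approach, not something taken from the original question: the keyword list, the weights, the negator list and the name deal_score are all hypothetical and would need tuning on real conversations. It also shows, in a very crude way, why key phrases mislead: a keyword preceded by "not" has its weight flipped.

```perl
use strict;
use warnings;

# Hypothetical keyword weights (assumptions, not from the thread):
# positive values suggest a closed deal, negative values suggest it fell through.
my %WEIGHT = (
    'deal'      =>  2,
    'agreed'    =>  2,
    'sold'      =>  3,
    'accept'    =>  2,
    'expensive' => -1,
    'cancel'    => -3,
    'refuse'    => -2,
);

# Words that flip the sense of the keyword immediately following them.
my %NEGATOR = map { $_ => 1 } qw(not no never cannot);

# Return a score: > 0 leans "closed", < 0 leans "not closed", 0 is undecided.
sub deal_score {
    my ($text) = @_;

    # Lowercase the text and split it into simple word tokens.
    my @words = lc($text) =~ /[a-z']+/g;

    my $score = 0;
    for my $i (0 .. $#words) {
        # Skip words that are not in the keyword table.
        my $w = $WEIGHT{ $words[$i] } or next;

        # Flip the weight if the previous word is a negator ("not agreed").
        $w = -$w if $i > 0 && $NEGATOR{ $words[ $i - 1 ] };

        $score += $w;
    }
    return $score;
}

print deal_score("OK, agreed, it is sold."), "\n";
print deal_score("Sorry, that is too expensive, I have not agreed."), "\n";
```

    With these made-up weights the first call prints 5 and the second prints -3. The score is only a heuristic: quoted or refuted phrases and negation spread over several words will still fool it, which is why the accuracy will not be very high.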

    Why not just have check boxes the users check to confirm that the deal is done? Maybe you're parsing data from a system that doesn't belong to you?

    BTW, I'm curious, what is the existing text analyzer you refer to?
