in reply to Re: The (futile?) quest for an automatic paraphrase engine
in thread The (futile?) quest for an automatic paraphrase engine

There seem to be some fairly nice results from statistical methods. I have a couple of references in my post in this thread, but what it boils down to is that there is the knowledge-based way (yours and my preference, apparently) and at least one statistical method in active use. The statistical NLP method is called "clustering" because it creates clusters of semantically related sentence constituents and then reuses those constituents to generate a summarization text.
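
To give a rough feel for what that looks like in practice, here's a toy sketch in Perl: bag-of-words vectors, cosine similarity, and a greedy single pass that groups similar sentences. The sentences, the threshold, and the grouping strategy are all invented for illustration; the real systems cluster parsed constituents with much richer features.

    #!/usr/bin/perl
    # Toy sketch: group sentences by lexical similarity. Sentences,
    # threshold, and the greedy single-pass grouping are invented.
    use strict;
    use warnings;
    use List::Util qw(sum);

    my @sentences = (
        'The cat sat on the mat',
        'A cat was sitting on the mat',
        'Stock prices fell sharply today',
        'Stock prices dropped sharply today',
    );

    # Represent a sentence as a bag-of-words vector (token counts).
    sub vectorize {
        my %v;
        $v{ lc $_ }++ for grep { length } split /\W+/, shift;
        return \%v;
    }

    # Cosine similarity between two sparse term-count vectors.
    sub cosine {
        my ( $u, $w ) = @_;
        my $dot = sum( 0, map { $u->{$_} * ( $w->{$_} || 0 ) } keys %$u );
        my $nu  = sqrt sum( 0, map { $_**2 } values %$u );
        my $nw  = sqrt sum( 0, map { $_**2 } values %$w );
        return $nu && $nw ? $dot / ( $nu * $nw ) : 0;
    }

    # Greedy single-pass clustering: attach each sentence to the first
    # cluster it resembles closely enough, else start a new cluster.
    my $THRESHOLD = 0.5;    # purely illustrative
    my @clusters;
    for my $s (@sentences) {
        my $v = vectorize($s);
        my ($home) = grep { cosine( $v, $_->{vec} ) >= $THRESHOLD } @clusters;
        if ($home) { push @{ $home->{members} }, $s }
        else       { push @clusters, { vec => $v, members => [$s] } }
    }

    my $i = 1;
    for my $c (@clusters) {
        print 'Cluster ', $i, ": $_\n" for @{ $c->{members} };
        $i++;
    }

Running this groups the two cat sentences into one cluster and the two stock sentences into another; a summarizer would then pick or recombine one representative per cluster.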

--
Damon Allen Davison
http://www.allolex.net

Re: Re: Re: The (futile?) quest for an automatic paraphrase engine
by tachyon (Chancellor) on May 17, 2004 at 11:49 UTC

    Stats are somewhat like filtering spam with Bayes, Fisher/Robinson, etc. For certain tasks they can make useful 'educated' guesses, but they are still 'dumb' algorithms. If you look at how a child learns language, they do seem to use a suck-it-and-see approach: they try something and then get feedback on whether it was a 'winner' or not. As approaches to AI go, I think both the knowledge-based and the stats-based are 'wrong'. While there is no doubt that both can yield useful results, they appear to my mind to have finite limits. I favour a fuzzy-logic nodal learning framework, i.e. trying to build a machine that can learn without telling it exactly how to learn. The main issue with this is processor speed (or rather the lack of it), combined with the training time. Language processing is actually a good task for this, as you effectively have character-based input and output streams, which makes the interface simple.
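
    For reference, the Bayes side of that boils down to something like this toy token classifier; the word counts, corpus sizes, and add-one smoothing here are all made up, and real filters (Graham, Robinson) combine token scores far more carefully:

        #!/usr/bin/perl
        # Toy Bayesian token classifier in the spirit of the spam
        # filters mentioned above. All counts are invented.
        use strict;
        use warnings;

        my %spam_count = ( viagra => 40, offer => 25, meeting => 1  );
        my %ham_count  = ( viagra => 1,  offer => 5,  meeting => 30 );
        my ( $spam_total, $ham_total ) = ( 100, 100 );   # assumed corpus sizes

        # P(spam | token) via Bayes' rule with add-one smoothing,
        # assuming equal priors for spam and ham.
        sub spamminess {
            my $tok = lc shift;
            my $p_spam = ( ( $spam_count{$tok} || 0 ) + 1 ) / ( $spam_total + 2 );
            my $p_ham  = ( ( $ham_count{$tok}  || 0 ) + 1 ) / ( $ham_total  + 2 );
            return $p_spam / ( $p_spam + $p_ham );
        }

        # Combine per-token scores under a (naive) independence assumption.
        sub score {
            my ( $p, $q ) = ( 1, 1 );
            for my $tok ( grep { length } split /\W+/, shift ) {
                my $s = spamminess($tok);
                $p *= $s;
                $q *= 1 - $s;
            }
            return $p / ( $p + $q );
        }

        printf "%.3f  %s\n", score($_), $_
            for 'limited time viagra offer', 'project meeting tomorrow';

    The first message scores near 1 (spammy), the second near 0 (ham): an 'educated' guess from word regularities, with no understanding anywhere in sight.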

    cheers

    tachyon

      Humans learn language as a social tool within a social environment. I won't exactly say never, but it will be a long time before computers are able to learn language the way humans do. Some people argue that it's possible to give an AI a corpus and have it learn from that, but the conditions are still not the same, because the AI is not participating in order to learn, just observing. Children learn language by forming intermediate (defective) grammars, which are then corrected by others in their environment, usually adults, typically their parents. A computer program is never going to have that kind of exposure unless we get humans to correct it, which comes back to a knowledge-based approach.

      Wittgenstein pointed out that people learn not by being told what things are, but by being exposed to examples. (People are always talking about "food" and "the fridge" in the same context, so maybe there's a relation...) Given this, and given that language is so bound up with the way humans are built and live (how do we learn what "mother" or "cousin" means?), we'd pretty much have to emulate a human before we could teach our emulation to speak in this way. So knowledge is still important, to give our linguistic AIs a frame of reference they would otherwise just not have.

      I guess the point I am trying to make is that stats just produce results, and don't really reflect anything more than data regularities in a given context. Knowledge has its major fault in its static nature. And fuzzy logic is nice, but, at least for this application, it needs some knowledge to start with. A hybrid approach using all three might be possible by giving a knowledge-driven AI the capability of creating its own knowledge from statistical snapshots. Who knows?

      --
      Damon Allen Davison
      http://www.allolex.net

        "it will be a long time before computers are able to learn language the way humans do."

        Corollary: it will be a long time before programmers are able to solve the stated problem with an acceptable level of accuracy.


        ;$;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}} split//,".rekcah lreP rehtona tsuJ";$\=$;[-1]->();print

        The *first* computer to learn a language will need formal training. It can then be the "adult" as other computers engage in a more participatory style of learning. It would be fun to see how the "human" English and the "computer" English (or some other "evolved" human language) diverge as generation after generation of computers are taught from the preceding generation. Will the computers develop their own slang?