PerlMonks  

Re: The (futile?) quest for an automatic paraphrase engine

by BrowserUk (Patriarch)
on May 17, 2004 at 05:48 UTC ( [id://353874]=note )


in reply to The (futile?) quest for an automatic paraphrase engine

On a more serious note, the objective of the exercise seems to be to reduce the manual work involved in deriving questions that can be answered from having read a piece of text? Assuming that is the goal, I think this may be quite doable using Perl, but it would require a different tack from the one you have outlined.

Rather than trying to break the body of the text up into discrete chunks and then recombine them into possible answers, which a human being can then subset appropriately before deriving a set of questions, turn the process around.

That is to say, have the human being construct sets of questions from bodies of example text. Then write a program that takes the sets of questions and the bodies of text and attempts to derive patterns which relate the questions to sequences of, and relationships between, words within the bodies of text.

It would require a good number of bodies of text and sets of questions to 'train' the program, and some reasonable mechanism to allow a human being to correct and refine the matched patterns over time.

Approaching the problem this way around means that the program does not have to perform any semantic analysis of either the text or the derived questions. It only needs to discover, extract, retain and refine patterns in text. Given Perl's backronym, its powerful regex engine, its renowned text handling facilities and its good database handling, that seems (to me) like a problem that Perl is eminently capable of tackling.
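To make the idea of relating questions to word patterns a little more concrete, here is a minimal sketch of one way the 'discover and retain' step might look. It is only an illustration under assumptions of my own: the training data, the %keep list of function words, and the subs sentences, words, skeleton, best_sentence and learn are all hypothetical, and simple word overlap stands in for whatever matching a real program would need. It pairs each human-written question with the sentence that shares the most words with it, reduces that sentence to a skeleton of function words, and counts which question openers go with which skeletons.

```perl
use strict;
use warnings;

# Hypothetical training data: a body of text plus questions a human wrote about it.
my @training = (
    {
        text      => 'Perl was created by Larry Wall. Perl was released in 1987.',
        questions => [ 'Who created Perl?', 'When was Perl released?' ],
    },
);

# A tiny, made-up list of function words that are kept verbatim in a skeleton.
my %keep = map { $_ => 1 } qw(was were is are by in on at the a an it);

# Split a body of text into sentences after terminal punctuation.
sub sentences {
    my ($text) = @_;
    return grep { /\w/ } split /(?<=[.!?])\s+/, $text;
}

# Return the lowercased words of a string.
sub words {
    my ($str) = @_;
    return map { lc $_ } $str =~ /[\w']+/g;
}

# Reduce a sentence to a skeleton: function words stay,
# every run of other words collapses to a single "_".
sub skeleton {
    my ($sentence) = @_;
    my $s = join ' ', map { $keep{$_} ? $_ : '_' } words($sentence);
    $s =~ s/_(?: _)+/_/g;
    return $s;
}

# Pick the sentence sharing the most non-function words with the question.
# Returns undef if no sentence shares any.
sub best_sentence {
    my ($question, @sentences) = @_;
    my %q = map { $_ => 1 } grep { !$keep{$_} } words($question);
    my ($best, $best_score) = (undef, 0);
    for my $s (@sentences) {
        my $score = grep { $q{$_} } words($s);
        ($best, $best_score) = ($s, $score) if $score > $best_score;
    }
    return $best;
}

# Count how often each skeleton co-occurs with each question opener.
sub learn {
    my @pairs = @_;
    my %count;
    for my $pair (@pairs) {
        my @s = sentences($pair->{text});
        for my $q (@{ $pair->{questions} }) {
            my $hit = best_sentence($q, @s);
            next unless defined $hit;
            my ($opener) = words($q);
            $count{ skeleton($hit) }{$opener}++;
        }
    }
    return \%count;
}

my $patterns = learn(@training);
for my $skel (sort keys %$patterns) {
    for my $opener (sort keys %{ $patterns->{$skel} }) {
        print "$skel => $opener ($patterns->{$skel}{$opener})\n";
    }
}
```

With more training pairs the counts start to show which sentence shapes tend to yield which kinds of question. The counts are also the obvious place for a human being to step in: a bad pairing can be decremented or deleted, and the table could be kept in a database between runs.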

Of course, if you have a Neural Net handy, it is designed for exactly this type of 'train the computer to recognise patterns in human heuristics, and then allow it to do it for you' problem.

I briefly worked with an IBM product called "The Integrated Reasoning System" (TIRS) (about which I could find surprisingly little on-line), which was being used to encapsulate the judgments made by human insurance underwriters in arriving at policy costs for "non-standard" insurance risks. This is an infinitely more complex process than deriving questions from a body of text. Having seen, with my own eyes, just how good it became, very quickly, I wouldn't be too quick to dismiss the rather academic language in which most of the papers and articles on Neural Nets are couched. It may be tough going at first, but no tougher than the problem that you are trying to solve.

Oh, and good luck:)


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
