in reply to natural language sentence construction

Syntax analysis is still an area of active research in linguistics. The language used is often Lisp, but Perl is probably also a reasonable choice. Your problem is slightly easier than the corresponding 'grand challenge' problem, which is understanding natural language. In general, it is easier to transmit than to receive.

Tree data structures are often used to represent natural language. Chapter 8 of Mastering Algorithms with Perl is about tree data structures, which are handled by the graph modules.
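
As a rough illustration of what such a tree might look like, here is a minimal sketch that uses plain nested hashes rather than any particular module. The node layout (label, children, word) and the example sentence are assumptions made up for this sketch, not something taken from the book.

```perl
use strict;
use warnings;

# Hypothetical parse tree for "the cat sees a dog".
# Each node is a hash with a 'label' and either 'children' or a 'word'.
my $tree = {
    label    => 'S',
    children => [
        { label => 'NP', children => [
            { label => 'Det', word => 'the' },
            { label => 'N',   word => 'cat' },
        ] },
        { label => 'VP', children => [
            { label => 'V', word => 'sees' },
            { label => 'NP', children => [
                { label => 'Det', word => 'a' },
                { label => 'N',   word => 'dog' },
            ] },
        ] },
    ],
};

# Depth-first walk that collects the leaf words in left-to-right order.
sub leaves {
    my ($node) = @_;
    return $node->{word} if exists $node->{word};
    return map { leaves($_) } @{ $node->{children} };
}

my $sentence = join ' ', leaves($tree);
print ucfirst($sentence), ".\n";
```

Generating a sentence then amounts to building a tree like this and reading off its leaves.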

It looks like there is an AI dictionary that could be helpful in creating a useful syntax tree. You might be able to design a data structure that works like a subset of this dictionary.

Look at Damian Conway's wonderful Coy module, a haiku generator designed to create human-friendly messages. Although you aren't trying to generate haiku, you may be able to use similar programming techniques.

I think it is justified to attempt to solve such large, open problems, even if the chance of a breakthrough is small. It is a great learning experience, and you just might make a real contribution. You can also think about potential technology advances to look for in the future, so that you might be the first person to apply a newly-available technique to a difficult problem.

It should work perfectly the first time! - toma

Re: Re: natural language sentence construction
by thpfft (Chaplain) on Jun 17, 2001 at 22:55 UTC

    Thanks for the links, especially Coy. Delightful.

    I agree completely with your quixotic implication. My efforts in this direction so far have proceeded on two fronts: recursive sentence-building routines based on Chomskyan deep structure rules, all of which have failed terribly, and really simple special-case routines like the one above, which makes a perfectly readable sentence in a very dull way.
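
    For readers unfamiliar with the first approach, a recursive sentence builder driven by rewrite rules could look roughly like the sketch below. This is not the code described in this post: the grammar, the vocabulary and the expand routine are all hypothetical, chosen only to show the shape of the technique.

```perl
use strict;
use warnings;

# Hypothetical toy grammar: each symbol maps to a list of alternative
# expansions. Anything without an entry is treated as a terminal word.
my %rules = (
    S   => [ [qw(NP VP)] ],
    NP  => [ [qw(Det N)] ],
    VP  => [ [qw(V NP)], [qw(V)] ],
    Det => [ ['the'], ['a'] ],
    N   => [ ['page'], ['editor'] ],
    V   => [ ['updates'], ['sleeps'] ],
);

# Recursively expand a symbol into a flat list of words.
# $pick chooses one alternative; it defaults to a random choice.
sub expand {
    my ($symbol, $pick) = @_;
    $pick ||= sub { $_[ int rand @_ ] };
    my $alternatives = $rules{$symbol}
        or return $symbol;    # terminal: no rule, so emit the word itself
    my $chosen = $pick->(@$alternatives);
    return map { expand($_, $pick) } @$chosen;
}

print join(' ', expand('S')), ".\n";
```

    Even this toy shows where the trouble starts: nothing stops it from producing "a editor sleeps the page", because the rules know nothing about a/an, transitivity or meaning.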

    My background is philosophy of language rather than straight linguistics, but i've got enough of a grasp to see the scale of the problem. I don't think it's completely hopeless, as long as it's properly constrained. The content-management systems that i'm trying to make more articulate are a good place to start: they have a very limited world, their utterances fall into a few well-defined categories, they're almost always declarative, and the goal is transparency, not lyricism.

    To start with, i'd like to identify a small set of phrases that people use all the time. That's why i used the results example above, which everyone must need at some point. Other examples might include pagination links, error messages and confirmation questions, but i'm hoping people will make suggestions. Then i'd like to implement that limited set in as general a way as possible, and take it from there.

    So it's a fairly limited ambition, really. The fact that i'm using it as a spur both to learn OOP properly and finally read the algorithm book should give you some idea of how long it's likely to take :(

    updated: silly typo