Natural Language Sentence Production

japhy has asked for the wisdom of the Perl Monks concerning the following question:

Re: Natural Language Sentence Production by ikegami (Patriarch) on Sep 22, 2005 at 21:19 UTC
Bone::Easy does just that with the rules in Bone::Easy::Rules.
Re^2: Natural Language Sentence Production by japhy (Canon) on Sep 22, 2005 at 21:31 UTC
Ah yes, I remember that module.

Jeff `japhy` Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and `perl` hacker
How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
Re: Natural Language Sentence Production by GrandFather (Saint) on Sep 22, 2005 at 21:23 UTC
Haven't you just pretty much described the algorithm? Build tables of words for the various parts of speech, then use templates to pick which table supplies the word for each position in the sentence. If you want to get clever, allow the templates to be defined recursively. A Super Search may help. If you have absolutely no idea how to go about this, Re: Perl can do it, take 1 (sentence generation) may help.

Perl is Huffman encoded by design.
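For anyone who wants to see the tables-plus-templates idea spelled out, here is a minimal sketch. The word lists, the template names (`SENTENCE`, `NP`) and the helper subs `pick`, `expand` and `sentence` are all made up for illustration; this is not the code from the node linked above.

```perl
use strict;
use warnings;

# Hypothetical word tables, one per part of speech (illustrative only).
my %words = (
    NOUN => [qw(monk camel pearl regex)],
    VERB => [qw(parses eats admires generates)],
    ADJ  => [qw(lazy shiny recursive)],
    DET  => [qw(the a every)],
);

# Templates may refer to other templates, which is what makes them recursive.
my %templates = (
    SENTENCE => [ [qw(NP VERB NP)] ],
    NP       => [ [qw(DET NOUN)], [qw(DET ADJ NOUN)] ],
);

# Return a random element of an array reference.
sub pick { my ($list) = @_; return $list->[ rand @$list ] }

# Expand a symbol: templates recurse, word tables yield a random word,
# and anything else is emitted literally.
sub expand {
    my ($symbol) = @_;
    if (my $alternatives = $templates{$symbol}) {
        return map { expand($_) } @{ pick($alternatives) };
    }
    if (my $table = $words{$symbol}) {
        return pick($table);
    }
    return $symbol;
}

sub sentence { return ucfirst(join(' ', expand('SENTENCE'))) . '.' }

print sentence(), "\n" for 1 .. 3;
```

Adding a new sentence shape only means adding another alternative to `%templates`; the expansion code stays the same.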
Re^2: Natural Language Sentence Production by japhy (Canon) on Sep 22, 2005 at 21:30 UTC
Ah, that code is pretty much what I envisioned doing before I considered using a language parsing module. I guess I'll go that route.
Re: Natural Language Sentence Production by merlyn (Sage) on Sep 22, 2005 at 22:58 UTC
Perhaps you can use my Spew language, which was CPANed as Inline::Spew.

-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.
Re^2: Natural Language Sentence Production by tomazos (Deacon) on Sep 23, 2005 at 01:09 UTC
Interestingly, there is a tool that does exactly the same thing as Spew, except with graphics. The terminals are shapes (SQUARE, CIRCLE, etc.) rather than character sequences. It picks a random weighted path through the grammar to produce the drawing. Check it out here: Context Free Design Grammar. Some of the output is downright spooky.

-Andrew.
Andrew Tomazos | andrew@tomazos.com | www.tomazos.com
Re: Natural Language Sentence Production by Nitsuj (Hermit) on Sep 22, 2005 at 22:19 UTC
So, there's a lot of literature on natural language... Assuming that you want a quicker ramp-up than a few months of reading, I'd search CiteSeer for this, since someone has probably written a paper. Failing that, you could scan corpora for sentences that contain the words on the magnets, or you could build a model from corpora (a Bayesian one, for example), constrain it to sentences containing only those words, and choose the sentences with the highest probability.
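To make the last suggestion concrete, here is a rough sketch that trains a bigram model on a corpus and then ranks every ordering of a handful of magnet words by probability. A plain bigram model with add-one smoothing stands in for whatever model you would really use, and the three-line corpus and the subs `score`, `permutations` and `best_sentence` are invented for the example.

```perl
use strict;
use warnings;

# Tiny stand-in corpus; a real model would be trained on far more text.
my @corpus = (
    'the monk reads the book',
    'the monk writes code',
    'the camel reads code',
);

# Count bigrams, with <s> and </s> marking sentence boundaries.
my (%bigram, %unigram, %vocab);
for my $line (@corpus) {
    my @w = ('<s>', split(' ', $line), '</s>');
    $vocab{$_} = 1 for @w;
    for my $i (0 .. $#w - 1) {
        $bigram{ $w[$i] }{ $w[ $i + 1 ] }++;
        $unigram{ $w[$i] }++;
    }
}
my $V = keys %vocab;    # vocabulary size, used for add-one smoothing

# Log-probability of a word sequence under the smoothed bigram model.
sub score {
    my @w = ('<s>', @_, '</s>');
    my $logp = 0;
    for my $i (0 .. $#w - 1) {
        my $count = $bigram{ $w[$i] }{ $w[ $i + 1 ] } // 0;
        $logp += log( ($count + 1) / (($unigram{ $w[$i] } // 0) + $V) );
    }
    return $logp;
}

# All orderings of a short word list (only sensible for a few magnets).
sub permutations {
    my @items = @_;
    return [] unless @items;
    my @out;
    for my $i (0 .. $#items) {
        my @rest = @items;
        my ($head) = splice @rest, $i, 1;
        push @out, [ $head, @$_ ] for permutations(@rest);
    }
    return @out;
}

# Return the ordering of the magnets that the model finds most probable.
sub best_sentence {
    my ($best) = sort { score(@$b) <=> score(@$a) } permutations(@_);
    return @$best;
}

print join(' ', best_sentence(qw(code monk the writes))), "\n";
```

With this toy corpus the call above should print `the monk writes code`, since that is the only ordering built entirely from word pairs the model has seen. Enumerating permutations grows factorially, so anything beyond a few magnets would need a smarter search.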
Re^2: Natural Language Sentence Production by Camel_thirst (Friar) on Sep 23, 2005 at 10:56 UTC
I agree with Nitsuj, and I'd do a search on Scirus as well... First, in sentence generation there is usually a distinction between syntax/grammar and semantics. The first constraint is that your auto-generated sentences are grammatically correct; the second is that they actually mean something sensible. From what I gather from your post, you only need them to be grammatically correct, not meaningful. In fact, your initial corpus (the magnet elements) needs to be mapped (using XML, for instance) so that each element is described by several attributes (noun/adjective/verb, singular/plural, or even gender if the language you are using distinguishes masculine/feminine/neuter...). Some words, especially prepositions and prepositional verbs, can create a lot of trouble in how they link to other elements. You'll find such descriptive grammar models in publications, for example here: Yet Another Head Driven Generator of Natural Language

EJ
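As a small illustration of describing each magnet by attributes, the sketch below keeps the attributes in a Perl hash rather than XML and uses them to enforce subject/verb agreement. The lexicon entries, the attribute names (`pos`, `number`) and the subs `candidates` and `agreeing_sentence` are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical lexicon: each magnet carries a part of speech and a number.
my %lexicon = (
    monk   => { pos => 'noun', number => 'singular' },
    monks  => { pos => 'noun', number => 'plural'   },
    writes => { pos => 'verb', number => 'singular' },
    write  => { pos => 'verb', number => 'plural'   },
    the    => { pos => 'det' },    # no number: works with either
);

# Words with a given part of speech, optionally restricted by number.
# Entries without a number attribute match any requested number.
sub candidates {
    my ($pos, $number) = @_;
    my @found = grep {
        my $entry = $lexicon{$_};
        $entry->{pos} eq $pos
            && (!defined $number
                || !defined $entry->{number}
                || $entry->{number} eq $number);
    } keys %lexicon;
    return sort @found;
}

# Build "det noun verb" so that the noun and the verb agree in number.
sub agreeing_sentence {
    my @dets  = candidates('det');
    my @nouns = candidates('noun');
    my $det   = $dets[ rand @dets ];
    my $noun  = $nouns[ rand @nouns ];
    my @verbs = candidates('verb', $lexicon{$noun}{number});
    my $verb  = $verbs[ rand @verbs ];
    return "$det $noun $verb";
}

print agreeing_sentence(), "\n";
```

Gender, case or tense could be added as further keys in each entry and checked the same way.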
Re: Natural Language Sentence Production by sauoq (Abbot) on Sep 22, 2005 at 22:30 UTC
I don't think this will be any help to you, but I did something like this in Lisp once. Only, it worked interactively, the vocabulary was built up from input sentences, and the generation rules were derived by building tables describing word proximity in the input sentences. I called it "sputter". It was originally based on a program called "henley", which appeared in ANSI Common Lisp by Paul Graham, a really good book if you want to learn Lisp. Come to think of it, it was nothing like what you describe. It was a whole lot of fun though.

-sauoq
"My two cents aren't worth a dime.";
Re: Natural Language Sentence Production by newroz (Monk) on Sep 23, 2005 at 06:39 UTC
Sean Burke's Chomsky Bot. It shows how to construct sentences and paragraphs from provided chunks of grammatical elements.
Re: Natural Language Sentence Production by quinkan (Monk) on Sep 23, 2005 at 02:48 UTC
Well, there's Dev::Bollocks.... Oh. You wanted sensicals. Pity. It gives you management language. Completely different thing.
Re: Natural Language Sentence Production by DrHyde (Prior) on Sep 23, 2005 at 09:16 UTC
Take a look at how I do that in Acme::Scurvy::Whoreson::BilgeRat.
Re: Natural Language Sentence Production by Anonymous Monk on Sep 22, 2005 at 21:10 UTC
"I know there are natural language parsing modules on CPAN, but I want to do the opposite"

The easiest solution? Guess and check. Generate a sentence of random words and reject those which don't pass muster according to your CPAN parsing modules.
Re^2: Natural Language Sentence Production by tomazos (Deacon) on Sep 23, 2005 at 00:59 UTC
"The easiest solution? Guess and check. Generate a sentence of random words and reject those which don't pass mu"

Sounds like a bogosort to me. To sort a deck of cards:

1. Shuffle deck randomly.
2. If not sorted, go to 1.

Efficiency in the order of O(lots and lots and lots and....)

-Andrew.
Re^3: Natural Language Sentence Production by spiritway (Vicar) on Sep 23, 2005 at 05:39 UTC
Close, but not quite. It depends on the ratio of "sensible" to "nonsensical" phrases, which in turn depends on how much leeway you're willing to give the program. If you want to mimic a rational human, then yes, you're right. But if you're willing to accept something less stringent (say, your average irrational person, or worse, a suit) then you might be able to pull it off this way. I did something like this with a music-composing program. I randomly generated tones, intervals, and durations, and filtered out the obvious dissonances and silly combinations. What was left was OK: not exactly music, but close to Muzak. It would have passed muster in an elevator, but not at a concert.
Re^2: Natural Language Sentence Production by Anonymous Monk on Sep 22, 2005 at 21:25 UTC
Er. That probably could have been more clear. Don't generate a long random sentence and then check it; build it up a piece at a time.

```perl
while (...) {    # outer loop condition left unspecified, as in the original
    my @sentence = ();
    for (1 .. $random_sentence_length) {
        my $next_word;
        # Keep drawing words until one fits grammatically after the
        # words chosen so far.
        do {
            $next_word = random_word_generator();
        } while (not grammar_correct(@sentence, $next_word));
        push @sentence, $next_word;
    }
    print "@sentence\n";    # words separated by spaces
}
```
Re^2: Natural Language Sentence Production by Excalibor (Pilgrim) on Sep 23, 2005 at 08:19 UTC
Check out Dave "Pragmatic" Thomas's blog entry Kata14, where you'll be challenged to generate sensical text by using "trigrams"... I've tried his solution, and with a varied enough text base to feed it, it generates some impressive texts with very little effort... I guess you can improve it to check longer strings as well, but you'd end up with a Markov chain (de|re)generator, which could be interesting in itself, but... (Suggestion: Project Gutenberg is a great place to get good base texts for feeding the database :-)

Link: PragDave's Kata Fourteen

Good luck,

-- `our $Perl6 is Fantastic;`
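A bare-bones sketch of the trigram approach, for the curious: every pair of adjacent words maps to the words seen right after it, and generation is a random walk over that table. The sub names `build_trigrams` and `generate` and the word limit are my own choices, not taken from the kata.

```perl
use strict;
use warnings;

# Map each pair of adjacent words to the list of words seen after that pair.
sub build_trigrams {
    my ($text) = @_;
    my @w = split ' ', $text;
    my %next;
    for my $i (0 .. $#w - 2) {
        push @{ $next{"$w[$i] $w[$i + 1]"} }, $w[ $i + 2 ];
    }
    return \%next;
}

# Walk the table from a starting pair until no continuation exists
# or the word limit is reached.
sub generate {
    my ($next, $first, $second, $max_words) = @_;
    my @out = ($first, $second);
    while (@out < $max_words) {
        my $followers = $next->{"$out[-2] $out[-1]"} or last;
        push @out, $followers->[ rand @$followers ];
    }
    return join ' ', @out;
}

# Toy input; a Project Gutenberg text would give far more varied output.
my $text  = 'I wish I may I wish I might';
my $table = build_trigrams($text);
print generate($table, 'I', 'wish', 12), "\n";
```

Because followers are stored with repeats, a word that follows a pair more often is proportionally more likely to be chosen.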
Re: Natural Language Sentence Production by mattr (Curate) on Sep 24, 2005 at 18:39 UTC
You might like to search CPAN for Lingua (like Lingua::EN::Inflect) and WordNet (like WordNet::SenseRelate::Tools or WordNet::Similarity). The field itself is pretty large: it is natural language processing (NLP), or computational linguistics, and what you are talking about is "sentence generation". But it sounds like you don't really want to get that deeply into it. If you are careful to limit what can be selected into each field, the output may sound realistic. Incidentally, you might be interested in ALICE.
Re: Natural Language Sentence Production by casiano (Pilgrim) on Jan 14, 2009 at 09:58 UTC
Just in case it helps people trying to solve a similar problem: yagg is probably the right tool for that. Though Parse::Eyapp was conceived for parsing, versions 1.137 and later provide support for building a phrase generator from a grammar specification. If you want to know more, read the tutorial Parse::Eyapp::datagenerationtut. The example used produces sequences of assignment statements:

```
Parse-Eyapp/examples/generator$ ./Generator.pm
# result: -710.2
I=(3-8+7/5);
R=2+8I4+52+I/I
```

To specify the language we write a yacc-like grammar, but instead of writing the classical lexer, i.e. scanning the input to produce the next token, we write a token generator: each time our lexical analyzer is called, it checks the list of expected tokens (available via the method YYExpect) and produces one of them, following some probability distribution. This is the grammar for the calculator:

```
Parse-Eyapp/examples/generator$ cat Generator.eyp
# file: Generator.eyp
# compile with: eyapp -b '' Generator.eyp
# then run: ./Generator.pm
%strict
%token NUM VARDEF VAR

%right '='
%left '-' '+'
%left '*' '/'
%left NEG
%right '^'

%defaultaction {
  my $parser = shift;

  return join '', @_;
}

%{
use base q{Parse::Eyapp::TokenGen};
use base q{GenSupport};
%}

%%

stmts:
    stmt
      { # At least one variable is defined now
        $_[0]->deltaweight(VAR => +1);
        $_[1];
      }
  | stmts ';' { "\n" } stmt
;

stmt:
    VARDEF '=' exp
      {
        my $parser = shift;
        $parser->defined_variable($_[0]);
        "$_[0]=$_[2]";
      }
;
exp:
    NUM
  | VAR
  | exp '+' exp
  | exp '-' exp
  | exp '*' exp
  | exp '/' exp
  | '-' { $_[0]->pushdeltaweight('-' => -1) }
    exp %prec NEG {
      $_[0]->popweight();
      "-$_[3]"
    }
  | exp '^' exp
  | '(' { $_[0]->pushdeltaweight( '(' => -1, ')' => +1, '+' => +1, ); }
      exp
    ')'
      {
        $_[0]->popweight;
        "($_[3])"
      }
;

%%

unless (caller) {
  __PACKAGE__->main(@ARGV);
}
```

The difficult part is managing the probability distribution so that it produces reasonable phrases and avoids very long statements. The generation of tokens and their attributes uses Test::LectroTest::Generator. The support subroutines have been isolated in the module GenSupport.pm (see http://cpansearch.perl.org/src/CASIANO/Parse-Eyapp-1.137/examples/generator/GenSupport.pm ).

