in reply to NLP - natural language regex-collections?

It sounds like you want to scan for simple grammatical constructs, such as subject-verb-object. Maybe the links above can help. This is a field where you can get sucked in deeper and deeper, which is great if you are interested in it. I am not a computational linguist by a very long shot, but you might want to start with a tagger so you can tell which parts of speech you have. Head-driven parsers are also gaining a lot of attention. There are now a lot more linguistic resources on CPAN than there were just months ago.
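To make the tagger idea concrete, here is a minimal sketch of a lookup tagger, written in Python for brevity. The lexicon, the tag names (DET, NOUN, VERB, UNK) and the `tag` function are all made up for illustration; a real tagger uses a trained model and a much richer tag set, so treat this only as a picture of what the output looks like.

```python
# Toy lookup tagger. LEXICON and the tag names are hypothetical,
# chosen only to illustrate the (word, tag) output a tagger produces.
LEXICON = {
    "the": "DET", "a": "DET",
    "dog": "NOUN", "cat": "NOUN", "ball": "NOUN",
    "chased": "VERB", "saw": "VERB",
}

def tag(sentence):
    """Return a list of (word, tag) pairs; unknown words are tagged 'UNK'."""
    words = sentence.lower().split()
    return [(word, LEXICON.get(word, "UNK")) for word in words]

tagged = tag("The dog chased a ball")
for word, pos in tagged:
    print(f"{word}/{pos}")
```

Once every word carries a tag, scanning for a construct becomes a question about the tag sequence rather than about the words themselves.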

You might like to check out The GATE Project at the University of Sheffield's natural language processing group.

(GATE = General Architecture for Text Engineering)

Also see the resource lists from Statistical NLP at Stanford U., Tokushima U., and the NL Software Registry. You will find lots of links if you search for the quoted phrase "Natural Language Processing", or maybe "Information Extraction". Just searching for NLP or IE will not be as useful.

Incidentally, I don't know if this will help you, but if you read the GATE Guide (i.e. the Tao of GATE book), you may find the chapters on the ANNIE information extraction engine and JAPE interesting ("JAPE allows you to recognise regular expressions in annotations on documents"). GATE likes Java, though. If anyone knows about using GATE with Perl, I'm interested in hearing about it.
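The general idea of running regular expressions over annotations rather than over raw text can be sketched roughly as follows. This is not JAPE syntax and not the GATE API; it is a Python illustration that assumes the input has already been tagged, and the tag names, the `SVO` pattern and the `find_svo` helper are hypothetical.

```python
import re

# Pattern over a string of tags, not over the words themselves.
# Optional determiners are skipped; the three groups capture the
# subject noun, the verb, and the object noun.
SVO = re.compile(r"\b(?:DET )?(NOUN )(VERB )(?:DET )?(NOUN )")

def find_svo(tagged):
    """Return (subject, verb, object) words for the first match, or None.

    `tagged` is a list of (word, tag) pairs.
    """
    # Encode the tag sequence as one string, one trailing space per tag.
    tag_string = "".join(pos + " " for _, pos in tagged)
    match = SVO.search(tag_string)
    if match is None:
        return None

    def token_index(group):
        # Each tag ends in exactly one space, so the number of spaces
        # before a group's start is the index of that token.
        return tag_string[:match.start(group)].count(" ")

    return tuple(tagged[token_index(g)][0] for g in (1, 2, 3))

# Hypothetical pre-tagged input.
sentence = [("the", "DET"), ("dog", "NOUN"), ("chased", "VERB"),
            ("a", "DET"), ("ball", "NOUN")]
print(find_svo(sentence))
```

The design point is that the pattern never mentions a specific word, so the same expression applies to any sentence whose tags line up the same way.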

How about reporting back on how your work goes?
