PerlMonks  

It sounds like you want to scan for some simple grammatical constructs, such as subject-verb-object. Maybe the above links can help you. This is a field where you can get sucked in deeper and deeper, which is great if you are interested in it. I am not a computational linguist by a very long shot, but it sounds like you might want to start with a tagger so you can tell which parts of speech you have. Head-driven parsers are also gaining a lot of attention. There are now many more linguistic resources on CPAN than there were just months ago.

You might like to check out The GATE Project at the University of Sheffield's natural language processing group.

(GATE = General Architecture for Text Engineering)

There are also resource lists from Statistical NLP at Stanford U., Tokushima U., and the NL Software Registry. You will find lots of links if you search for the quoted phrase "Natural Language Processing", or maybe "Information Extraction". Just searching for NLP or IE will not be as useful.

Incidentally, I don't know if this will help you, but if you read the GATE Guide (i.e. the Tao of Gate book), you may find the chapters on the ANNIE information extraction engine and JAPE interesting ("JAPE allows you to recognise regular expressions in annotations on documents"). GATE likes Java, though; if anyone knows about using GATE with Perl, I'm interested in hearing about it.
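
To give a rough feel for what "regular expressions in annotations" can mean, here is a minimal sketch in Python (chosen only for brevity; the same trick works with Perl regexes). It is not JAPE and does not use GATE's API. It assumes the text has already been run through a tagger, and the tag names (DET, ADJ, NOUN, VERB), the hand-tagged sentence, and the find_svo helper are all made up for illustration. The idea is to flatten the tag sequence into a string, match an ordinary regex against it, and map the match back to the words.

```python
import re

# Hand-tagged toy sentence as (token, part-of-speech tag) pairs.
# A real tagger would produce these; the tag set is an assumption.
tagged = [
    ("The", "DET"), ("monk", "NOUN"), ("writes", "VERB"),
    ("elegant", "ADJ"), ("code", "NOUN"), (".", "PUNCT"),
]

# A very naive noun phrase: optional determiner, any adjectives, one noun.
# Every tag is followed by a single space in the flattened string.
NP = r"(?:DET )?(?:ADJ )*NOUN "

# Subject NP, then a verb, then an object NP. \b keeps the match from
# starting in the middle of a longer tag name.
PATTERN = re.compile(rf"\b(?P<subj>{NP})(?P<verb>VERB )(?P<obj>{NP})")


def find_svo(tagged_tokens):
    """Return (subject, verb, object) word triples found in the tag sequence."""
    # Flatten the tags, e.g. "DET NOUN VERB ADJ NOUN PUNCT ".
    tags = "".join(tag + " " for _, tag in tagged_tokens)
    results = []
    for m in PATTERN.finditer(tags):
        def words(group):
            # Spaces before the group start = index of its first token;
            # spaces inside the group = number of tokens it covers.
            first = tags[:m.start(group)].count(" ")
            n = m.group(group).count(" ")
            return " ".join(tok for tok, _ in tagged_tokens[first:first + n])
        results.append((words("subj"), words("verb"), words("obj")))
    return results


print(find_svo(tagged))  # [('The monk', 'writes', 'elegant code')]
```

This only catches the simplest declarative sentences; anything with pronouns, passives, or nested clauses needs a real parser, which is where the tools above come in.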

How about reporting back on how your work goes?


In reply to Re: NLP - natural language regex-collections? by mattr
in thread NLP - natural language regex-collections? by erix
