Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??
It sounds like you want to scan for some simple grammatical constructs, like maybe subject-verb-object, etc. Maybe the above links can help you. This is a field where you can get sucked deeper and deeper which is great if you are interested in it. Though I am not a computational linguist by a very long shot, it sounded like you might want to start with a tagger so you can tell what parts of speech you have, also head driven parsers are gaining a lot of attention. There are now a lot more linguistic resources in CPAN than there were just months ago.

You might like to check out The GATE Project at the University of Sheffield's natural language processing group.

(GATE = General Architecture for Text Engineering)

also resource lists from Statistical NLP at Stanford U., Tokushima U., and the NL Software Registry. You will find lots of links if you spend time searching for the phrase in quotes, "Natural Language Processing". or maybe "Information Extraction". Just searching for NLP or IE will not be so useful.

Incidentally, I don't know if this will help you but if you read the GATE Guide (i.e. the Tao of Gate book), you may find interesting the chapters on the ANNIE information extraction engine and JAPE ("JAPE allows you to recognise regular expressions in annotations on documents"). It likes Java though, if anyone knows about GATE usage with Perl I'm interested in hearing about it.

How about reporting back on how your work goes?

In reply to Re: NLP - natural language regex-collections? by mattr
in thread NLP - natural language regex-collections? by erix

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?

    What's my password?
    Create A New User
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others meditating upon the Monastery: (3)
    As of 2021-05-07 06:13 GMT
    Find Nodes?
      Voting Booth?
      Perl 7 will be out ...

      Results (87 votes). Check out past polls.