Hi, monks, yeanling & bantling:

After looking over CPAN and doing some WAITing, I'm wondering if I missed the big blinking sign that read "English stuff is here!", or something. Is there a unified module that offers a wide variety of English primitives and transforms for Natural Language Processing? For instance, is there something like Text::English, but more extensive? If not, I may be interested in adding my code somewhere appropriate in the tree as a starting point. So far, no word from the author of Text::English.

I'm doing a small contract that requires some auto-correlation and such...

Text::English::stem has been invaluable. Thanks Martin Porter, implementors, and others! I've also been thinking of taking advantage of some of the lists at to hammer out some facilities for future English nightmares.

On an unrelated note, did you know that, according to Google, only a few special places on the web contain the following word sequence: "Bring King Ling ring Bing Ding Sing spring swing"? (Wow: The Phonosemantics of Nasal-Stop Clusters and other music hits.) Can you think of the longest m/[a-z]+ing/ match (length > 5) that will presumably trip up Porter's Stemmer? The common thread is that the ugly duckling word only looks like a stemmable -ing string: its -ing is supposed to cling, unlike the -ing in the word 'spelling'.
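
A rough sketch of that hunt, for anyone who wants to play along. It is written in Python purely for illustration and is not the code inside Text::English::stem. It assumes only a simplified form of the one Porter rule that matters here (step 1b: drop a trailing -ing only when the remaining stem contains a vowel), so it ignores every other step of the stemmer. The function names and the sample word list are made up for this example.

```python
import re

# Lowercase words with at least one letter before a trailing "ing".
ING_WORD = re.compile(r"[a-z]+ing")


def has_vowel(stem):
    """True if stem contains a vowel; 'y' counts when it follows a consonant.

    Simplified from Porter's definition: a 'y' that follows another 'y'
    is treated as a vowel here, which the full algorithm may not do.
    """
    for i, ch in enumerate(stem):
        if ch in "aeiou":
            return True
        if ch == "y" and i > 0 and stem[i - 1] not in "aeiou":
            return True
    return False


def strips_ing(word):
    """Would the simplified rule remove the trailing -ing from word?"""
    return bool(ING_WORD.fullmatch(word)) and has_vowel(word[:-3])


def suspects(words, min_length=6):
    """Words longer than 5 letters that the rule would strip, longest first."""
    hits = [w for w in words if len(w) >= min_length and strips_ing(w)]
    return sorted(hits, key=lambda w: (-len(w), w))


# Hypothetical sample, borrowed from the words in this post.
sample = ["bring", "king", "ring", "sing", "spring", "swing",
          "duckling", "darling", "dumpling", "cageling", "spelling"]
print(suspects(sample))
```

The short words (king, spring, swing) survive because nothing vowel-bearing is left once the -ing is gone. Everything longer gets clipped, and the rule cannot tell 'duckling' from 'spelling', so the list still needs a human (or a real wordlist of -ing suffixes) to weed out the legitimately stemmable entries.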

Please help me find wordlists that detail English word relationships, or other cool language algorithms (I'm no linguist). Thanks, my darlings... (And don't go flinging your dumplings at the poor cageling! =] )

P.S. See Martin's Official PorterStemmer page for more info on stemmers.

In reply to Status of English modules... by darksym
