I looked at all the Lingua module docs to find the ones that could be useful in the context of this thread: parsing or generating (English) language constructs.

I have excluded all modules for languages other than English, French, or German.
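
The exclusion rule above can be sketched in a few lines of core Perl. This is only an illustration, not how the list was actually built: the names `%keep` and `is_kept` are made up for the example, and it assumes that a two-letter upper-case component directly after `Lingua::` is a language code. Placements made by hand are not modelled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Language codes kept in the main listing: English, French, German.
my %keep = map { $_ => 1 } qw(EN FR DE);

# Return true if a module name should stay in the main listing.
# Assumption: a two-letter upper-case component right after "Lingua::"
# is a language code; any other name is treated as language-neutral.
sub is_kept {
    my ($module) = @_;
    return 1 unless $module =~ /^Lingua::([A-Z]{2})(?:::|$)/;
    return exists $keep{$1};
}

# A handful of names taken from the listing in this post.
my @modules = qw(
    Lingua::EN::Inflect
    Lingua::DE::Num2Word
    Lingua::FR::Nums2Words
    Lingua::FI::Hyphenate
    Lingua::LinkParser
    Lingua::JA::Romaji
);

print "$_\n" for grep { is_kept($_) } @modules;
```

With the sample names above, this prints the EN, DE and FR modules plus Lingua::LinkParser, and drops the FI and JA ones.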

One module stands out from the others: Lingua::LinkParser is a wrapper for the LINK parser (downloadable, code included), a parser written in C whose API the Perl module uses. I haven't used the wrapper yet, but I did install the parser itself and compiled it without problems on Win2k with VC6. It has a shell that makes it easy to get started, and the parsing seems very advanced (first impression).

This is a work in progress; I'll continue adding to it as these and other modules are examined. (Regexp::, Parser::, etc. will follow.)

Language level
Lingua::Ident Statistical language identification
Lingua::Identify Language identification
Lingua::Preferred Pick a language based on user's preferences
  
Phrase/sentence/syntax level
Lingua::CollinsParser Head-driven syntactic sentence parser
Lingua::CollinsParser::Node Syntax tree node
Lingua::Conjunction Convert lists into conjunctions
Lingua::EN::Sentence Module for splitting text into sentences.
Lingua::EN::Splitter Split text into words, paragraphs, segments, and tiles
Lingua::EN::Squeeze Shorten English text for Pagers/GSM phones
Lingua::LinkParser Link Grammar Parser by Sleator, Temperley and Lafferty at CMU
Lingua::LinkParser::Definitions Extension providing text definitions for link types
Lingua::LinkParser::Dictionary
Lingua::LinkParser::Linkage
Lingua::LinkParser::Linkage::Sublinkage
Lingua::LinkParser::Linkage::Sublinkage::Link
Lingua::LinkParser::Linkage::Word
Lingua::LinkParser::MatchPath Match paths in linkage diagrams
Lingua::LinkParser::MatchPath::BuildSM
Lingua::LinkParser::MatchPath::Lex
Lingua::LinkParser::MatchPath::Parser
Lingua::LinkParser::MatchPath::SM
Lingua::LinkParser::MatchPath::SMContext
Lingua::LinkParser::Sentence
Lingua::LinkParser::Simple Perl extension for Link Parser - incomplete access to API
Lingua::EN::Segmenter Subdivide texts into passages that represent subtopics
Lingua::EN::Segmenter::Baseline Segment text randomly for baseline purposes
Lingua::EN::Segmenter::Evaluator Evaluate a segmenting method
Lingua::EN::Segmenter::TextTiling Segment text using the TextTiling method
Lingua::EN::Summarize::Filters Helper functions for the Summarize module
Lingua::EN::Summarize A simple tool for summarizing bodies of English text.
Lingua::EN::Tagger Part-of-speech tagger for English natural language processing.
  
Word level
Lingua::DE::ASCII Perl extension to convert German umlauts to and from ASCII
Lingua::EN::StopWords Typical stop words for an English corpus
Lingua::EN::AddressGrammar grammar tree for Lingua::EN::AddressParse
Lingua::EN::AddressParse Manipulate geographical addresses
Lingua::EN::Dict BETA Version of XML English dictionary storage.
Lingua::EN::Fathom Readability measurements for English text
Lingua::EN::FindNumber Locate (written) numbers in English text
Lingua::EN::Gender Inflect pronouns for gender
Lingua::EN::Hyphenate Syllable based hyphenation
Lingua::EN::Infinitive Find infinitive of a conjugated word
Lingua::EN::Inflect English sing->plur, a/an, nums, participles
Lingua::EN::Inflect::Number Force number of words to singular or plural
Lingua::EN::Keywords Automatically extracts keywords from text
Lingua::EN::Tagger Part-of-speech tagger for English natural language processing.
Lingua::EN::Syllable Estimate syllable count in words
Lingua::EN::VerbTense
Lingua::Ispell Interface to the Ispell spellchecker
Lingua::LA::Stemmer Stemmer for Latin
Lingua::Lexicon::IDP OOP methods for Internet Dictionary Project
  
Human names
Lingua::EN::MatchNames Smart matching for human names
Lingua::EN::Nickname Genealogical nickname matching (Peggy=Midge)
Lingua::EN::NameCase Convert NAMES and names to Correct Case
Lingua::EN::Namegame Converts name to verse as in Name Game song
Lingua::EN::NamedEntity Basic Named Entity Extraction algorithm
Lingua::EN::NameGrammar grammar tree for Lingua::EN::NameParse
Lingua::EN::NameLookup a simple dictionary search and manipulation class.
Lingua::EN::NameParse Manipulate a person's name
  
Numbers
Lingua::31337 P3RL M0DU1E 7O c0NVer7 7ext 7O C0o1 741k
Lingua::DE::Num2Word positive number to text converter for German. Output
Lingua::DE::Sentence Perl extension for tokenizing German texts into their sentences.
Lingua::EN::Numbers Converts numeric values into their English string equivalents.
Lingua::EN::Numbers::Easy Hash access to Lingua::EN::Numbers objects.
Lingua::EN::Numbers::Ordinate go from cardinal (53) to ordinal (53rd)
Lingua::EN::Numericalize Replaces English descriptions of numbers with numerals
Lingua::EN::Nums2Words
Lingua::EN::Words2Nums convert English text to numbers
Lingua::EN::WordsToNumbers convert numbers written in English to actual numbers
Lingua::FR::Nums2Words Converts numbers to French words
Lingua::Num2Word wrapper for number to text conversion modules of
  
Lingua::Alignment stuff. I think it does alignment of two texts in different languages.
Lingua::Alignment
Lingua::AlignmentEval
Lingua::AlignmentSet handle a word-aligned bilingual corpus
Lingua::AlignmentSlice
  
Lingua::Features stuff. I think it is a framework for language description (completely 'meta'; no implementation)
Lingua::Features Natural languages features
Lingua::Features::Feature Feature object for Lingua::Features
Lingua::Features::FeatureType FeatureType object for Lingua::Features
Lingua::Features::Library Features library object for Lingua::Features
Lingua::Features::Structure Structure object for Lingua::Features
Lingua::Features::StructureType StructureType object for Lingua::Features
Lingua::Features::Tag Tag object for Lingua::Features
Lingua::Features::Type Type object for Lingua::Features
Lingua::Features::Value Value object for Lingua::Features
  

Other stuff (not useful for above-mentioned purpose):

  
Other languages (*not* English, French, German)
Lingua::AF::Numbers Perl module for converting numeric values into their Afrikaans equivalents
Lingua::AM::Abbreviate
Lingua::AR::MacArabic transcode between Mac OS Arabic encoding and Unicode
Lingua::CS::Num2Word number to text converter for Czech. Output
Lingua::DetectCyrillic
Lingua::EO::Supersignoj Convert Esperanto characters
Lingua::ES::Silabas Splits a word into syllables
Lingua::ES::Numeros Converts numbers to text in Spanish (Castilian)
Lingua::EU::Numbers Converts numbers into Basque (Euskara).
Lingua::FA::MacFarsi transcode between Mac OS Farsi encoding and Unicode
Lingua::FA::Number Converts English numbers to their Persian (Farsi) HTML/Unicode equivalent
Lingua::FI::Genitive Finnish genitive
Lingua::FI::Hyphenate Finnish hyphenation (suomen tavutus)
Lingua::FI::Inflect Finnish inflect
Lingua::FI::Kontti Finnish Pig Latin (kontinkieli)
Lingua::FI::Transcribe Finnish transcription
Lingua::ID::Nums2Words convert number to Indonesian verbiage.
Lingua::ID::Words2Nums convert Indonesian verbiage to number.
Lingua::IT::Conjugate Conjugation of Italian verbs
Lingua::IT::Hyphenate Italian word hyphenation
Lingua::IT::Numbers Converts numeric values into their Italian string equivalents
Lingua::IW::Logical module for working with logical and visual hebrew
Lingua::JA::Fold fold a Japanese text.
Lingua::JA::Jcode
Lingua::JA::Jtruncate module to truncate Japanese encoded text.
Lingua::JA::MacJapanese transcoding between Mac OS Japanese and Unicode
Lingua::JA::Mail compose mail with Japanese charset
Lingua::JA::Mail::Header build ISO-2022-JP charset 'B' encoding mail header fields
Lingua::JA::Number Translate numbers into Japanese
Lingua::JA::Regular Regularize of the Japanese character.
Lingua::JA::Regular::Table Conversion Table for Lingua::JA::Regular
Lingua::JA::Regular::Table::Kanji Conversion Table(Kanji) for Lingua::JA::Regular
Lingua::JA::Regular::Table::Macintosh Conversion Table(Macintosh Character) for Lingua::JA::Regular
Lingua::JA::Regular::Table::Windows Conversion Table(Windows Character) for Lingua::JA::Regular
Lingua::JA::Romaji Perl extension for romaji and kana conversion
Lingua::JA::Sort::JIS compare and sort Japanese character strings
Lingua::JA::Sort::ReadableKey Sorting and Romanizing Japanese
Lingua::JP::Kanjidic Parse Jim Breen's kanji dictionary
Lingua::GA::Gramadoir Check the grammar of Irish language text
Lingua::GA::Gramadoir::Languages
Lingua::GA::Gramadoir::Languages::af
Lingua::GA::Gramadoir::Languages::de
Lingua::GA::Gramadoir::Languages::en_us
Lingua::GA::Gramadoir::Languages::fr
Lingua::GA::Gramadoir::Languages::ga
Lingua::GA::Gramadoir::Languages::mn
Lingua::GA::Gramadoir::Languages::nl
Lingua::GA::Gramadoir::Languages::ro
Lingua::GA::Gramadoir::Languages::sk
Lingua::GL::Stemmer Galician language stemming
Lingua::HE::MacHebrew transcode between Mac OS Hebrew encoding and Unicode
Lingua::HE::Sentence Module for splitting Hebrew text into sentences.
Lingua::NL::Numbers Perl module for converting numeric values into their Dutch equivalents
Lingua::NO::Num2Word convert whole number to Norwegian text. Output text is in ISO-8859-1 encoding.
Lingua::KO::Hangul::Util utility functions for Hangul in Unicode
Lingua::KO::MacKorean transcoding between Mac OS Korean and Unicode
Lingua::PL::Numbers Perl module for converting numeric values into their Polish equivalents
Lingua::PT::Abbrev An abbreviations dictionary manager for NLP
Lingua::PT::Conjugate
Lingua::PT::Hyphenate Separates Portuguese words in syllables
Lingua::PT::Infinitives
Lingua::PT::Inflect Portuguese words from singular to plural
Lingua::PT::Nums2Ords Converts numbers to Portuguese ordinals
Lingua::PT::Nums2Words Converts numbers to Portuguese words
Lingua::PT::Ords2Nums Converts Portuguese ordinals to numbers
Lingua::PT::PLN Perl extension for NLP of the Portuguese Language
Lingua::PT::PLN::tokenizer
Lingua::PT::PLNbase Perl extension for NLP of the Portuguese
Lingua::PT::ProperNames Simple module to extract proper names from Portuguese Text
Lingua::PT::Stemmer Portuguese language stemming
Lingua::PT::UnConjugate Recognition of the conjugated forms of
Lingua::PT::VerbSuffixes
Lingua::PT::Words2Nums Converts Portuguese words to numbers
Lingua::RU::Antimat Removes foul language from a Russian string
Lingua::RU::Charset Detect/Convert Russian character sets.
Lingua::RU::NameParse Normalize Russian names
Lingua::RU::Number Converts numbers to money sum in words (in Russian roubles)
Lingua::RU::PhTranslit Phonetic correct translit (for Cyrillic)
Lingua::RU::Translit Perl extension for decoding Cyrillic translit/volapyuk
Lingua::Sinica::PerlYuYan Use Chinese to write Perl
  
Other usage, phonetics
Lingua::Alphabet::Phonetic map ABC's to phonetic alphabets
Lingua::Alphabet::Phonetic::NATO map ABC's to the NATO phonetic letter names
Lingua::FeatureMatrix Perl extension for configuring groups of
Lingua::FeatureMatrix::Eme Abstract base class contains one single
Lingua::FeatureMatrix::FeatureClass A piece of
Lingua::FeatureMatrix::Implicature Owns a single implicature within
Lingua::Phoneme MySQL-based accent-lookups.
Lingua::Phonology a module providing a unified way to deal with
Lingua::Phonology::Common
Lingua::Phonology::Features a module to handle a set of hierarchical
Lingua::Phonology::Functions
Lingua::Phonology::RuleParser
Lingua::Phonology::Rules a module for defining and applying
Lingua::Phonology::Segment a module to represent a segment as a bundle
Lingua::Phonology::Segment::Boundary
Lingua::Phonology::Segment::Rules
Lingua::Phonology::Segment::Tier
Lingua::Phonology::Syllable
Lingua::Phonology::Symbols a module for associating symbols with
Lingua::Phonology::Word
  
Humor & Nonsense
Acme::Lingua::NIGERIAN WRITE PERL CODE IN NIGERIAN SPAM
Acme::Lingua::Pirate::Perl be writin' thy Perl like a swarthy sea-dog
Acme::Lingua::Strine::Perl make Perl more like Damian
Acme::Scurvy::Whoreson::BilgeRat multi-lingual insult generator
Lingua::Atinlay::Igpay
Lingua::Bork Perl extension for Bork Bork Bork (Assignment-The Enchefalizer)(muppets)
Lingua::En::Victory Perl extension for egotistically expressing victory.
Lingua::Klingon::Collate Sort words in Klingon sort order
Lingua::Klingon::Recode Convert Klingon words between different encodings
Lingua::Klingon::Segment Segment Klingon words into syllables and letters
Lingua::Pangram Is this string a pangram
Lingua::Rhyme MySQL-based rhyme-lookups.
Lingua::Rhyme::FindScheme find rhyme schemes in text.
Lingua::Romana::Perligata Perl in Latin
Lingua::Shakespeare Perl in a Shakespeare play
Lingua::Shakespeare::Character
Lingua::Shakespeare::Play
  
//
// searched 19 Oct 2004
// results from http://cpan.uwinnipeg.ca/search?query=Lingua%3A%3A&mode=module
// 200 found.
//

In reply to Re: NLP - natural language regex-collections? - Lingua by erix
in thread NLP - natural language regex-collections? by erix
