http://www.perlmonks.org?node_id=400612


in reply to NLP - natural language regex-collections?

I looked through the documentation of every Lingua:: module to find the ones that could be useful in the context of this thread: parsing or generating (English) language constructs.

I have excluded all modules for languages other than English, French, or German.

One module stands out from the others: Lingua::LinkParser is a wrapper for the Link Grammar Parser (downloadable, source code included), a parser written in C whose API the Perl module uses. I haven't used the wrapper yet, but I did install the parser itself and compiled it without problems on Win2k with VC6. It comes with a shell that makes it easy to get started, and at first impression its parsing seems very advanced.

This is a work in progress; I'll keep adding to it as I examine these and other modules (Regexp::, Parser::, etc. will follow).

Language level
Lingua::Ident Statistical language identification
Lingua::Identify Language identification
Lingua::Preferred Pick a language based on user's preferences
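To give an idea of what language identification involves, here is a rough sketch in Python (not the API of Lingua::Ident or Lingua::Identify, whose statistical methods I haven't examined). It uses a simpler, swapped-in technique: count how many common function words of each language occur in the text. The function name and the tiny word lists are my own hypothetical choices.

```python
import re

# Hypothetical, deliberately tiny stop-word lists; a real identifier would
# use much larger lists or character n-gram statistics.
STOP_WORDS = {
    "en": {"the", "and", "is", "of", "to", "in", "it", "that"},
    "fr": {"le", "la", "les", "et", "est", "de", "un", "une"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "eine"},
}


def guess_language(text: str) -> str:
    """Return the language code whose stop words occur most often in text."""
    words = re.findall(r"\w+", text.lower())
    scores = {
        lang: sum(1 for w in words if w in stops)
        for lang, stops in STOP_WORDS.items()
    }
    # Highest score wins; on a tie (including all zeros) the first key wins.
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(guess_language("Der Hund ist nicht im Haus"))  # de
```

Short inputs with few function words will often be misclassified, which is why the statistical approach is preferable for real use.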
  
Phrase/sentence/syntax level
Lingua::CollinsParser Head-driven syntactic sentence parser
Lingua::CollinsParser::Node Syntax tree node
Lingua::Conjunction Convert lists into conjunctions
Lingua::EN::Sentence Module for splitting text into sentences.
Lingua::EN::Splitter Split text into words, paragraphs, segments, and tiles
Lingua::EN::Squeeze Shorten English text for pagers/GSM phones
Lingua::LinkParser Link Grammar Parser by Sleator, Temperley and Lafferty at CMU
Lingua::LinkParser::Definitions Extension providing text definitions for link types
Lingua::LinkParser::Dictionary
Lingua::LinkParser::Linkage
Lingua::LinkParser::Linkage::Sublinkage
Lingua::LinkParser::Linkage::Sublinkage::Link
Lingua::LinkParser::Linkage::Word
Lingua::LinkParser::MatchPath Match paths in linkage diagrams
Lingua::LinkParser::MatchPath::BuildSM
Lingua::LinkParser::MatchPath::Lex
Lingua::LinkParser::MatchPath::Parser
Lingua::LinkParser::MatchPath::SM
Lingua::LinkParser::MatchPath::SMContext
Lingua::LinkParser::Sentence
Lingua::LinkParser::Simple Perl extension for Link Parser - incomplete access to API
Lingua::EN::Segmenter Subdivide texts into passages that represent subtopics
Lingua::EN::Segmenter::Baseline Segment text randomly for baseline purposes
Lingua::EN::Segmenter::Evaluator Evaluate a segmenting method
Lingua::EN::Segmenter::TextTiling Segment text using the TextTiling method
Lingua::EN::Summarize A simple tool for summarizing bodies of English text.
Lingua::EN::Summarize::Filters Helper functions for the Summarize module
Lingua::EN::Tagger Part-of-speech tagger for English natural language processing.
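The main difficulty in sentence splitting (the job of Lingua::EN::Sentence) is that a period does not always end a sentence. A minimal Python sketch of the idea, assuming a hypothetical hand-written abbreviation list; this is not the module's actual algorithm:

```python
import re

# Hypothetical abbreviation list: a period after one of these words
# is not treated as the end of a sentence.
ABBREVIATIONS = {"mr", "mrs", "dr", "prof", "etc", "e.g", "i.e", "vs"}


def split_sentences(text: str) -> list[str]:
    """Split on ., ! or ? followed by whitespace (or end of text),
    skipping periods that follow a known abbreviation."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]+(?=\s|$)", text):
        # The last word before the punctuation, e.g. "Dr" in "Dr."
        preceding = text[start:match.start()].split()
        last_word = preceding[-1].lower() if preceding else ""
        if match.group() == "." and last_word in ABBREVIATIONS:
            continue
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    rest = text[start:].strip()
    if rest:
        sentences.append(rest)
    return sentences


if __name__ == "__main__":
    print(split_sentences("Dr. Smith arrived. He sat down! Did he eat?"))
```

A fixed abbreviation list is the weak point: any abbreviation missing from it produces a spurious split.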
  
Word level
Lingua::DE::ASCII Perl extension to convert German umlauts to and from ASCII
Lingua::EN::StopWords Typical stop words for an English corpus
Lingua::EN::AddressGrammar grammar tree for Lingua::EN::AddressParse
Lingua::EN::AddressParse Manipulate geographical addresses
Lingua::EN::Dict BETA version of XML English dictionary storage.
Lingua::EN::Fathom Readability measurements for English text
Lingua::EN::FindNumber Locate (written) numbers in English text
Lingua::EN::Gender Inflect pronouns for gender
Lingua::EN::Hyphenate Syllable based hyphenation
Lingua::EN::Infinitive Find infinitive of a conjugated word
Lingua::EN::Inflect English sing->plur, a/an, nums, participles
Lingua::EN::Inflect::Number Force number of words to singular or plural
Lingua::EN::Keywords Automatically extracts keywords from text
Lingua::EN::Syllable Estimate syllable count in words
Lingua::EN::VerbTense
Lingua::Ispell Interface to the Ispell spellchecker
Lingua::LA::Stemmer Stemmer for Latin
Lingua::Lexicon::IDP OOP methods for Internet Dictionary Project
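For the "to ASCII" direction of Lingua::DE::ASCII, the usual German convention is a fixed substitution (ä→ae, ö→oe, ü→ue, ß→ss). A short Python sketch of that convention (not the module's API; the function name is hypothetical):

```python
# Conventional German transliteration of umlauts and sharp s to ASCII.
UMLAUT_MAP = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}

# str.maketrans accepts a dict mapping single characters to strings.
_TABLE = str.maketrans(UMLAUT_MAP)


def to_ascii(text: str) -> str:
    """Replace German umlauts and ß with their ASCII digraphs."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    print(to_ascii("Größe"))  # Groesse
```

The reverse direction is the hard part: not every "ue" was an "ü" (think of "Feuer"), so going back from ASCII needs more than a lookup table. Note also that all-caps input gives mixed case ("ÄPFEL" becomes "AePFEL").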
  
Human names
Lingua::EN::MatchNames Smart matching for human names
Lingua::EN::Nickname Genealogical nickname matching (Peggy=Midge)
Lingua::EN::NameCase Convert NAMES and names to Correct Case
Lingua::EN::Namegame Converts name to verse as in Name Game song
Lingua::EN::NamedEntity Basic Named Entity Extraction algorithm
Lingua::EN::NameGrammar grammar tree for Lingua::EN::NameParse
Lingua::EN::NameLookup a simple dictionary search and manipulation class.
Lingua::EN::NameParse Manipulate a person's name
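To show what "correct case" means for names (the task of Lingua::EN::NameCase), here is a Python sketch with a hypothetical, much simplified rule set of my own: capitalize every name part, including parts after a hyphen or apostrophe, and handle the "Mc" prefix. The real module covers far more cases.

```python
import re


def name_case(name: str) -> str:
    """Normalize the case of a personal name (simplified, hypothetical rules)."""
    # Lower-case everything, then upper-case every letter that starts a word;
    # \b also matches after "'" and "-", giving O'Reilly and Smith-Jones.
    result = re.sub(r"\b\w", lambda m: m.group().upper(), name.lower())
    # Scottish/Irish "Mc" prefix: Mcdonald -> McDonald.
    result = re.sub(r"\bMc(\w)", lambda m: "Mc" + m.group(1).upper(), result)
    return result


if __name__ == "__main__":
    print(name_case("JOHN O'REILLY-SMITH"))  # John O'Reilly-Smith
```

Rules like these misfire on names such as "Macy" vs "MacDonald" or on particles like "van" and "de", which is where a dedicated module earns its keep.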
  
Numbers
Lingua::31337 P3RL M0DU1E 7O c0NVer7 7ext 7O C0o1 741k
Lingua::DE::Num2Word positive number to text convertor for German. Output
Lingua::DE::Sentence Perl extension for tokenizing german texts into their sentences.
Lingua::EN::Nums2Words
Lingua::EN::Numbers Converts numeric values into their English string equivalents.
Lingua::EN::WordsToNumbers convert numbers written in English to actual numbers
Lingua::EN::Numbers::Easy Hash access to Lingua::EN::Numbers objects.
Lingua::EN::Numbers::Ordinate go from cardinal (53) to ordinal (53rd)
Lingua::EN::Numericalize Replaces English descriptions of numbers with numerals
Lingua::EN::Words2Nums convert English text to numbers
Lingua::FR::Nums2Words Converts numbers to French words
Lingua::Num2Word wrapper for number to text conversion modules of
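The cardinal-to-ordinal step that Lingua::EN::Numbers::Ordinate performs (53 → 53rd) comes down to one rule plus one exception. A Python sketch (the function name is hypothetical, not the module's API):

```python
def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 53 -> '53rd'."""
    last_two = abs(n) % 100
    # 11, 12 and 13 (and 111, 212, ...) always take "th".
    if 11 <= last_two <= 13:
        suffix = "th"
    else:
        # Otherwise the last digit decides: 1st, 2nd, 3rd, everything else "th".
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(last_two % 10, "th")
    return f"{n}{suffix}"


if __name__ == "__main__":
    print(ordinal(53))  # 53rd
```

Using `abs(n)` keeps negative numbers sensible (-1 becomes "-1st"), since Python's `%` on a negative number would otherwise return a positive remainder from the wrong end.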
  
Lingua::Alignment modules (I think they align two texts in different languages)
Lingua::Alignment
Lingua::AlignmentEval
Lingua::AlignmentSet handle a word-aligned bilingual corpus
Lingua::AlignmentSlice
  
Lingua::Features modules (I think this is a framework for describing languages; completely 'meta', with no implementation)
Lingua::Features Natural languages features
Lingua::Features::Feature Feature object for Lingua::Features
Lingua::Features::FeatureType FeatureType object for Lingua::Features
Lingua::Features::Library Features library object for Lingua::Features
Lingua::Features::Structure Structure object for Lingua::Features
Lingua::Features::StructureType StructureType object for Lingua::Features
Lingua::Features::Tag Tag object for Lingua::Features
Lingua::Features::Type Type object for Lingua::Features
Lingua::Features::Value Value object for Lingua::Features
  

Other modules (not useful for the purpose mentioned above):

  
Other languages (not English, French, or German)
Lingua::AF::Numbers Perl module for converting numeric values into their Afrikaans equivalents
Lingua::AM::Abbreviate
Lingua::AR::MacArabic transcode between Mac OS Arabic encoding and Unicode
Lingua::CS::Num2Word number to text convertor for Czech. Output
Lingua::DetectCyrillic
Lingua::EO::Supersignoj Convert Esperanto characters
Lingua::ES::Silabas Divide a word into syllables
Lingua::ES::Numeros Converts numbers to text in Spanish (Castilian)
Lingua::EU::Numbers Converts numbers into Basque (Euskara).
Lingua::FA::MacFarsi transcode between Mac OS Farsi encoding and Unicode
Lingua::FA::Number Converts English numbers to their Persian (Farsi) HTML/Unicode equivalent
Lingua::FI::Genitive Finnish genitive
Lingua::FI::Hyphenate Finnish hyphenation (suomen tavutus)
Lingua::FI::Inflect Finnish inflect
Lingua::FI::Kontti Finnish Pig Latin (kontinkieli)
Lingua::FI::Transcribe Finnish transcription
Lingua::ID::Nums2Words convert number to Indonesian verbiage.
Lingua::ID::Words2Nums convert Indonesian verbiage to number.
Lingua::IT::Conjugate Conjugation of Italian verbs
Lingua::IT::Hyphenate Italian word hyphenation
Lingua::IT::Numbers Converts numeric values into their Italian string equivalents
Lingua::IW::Logical module for working with logical and visual Hebrew
Lingua::JA::Fold fold a Japanese text.
Lingua::JA::Jcode
Lingua::JA::Jtruncate module to truncate Japanese encoded text.
Lingua::JA::MacJapanese transcoding between Mac OS Japanese and Unicode
Lingua::JA::Mail compose mail with Japanese charset
Lingua::JA::Mail::Header build ISO-2022-JP charset 'B' encoding mail header fields
Lingua::JA::Number Translate numbers into Japanese
Lingua::JA::Regular Regularize Japanese characters.
Lingua::JA::Regular::Table Conversion Table for Lingua::JA::Regular
Lingua::JA::Regular::Table::Kanji Conversion Table (Kanji) for Lingua::JA::Regular
Lingua::JA::Regular::Table::Macintosh Conversion Table (Macintosh characters) for Lingua::JA::Regular
Lingua::JA::Regular::Table::Windows Conversion Table (Windows characters) for Lingua::JA::Regular
Lingua::JA::Romaji Perl extension for romaji and kana conversion
Lingua::JA::Sort::JIS compare and sort Japanese character strings
Lingua::JA::Sort::ReadableKey Sorting and Romanizing Japanese
Lingua::JP::Kanjidic Parse Jim Breen's kanji dictionary
Lingua::GA::Gramadoir Check the grammar of Irish language text
Lingua::GA::Gramadoir::Languages
Lingua::GA::Gramadoir::Languages::af
Lingua::GA::Gramadoir::Languages::de
Lingua::GA::Gramadoir::Languages::en_us
Lingua::GA::Gramadoir::Languages::fr
Lingua::GA::Gramadoir::Languages::ga
Lingua::GA::Gramadoir::Languages::mn
Lingua::GA::Gramadoir::Languages::nl
Lingua::GA::Gramadoir::Languages::ro
Lingua::GA::Gramadoir::Languages::sk
Lingua::GL::Stemmer Galician language stemming
Lingua::HE::MacHebrew transcode between Mac OS Hebrew encoding and Unicode
Lingua::HE::Sentence Module for splitting Hebrew text into sentences.
Lingua::NL::Numbers Perl module for converting numeric values into their Dutch equivalents
Lingua::NO::Num2Word convert whole number to Norwegian text. Output text is in ISO-8859-1 encoding.
Lingua::KO::Hangul::Util utility functions for Hangul in Unicode
Lingua::KO::MacKorean transcoding between Mac OS Korean and Unicode
Lingua::PL::Numbers Perl module for converting numeric values into their Polish equivalents
Lingua::PT::Abbrev An abbreviations dictionary manager for NLP
Lingua::PT::Conjugate
Lingua::PT::Hyphenate Separates Portuguese words in syllables
Lingua::PT::Infinitives
Lingua::PT::Inflect Portuguese words from singular to plural
Lingua::PT::Nums2Ords Converts numbers to Portuguese ordinals
Lingua::PT::Nums2Words Converts numbers to Portuguese words
Lingua::PT::Ords2Nums Converts Portuguese ordinals to numbers
Lingua::PT::PLN Perl extension for NLP of the Portuguese Language
Lingua::PT::PLN::tokenizer
Lingua::PT::PLNbase Perl extension for NLP of the Portuguese
Lingua::PT::ProperNames Simple module to extract proper names from Portuguese Text
Lingua::PT::Stemmer Portuguese language stemming
Lingua::PT::UnConjugate Recognition of the conjugated forms of
Lingua::PT::VerbSuffixes
Lingua::PT::Words2Nums Converts Portuguese words to numbers
Lingua::RU::Antimat Removes foul language from a Russian string
Lingua::RU::Charset Detect/Convert Russian character sets.
Lingua::RU::NameParse Normalize Russian names
Lingua::RU::Number Converts numbers to money sum in words (in Russian roubles)
Lingua::RU::PhTranslit Phonetically correct translit (for Cyrillic)
Lingua::RU::Translit Perl extension for decoding Cyrillic translit/volapyuk
Lingua::Sinica::PerlYuYan Use Chinese to write Perl
  
Other usage, phonetics
Lingua::Alphabet::Phonetic map ABC's to phonetic alphabets
Lingua::Alphabet::Phonetic::NATO map ABC's to the NATO phonetic letter names
Lingua::FeatureMatrix Perl extension for configuring groups of
Lingua::FeatureMatrix::Eme Abstract base class contains one single
Lingua::FeatureMatrix::FeatureClass A piece of
Lingua::FeatureMatrix::Implicature Owns a single implicature within
Lingua::Phoneme MySQL-based accent-lookups.
Lingua::Phonology a module providing a unified way to deal with
Lingua::Phonology::Common
Lingua::Phonology::Features a module to handle a set of hierarchical
Lingua::Phonology::Functions
Lingua::Phonology::RuleParser
Lingua::Phonology::Rules a module for defining and applying
Lingua::Phonology::Segment a module to represent a segment as a bundle
Lingua::Phonology::Segment::Boundary
Lingua::Phonology::Segment::Rules
Lingua::Phonology::Segment::Tier
Lingua::Phonology::Syllable
Lingua::Phonology::Symbols a module for associating symbols with
Lingua::Phonology::Word
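What Lingua::Alphabet::Phonetic::NATO does is essentially a table lookup. A Python sketch using the ICAO/NATO spelling alphabet (the function name is hypothetical, and the module's own spellings may differ, e.g. "Alpha" vs "Alfa"):

```python
# ICAO/NATO spelling alphabet, using the official "Alfa" and "Juliett" spellings.
NATO = {
    "A": "Alfa", "B": "Bravo", "C": "Charlie", "D": "Delta", "E": "Echo",
    "F": "Foxtrot", "G": "Golf", "H": "Hotel", "I": "India", "J": "Juliett",
    "K": "Kilo", "L": "Lima", "M": "Mike", "N": "November", "O": "Oscar",
    "P": "Papa", "Q": "Quebec", "R": "Romeo", "S": "Sierra", "T": "Tango",
    "U": "Uniform", "V": "Victor", "W": "Whiskey", "X": "X-ray",
    "Y": "Yankee", "Z": "Zulu",
}


def spell(text: str) -> list[str]:
    """Map each letter of text to its code word; non-letters are skipped."""
    return [NATO[c] for c in text.upper() if c in NATO]


if __name__ == "__main__":
    print(" ".join(spell("Perl")))  # Papa Echo Romeo Lima
```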
  
Humor & Nonsense
Acme::Lingua::NIGERIAN WRITE PERL CODE IN NIGERIAN SPAM
Acme::Lingua::Pirate::Perl be writin' thy Perl like a swarthy sea-dog
Acme::Lingua::Strine::Perl make Perl more like Damian
Acme::Scurvy::Whoreson::BilgeRat multi-lingual insult generator
Lingua::Atinlay::Igpay
Lingua::Bork Perl extension for Bork Bork Bork (Assignment-The Enchefalizer)(muppets)
Lingua::En::Victory Perl extension for egotistically expressing victory.
Lingua::Klingon::Collate Sort words in Klingon sort order
Lingua::Klingon::Recode Convert Klingon words between different encodings
Lingua::Klingon::Segment Segment Klingon words into syllables and letters
Lingua::Pangram Is this string a pangram
Lingua::Rhyme MySQL-based rhyme-lookups.
Lingua::Rhyme::FindScheme find rhyme schemes in text.
Lingua::Romana::Perligata Perl in Latin
Lingua::Shakespeare Perl in a Shakespeare play
Lingua::Shakespeare::Character
Lingua::Shakespeare::Play
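Among the nonsense, Lingua::Pangram answers a well-defined question: does a string use every letter of the alphabet? A one-line Python version of the check for the English alphabet (the function name is hypothetical):

```python
import string


def is_pangram(text: str) -> bool:
    """True if text contains every letter a-z at least once (case-insensitive)."""
    return set(string.ascii_lowercase) <= set(text.lower())


if __name__ == "__main__":
    print(is_pangram("The quick brown fox jumps over the lazy dog"))  # True
```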
  
//
// searched 19 Oct 2004
// results from http://cpan.uwinnipeg.ca/search?query=Lingua%3A%3A&mode=module
// 200 found.
//