Re: NLP - natural language regex-collections?

in reply to NLP - natural language regex-collections?

I went through the docs of all Lingua modules to find the ones that could be useful in the context of this thread: parsing or generating (English) language constructs.

I have excluded all modules for languages other than English, French, or German.

One module stands out from the others: Lingua::LinkParser is a wrapper for the Link Grammar Parser (downloadable, source code included), a parser written in C whose API the Perl module uses. I haven't used the wrapper yet, but I did install the parser itself and compiled it without problems on Win2k with VC6. It comes with a shell that makes it easy to get started, and the parsing seems very advanced (first impression).

This is a work in progress; I'll keep adding to it as these and other modules are examined. (Regexp::, Parser::, etc. will follow.)

Other stuff (not useful for above-mentioned purpose):

Other languages (non EN, FR, DE)	not English, French, German
Lingua::AF::Numbers	Perl module for converting numeric values into their Afrikaans equivalents
Lingua::AM::Abbreviate
Lingua::AR::MacArabic	transcode between Mac OS Arabic encoding and Unicode
Lingua::CS::Num2Word	number to text convertor for Czech. Output
Lingua::DetectCyrillic
Lingua::EO::Supersignoj	Convert Esperanto characters
Lingua::ES::Silabas	Splits a word into syllables
Lingua::ES::Numeros	Converts numbers to text in Spanish (Castilian)
Lingua::EU::Numbers	Converts numbers into Basque (Euskara).
Lingua::FA::MacFarsi	transcode between Mac OS Farsi encoding and Unicode
Lingua::FA::Number	Converts English numbers to their Persian (Farsi) HTML/Unicode equivalent
Lingua::FI::Genitive	Finnish genitive
Lingua::FI::Hyphenate	Finnish hyphenation (suomen tavutus)
Lingua::FI::Inflect	Finnish inflect
Lingua::FI::Kontti	Finnish Pig Latin (kontinkieli)
Lingua::FI::Transcribe	Finnish transcription
Lingua::ID::Nums2Words	convert number to Indonesian verbiage.
Lingua::ID::Words2Nums	convert Indonesian verbiage to number.
Lingua::IT::Conjugate	Conjugation of Italian verbs
Lingua::IT::Hyphenate	Italian word hyphenation
Lingua::IT::Numbers	Converts numeric values into their Italian string equivalents
Lingua::IW::Logical	module for working with logical and visual Hebrew
Lingua::JA::Fold	fold a Japanese text.
Lingua::JA::Jcode
Lingua::JA::Jtruncate	module to truncate Japanese encoded text.
Lingua::JA::MacJapanese	transcoding between Mac OS Japanese and Unicode
Lingua::JA::Mail	compose mail with Japanese charset
Lingua::JA::Mail::Header	build ISO-2022-JP charset 'B' encoding mail header fields
Lingua::JA::Number	Translate numbers into Japanese
Lingua::JA::Regular	Regularization of Japanese characters.
Lingua::JA::Regular::Table	Conversion Table for Lingua::JA::Regular
Lingua::JA::Regular::Table::Kanji	Conversion Table(Kanji) for Lingua::JA::Regular
Lingua::JA::Regular::Table::Macintosh	Conversion Table(Macintosh Character) for Lingua::JA::Regular
Lingua::JA::Regular::Table::Windows	Conversion Table(Windows Character) for Lingua::JA::Regular
Lingua::JA::Romaji	Perl extension for romaji and kana conversion
Lingua::JA::Sort::JIS	compare and sort Japanese character strings
Lingua::JA::Sort::ReadableKey	Sorting and Romanizing Japanese
Lingua::JP::Kanjidic	Parse Jim Breen's kanji dictionary
Lingua::GA::Gramadoir	Check the grammar of Irish language text
Lingua::GA::Gramadoir::Languages
Lingua::GA::Gramadoir::Languages::af
Lingua::GA::Gramadoir::Languages::de
Lingua::GA::Gramadoir::Languages::en_us
Lingua::GA::Gramadoir::Languages::fr
Lingua::GA::Gramadoir::Languages::ga
Lingua::GA::Gramadoir::Languages::mn
Lingua::GA::Gramadoir::Languages::nl
Lingua::GA::Gramadoir::Languages::ro
Lingua::GA::Gramadoir::Languages::sk
Lingua::GL::Stemmer	Galician language stemming
Lingua::HE::MacHebrew	transcode between Mac OS Hebrew encoding and Unicode
Lingua::HE::Sentence	Module for splitting Hebrew text into sentences.
Lingua::NL::Numbers	Perl module for converting numeric values into their Dutch equivalents
Lingua::NO::Num2Word	convert whole number to Norwegian text. Output text is in ISO-8859-1 encoding.
Lingua::KO::Hangul::Util	utility functions for Hangul in Unicode
Lingua::KO::MacKorean	transcoding between Mac OS Korean and Unicode
Lingua::PL::Numbers	Perl module for converting numeric values into their Polish equivalents
Lingua::PT::Abbrev	An abbreviations dictionary manager for NLP
Lingua::PT::Conjugate
Lingua::PT::Hyphenate	Separates Portuguese words in syllables
Lingua::PT::Infinitives
Lingua::PT::Inflect	Portuguese words from singular to plural
Lingua::PT::Nums2Ords	Converts numbers to Portuguese ordinals
Lingua::PT::Nums2Words	Converts numbers to Portuguese words
Lingua::PT::Ords2Nums	Converts Portuguese ordinals to numbers
Lingua::PT::PLN	Perl extension for NLP of the Portuguese Language
Lingua::PT::PLN::tokenizer
Lingua::PT::PLNbase	Perl extension for NLP of the Portuguese
Lingua::PT::ProperNames	Simple module to extract proper names from Portuguese Text
Lingua::PT::Stemmer	Portuguese language stemming
Lingua::PT::UnConjugate	Recognition of the conjugated forms of
Lingua::PT::VerbSuffixes
Lingua::PT::Words2Nums	Converts Portuguese words to numbers
Lingua::RU::Antimat	Removes foul language from a Russian string
Lingua::RU::Charset	Detect/Convert Russian character sets.
Lingua::RU::NameParse	Normalize Russian names
Lingua::RU::Number	Converts numbers to money sum in words (in Russian roubles)
Lingua::RU::PhTranslit	Phonetic correct translit (for Cyrillic)
Lingua::RU::Translit	Perl extension for decoding Cyrillic translit/volapyuk
Lingua::Sinica::PerlYuYan	Use Chinese to write Perl

Other usage, phonetics
Lingua::Alphabet::Phonetic	map ABC's to phonetic alphabets
Lingua::Alphabet::Phonetic::NATO	map ABC's to the NATO phonetic letter names
Lingua::FeatureMatrix	Perl extension for configuring groups of
Lingua::FeatureMatrix::Eme	Abstract base class contains one single
Lingua::FeatureMatrix::FeatureClass	A piece of
Lingua::FeatureMatrix::Implicature	Owns a single implicature within
Lingua::Phoneme	MySQL-based accent-lookups.
Lingua::Phonology	a module providing a unified way to deal with
Lingua::Phonology::Common
Lingua::Phonology::Features	a module to handle a set of hierarchical
Lingua::Phonology::Functions
Lingua::Phonology::RuleParser
Lingua::Phonology::Rules	a module for defining and applying
Lingua::Phonology::Segment	a module to represent a segment as a bundle
Lingua::Phonology::Segment::Boundary
Lingua::Phonology::Segment::Rules
Lingua::Phonology::Segment::Tier
Lingua::Phonology::Syllable
Lingua::Phonology::Symbols	a module for associating symbols with
Lingua::Phonology::Word

Humor & Nonsense
Acme::Lingua::NIGERIAN	WRITE PERL CODE IN NIGERIAN SPAM
Acme::Lingua::Pirate::Perl	be writin' thy Perl like a swarthy sea-dog
Acme::Lingua::Strine::Perl	make Perl more like Damian
Acme::Scurvy::Whoreson::BilgeRat	multi-lingual insult generator
Lingua::Atinlay::Igpay
Lingua::Bork	Perl extension for Bork Bork Bork (Assignment-The Enchefalizer)(muppets)
Lingua::En::Victory	Perl extension for egotistically expressing victory.
Lingua::Klingon::Collate	Sort words in Klingon sort order
Lingua::Klingon::Recode	Convert Klingon words between different encodings
Lingua::Klingon::Segment	Segment Klingon words into syllables and letters
Lingua::Pangram	Is this string a pangram
Lingua::Rhyme	MySQL-based rhyme-lookups.
Lingua::Rhyme::FindScheme	find rhyme schemes in text.
Lingua::Romana::Perligata	Perl in Latin
Lingua::Shakespeare	Perl in a Shakespeare play
Lingua::Shakespeare::Character
Lingua::Shakespeare::Play

//
// searched 19 Oct 2004
// results from http://cpan.uwinnipeg.ca/search?query=Lingua%3A%3A&mode=module
// 200 found.
//

In Section Seekers of Perl Wisdom

Language level
Lingua::Ident	Statistical language identification
Lingua::Identify	Language identification
Lingua::Preferred	Pick a language based on user's preferences

Phrase/sentence/syntax level
Lingua::CollinsParser	Head-driven syntactic sentence parser
Lingua::CollinsParser::Node	Syntax tree node
Lingua::Conjunction	Convert lists into conjunctions
Lingua::EN::Sentence	Module for splitting text into sentences.
Lingua::EN::Splitter	Split text into words, paragraphs, segments, and tiles
Lingua::EN::Squeeze	Shorten English text for Pagers/GSM phones
Lingua::LinkParser	Link Grammar Parser by Sleator, Temperley and Lafferty at CMU
Lingua::LinkParser::Definitions	Extension providing text definitions for link types
Lingua::LinkParser::Dictionary
Lingua::LinkParser::Linkage
Lingua::LinkParser::Linkage::Sublinkage
Lingua::LinkParser::Linkage::Sublinkage::Link
Lingua::LinkParser::Linkage::Word
Lingua::LinkParser::MatchPath	Match paths in linkage diagrams
Lingua::LinkParser::MatchPath::BuildSM
Lingua::LinkParser::MatchPath::Lex
Lingua::LinkParser::MatchPath::Parser
Lingua::LinkParser::MatchPath::SM
Lingua::LinkParser::MatchPath::SMContext
Lingua::LinkParser::Sentence
Lingua::LinkParser::Simple	Perl extension for Link Parser - incomplete access to API
Lingua::EN::Segmenter	Subdivide texts into passages that represent subtopics
Lingua::EN::Segmenter::Baseline	Segment text randomly for baseline purposes
Lingua::EN::Segmenter::Evaluator	Evaluate a segmenting method
Lingua::EN::Segmenter::TextTiling	Segment text using the TextTiling method
Lingua::EN::Summarize	A simple tool for summarizing bodies of English text.
Lingua::EN::Summarize::Filters	Helper functions for the Summarize module
Lingua::EN::Tagger	Part-of-speech tagger for English natural language processing.

Word level
Lingua::DE::ASCII	Perl extension to convert German umlauts to and from ASCII
Lingua::EN::StopWords	Typical stop words for an English corpus
Lingua::EN::AddressGrammar	grammar tree for Lingua::EN::AddressParse
Lingua::EN::AddressParse	Manipulate geographical addresses
Lingua::EN::Dict	BETA version of XML English dictionary storage.
Lingua::EN::Fathom	Readability measurements for English text
Lingua::EN::FindNumber	Locate (written) numbers in English text
Lingua::EN::Gender	Inflect pronouns for gender
Lingua::EN::Hyphenate	Syllable based hyphenation
Lingua::EN::Infinitive	Find infinitive of a conjugated word
Lingua::EN::Inflect	English sing->plur, a/an, nums, participles
Lingua::EN::Inflect::Number	Force number of words to singular or plural
Lingua::EN::Keywords	Automatically extracts keywords from text
Lingua::EN::Syllable	Estimate syllable count in words
Lingua::EN::VerbTense
Lingua::Ispell	Interface to the Ispell spellchecker
Lingua::LA::Stemmer	Stemmer for Latin
Lingua::Lexicon::IDP	OOP methods for Internet Dictionary Project

Human names
Lingua::EN::MatchNames	Smart matching for human names
Lingua::EN::Nickname	Genealogical nickname matching(Peggy=Midge)
Lingua::EN::NameCase	Convert NAMES and names to Correct Case
Lingua::EN::Namegame	Converts name to verse as in Name Game song
Lingua::EN::NamedEntity	Basic Named Entity Extraction algorithm
Lingua::EN::NameGrammar	grammar tree for Lingua::EN::NameParse
Lingua::EN::NameLookup	a simple dictionary search and manipulation class.
Lingua::EN::NameParse	Manipulate a person's name

Numbers
Lingua::31337	P3RL M0DU1E 7O c0NVer7 7ext 7O C0o1 741k
Lingua::DE::Num2Word	positive number to text convertor for German. Output
Lingua::DE::Sentence	Perl extension for tokenizing German texts into their sentences.
Lingua::EN::Numbers	Converts numeric values into their English string equivalents.
Lingua::EN::Numbers::Easy	Hash access to Lingua::EN::Numbers objects.
Lingua::EN::Numbers::Ordinate	go from cardinal (53) to ordinal (53rd)
Lingua::EN::Numericalize	Replaces English descriptions of numbers with numerals
Lingua::EN::Nums2Words
Lingua::EN::Words2Nums	convert English text to numbers
Lingua::EN::WordsToNumbers	convert numbers written in English to actual numbers
Lingua::FR::Nums2Words	Converts numbers to French words
Lingua::Num2Word	wrapper for number to text conversion modules of

Lingua::Alignment stuff	I think it does alignment of two texts in different languages
Lingua::Alignment
Lingua::AlignmentEval
Lingua::AlignmentSet	handle a word-aligned bilingual corpus
Lingua::AlignmentSlice

Lingua::Features stuff.	I think it is a framework for language description (completely 'meta'; no implementation)
Lingua::Features	Natural languages features
Lingua::Features::Feature	Feature object for Lingua::Features
Lingua::Features::FeatureType	FeatureType object for Lingua::Features
Lingua::Features::Library	Features library object for Lingua::Features
Lingua::Features::Structure	Structure object for Lingua::Features
Lingua::Features::StructureType	StructureType object for Lingua::Features
Lingua::Features::Tag	Tag object for Lingua::Features
Lingua::Features::Type	Type object for Lingua::Features
Lingua::Features::Value	Value object for Lingua::Features