in reply to NLP - natural language regex-collections?
I looked through the docs of all the Lingua:: modules to find the ones that could be useful in the context of this thread: parsing or generating (English) language constructs.
I have excluded from the main list all modules for languages other than English, French, or German.
One module stands out from the others: Lingua::LinkParser is a wrapper for
the Link Grammar parser (available for download, source code included), a parser written in C whose API
is used by the Perl module. I haven't used the wrapper yet, but I did install the
parser itself and compiled it without problems on Win2k with VC6. It comes with a shell
that makes it easy to get started, and the parsing seems very advanced (first impression).
This is a work in progress; I'll keep adding to it as I examine these and
other modules (Regexp::, Parser::, etc. will follow).
Language level
Lingua::Ident | Statistical language identification
Lingua::Identify | Language identification
Lingua::Preferred | Pick a language based on user's preferences
Phrase/sentence/syntax level
Lingua::CollinsParser | Head-driven syntactic sentence parser
Lingua::CollinsParser::Node | Syntax tree node
Lingua::Conjunction | Convert lists into conjunctions
Lingua::EN::Sentence | Module for splitting text into sentences.
Lingua::EN::Splitter | Split text into words, paragraphs, segments, and tiles
Lingua::EN::Squeeze | Shorten english text for Pagers/GSM phones
Lingua::LinkParser | Link Grammar Parser by Sleator, Temperley and Lafferty at CMU
Lingua::LinkParser::Definitions | Extension providing text definitions for link types
Lingua::LinkParser::Dictionary
Lingua::LinkParser::Linkage
Lingua::LinkParser::Linkage::Sublinkage
Lingua::LinkParser::Linkage::Sublinkage::Link
Lingua::LinkParser::Linkage::Word
Lingua::LinkParser::MatchPath | Match paths in linkage diagrams
Lingua::LinkParser::MatchPath::BuildSM
Lingua::LinkParser::MatchPath::Lex
Lingua::LinkParser::MatchPath::Parser
Lingua::LinkParser::MatchPath::SM
Lingua::LinkParser::MatchPath::SMContext
Lingua::LinkParser::Sentence
Lingua::LinkParser::Simple | Perl extension for Link Parser - incomplete access to API
Lingua::EN::Segmenter | Subdivide texts into passages that represent subtopics
Lingua::EN::Segmenter::Baseline | Segment text randomly for baseline purposes
Lingua::EN::Segmenter::Evaluator | Evaluate a segmenting method
Lingua::EN::Segmenter::TextTiling | Segment text using the TextTiling method
Lingua::EN::Summarize | A simple tool for summarizing bodies of English text.
Lingua::EN::Summarize::Filters | Helper functions for the Summarize module
Lingua::EN::Tagger | Part-of-speech tagger for English natural language processing.
Word level
Lingua::DE::ASCII | Perl extension to convert german umlauts to and from ascii
Lingua::EN::StopWords | Typical stop words for an English corpus
Lingua::EN::AddressGrammar | grammar tree for Lingua::EN::AddressParse
Lingua::EN::AddressParse | Manipulate geographical addresses
Lingua::EN::Dict | BETA Version of XML english dictionary storage.
Lingua::EN::Fathom | Readability measurements for English text
Lingua::EN::FindNumber | Locate (written) numbers in English text
Lingua::EN::Gender | Inflect pronouns for gender
Lingua::EN::Hyphenate | Syllable based hyphenation
Lingua::EN::Infinitive | Find infinitive of a conjugated word
Lingua::EN::Inflect | English sing->plur, a/an, nums, participles
Lingua::EN::Inflect::Number | Force number of words to singular or plural
Lingua::EN::Keywords | Automatically extracts keywords from text
Lingua::EN::Syllable | Estimate syllable count in words
Lingua::EN::VerbTense
Lingua::Ispell | Interface to the Ispell spellchecker
Lingua::LA::Stemmer | Stemmer for Latin
Lingua::Lexicon::IDP | OOP methods for Internet Dictionary Project
Human names
Lingua::EN::MatchNames | Smart matching for human names
Lingua::EN::Nickname | Genealogical nickname matching (Peggy=Midge)
Lingua::EN::NameCase | Convert NAMES and names to Correct Case
Lingua::EN::Namegame | Converts name to verse as in Name Game song
Lingua::EN::NamedEntity | Basic Named Entity Extraction algorithm
Lingua::EN::NameGrammar | grammar tree for Lingua::EN::NameParse
Lingua::EN::NameLookup | a simple dictionary search and manipulation class.
Lingua::EN::NameParse | Manipulate persons name
Numbers
Lingua::31337 | P3RL M0DU1E 7O c0NVer7 7ext 7O C0o1 741k
Lingua::DE::Num2Word | positive number to text convertor for german. Output
Lingua::DE::Sentence | Perl extension for tokenizing german texts into their sentences.
Lingua::EN::Numbers | Converts numeric values into their English string equivalents.
Lingua::EN::Numbers::Easy | Hash access to Lingua::EN::Numbers objects.
Lingua::EN::Numbers::Ordinate | go from cardinal (53) to ordinal (53rd)
Lingua::EN::Numericalize | Replaces English descriptions of numbers with numerals
Lingua::EN::Nums2Words
Lingua::EN::Words2Nums | convert English text to numbers
Lingua::EN::WordsToNumbers | convert numbers written in English to actual numbers
Lingua::FR::Nums2Words | Converts numbers to French words
Lingua::Num2Word | wrapper for number to text conversion modules of
Lingua::Alignment stuff | I think these do alignment of two texts in different languages
Lingua::Alignment
Lingua::AlignmentEval
Lingua::AlignmentSet | handle a word-aligned bilingual corpus
Lingua::AlignmentSlice
Lingua::Features stuff | I think it is a framework for language description (completely 'meta'; no implementation)
Lingua::Features | Natural languages features
Lingua::Features::Feature | Feature object for Lingua::Features
Lingua::Features::FeatureType | FeatureType object for Lingua::Features
Lingua::Features::Library | Features library object for Lingua::Features
Lingua::Features::Structure | Structure object for Lingua::Features
Lingua::Features::StructureType | StructureType object for Lingua::Features
Lingua::Features::Tag | Tag object for Lingua::Features
Lingua::Features::Type | Type object for Lingua::Features
Lingua::Features::Value | Value object for Lingua::Features
Other stuff (not useful for the above-mentioned purpose):
Other languages (not English, French, or German)
Lingua::AF::Numbers | Perl module for converting numeric values into their Afrikaans equivalents
Lingua::AM::Abbreviate
Lingua::AR::MacArabic | transcode between Mac OS Arabic encoding and Unicode
Lingua::CS::Num2Word | number to text convertor for czech. Output
Lingua::DetectCyrillic
Lingua::EO::Supersignoj | Convert Esperanto characters
Lingua::ES::Silabas | Divide una palabra en sílabas
Lingua::ES::Numeros | Convierte números a texto en Español (Castellano)
Lingua::EU::Numbers | Converts numbers into Basque (Euskara).
Lingua::FA::MacFarsi | transcode between Mac OS Farsi encoding and Unicode
Lingua::FA::Number | Converts English numbers to their Persian (Farsi) HTML/Unicode equivalent
Lingua::FI::Genitive | Finnish genitive
Lingua::FI::Hyphenate | Finnish hyphenation (suomen tavutus)
Lingua::FI::Inflect | Finnish inflect
Lingua::FI::Kontti | Finnish Pig Latin (kontinkieli)
Lingua::FI::Transcribe | Finnish transcription
Lingua::ID::Nums2Words | convert number to Indonesian verbage.
Lingua::ID::Words2Nums | convert Indonesian verbage to number.
Lingua::IT::Conjugate | Conjugation of Italian verbs
Lingua::IT::Hyphenate | Italian word hyphenation
Lingua::IT::Numbers | Converts numeric values into their Italian string equivalents
Lingua::IW::Logical | module for working with logical and visual hebrew
Lingua::JA::Fold | fold a Japanese text.
Lingua::JA::Jcode
Lingua::JA::Jtruncate | module to truncate Japanese encoded text.
Lingua::JA::MacJapanese | transcoding between Mac OS Japanese and Unicode
Lingua::JA::Mail | compose mail with Japanese charset
Lingua::JA::Mail::Header | build ISO-2022-JP charset 'B' encoding mail header fields
Lingua::JA::Number | Translate numbers into Japanese
Lingua::JA::Regular | Regularize of the Japanese character.
Lingua::JA::Regular::Table | Conversion Table for Lingua::JA::Regular
Lingua::JA::Regular::Table::Kanji | Conversion Table(Kanji) for Lingua::JA::Regular
Lingua::JA::Regular::Table::Macintosh | Conversion Table(Macintosh Character) for Lingua::JA::Regular
Lingua::JA::Regular::Table::Windows | Conversion Table(Windows Character) for Lingua::JA::Regular
Lingua::JA::Romaji | Perl extension for romaji and kana conversion
Lingua::JA::Sort::JIS | compare and sort Japanese character strings
Lingua::JA::Sort::ReadableKey | Sorting and Romanizing Japanese
Lingua::JP::Kanjidic | Parse Jim Breen's kanji dictionary
Lingua::GA::Gramadoir | Check the grammar of Irish language text
Lingua::GA::Gramadoir::Languages
Lingua::GA::Gramadoir::Languages::af
Lingua::GA::Gramadoir::Languages::de
Lingua::GA::Gramadoir::Languages::en_us
Lingua::GA::Gramadoir::Languages::fr
Lingua::GA::Gramadoir::Languages::ga
Lingua::GA::Gramadoir::Languages::mn
Lingua::GA::Gramadoir::Languages::nl
Lingua::GA::Gramadoir::Languages::ro
Lingua::GA::Gramadoir::Languages::sk
Lingua::GL::Stemmer | Galician language stemming
Lingua::HE::MacHebrew | transcode between Mac OS Hebrew encoding and Unicode
Lingua::HE::Sentence | Module for splitting Hebrew text into sentences.
Lingua::NL::Numbers | Perl module for converting numeric values into their Dutch equivalents
Lingua::NO::Num2Word | convert whole number to norwegian text. Output text is in ISO-8859-1 encoding.
Lingua::KO::Hangul::Util | utility functions for Hangul in Unicode
Lingua::KO::MacKorean | transcoding between Mac OS Korean and Unicode
Lingua::PL::Numbers | Perl module for converting numeric values into their Polish equivalents
Lingua::PT::Abbrev | An abbreviations dictionary manager for NLP
Lingua::PT::Conjugate
Lingua::PT::Hyphenate | Separates Portuguese words in syllables
Lingua::PT::Infinitives
Lingua::PT::Inflect | Portuguese words from singular to plural
Lingua::PT::Nums2Ords | Converts numbers to Portuguese ordinals
Lingua::PT::Nums2Words | Converts numbers to Portuguese words
Lingua::PT::Ords2Nums | Converts Portuguese ordinals to numbers
Lingua::PT::PLN | Perl extension for NLP of the Portuguese Language
Lingua::PT::PLN::tokenizer
Lingua::PT::PLNbase | Perl extension for NLP of the Portuguese
Lingua::PT::ProperNames | Simple module to extract proper names from Portuguese Text
Lingua::PT::Stemmer | Portuguese language stemming
Lingua::PT::UnConjugate | Recognition of the conjugated forms of
Lingua::PT::VerbSuffixes
Lingua::PT::Words2Nums | Converts Portuguese words to numbers
Lingua::RU::Antimat | Removes foul language from a Russian string
Lingua::RU::Charset | Detect/Convert russian character sets.
Lingua::RU::NameParse | Normalize Russian names
Lingua::RU::Number | Converts numbers to money sum in words (in Russian roubles)
Lingua::RU::PhTranslit | Phonetic correct translit (for Cyrillic)
Lingua::RU::Translit | Perl extension for decoding cyrillic translit/volapyuk
Lingua::Sinica::PerlYuYan | Use Chinese to write Perl
Other usage, phonetics
Lingua::Alphabet::Phonetic | map ABC's to phonetic alphabets
Lingua::Alphabet::Phonetic::NATO | map ABC's to the NATO phonetic letter names
Lingua::FeatureMatrix | Perl extension for configuring groups of
Lingua::FeatureMatrix::Eme | Abstract base class contains one single
Lingua::FeatureMatrix::FeatureClass | A piece of
Lingua::FeatureMatrix::Implicature | Owns a single implicature within
Lingua::Phoneme | MySQL-based accent-lookups.
Lingua::Phonology | a module providing a unified way to deal with
Lingua::Phonology::Common
Lingua::Phonology::Features | a module to handle a set of hierarchical
Lingua::Phonology::Functions
Lingua::Phonology::RuleParser
Lingua::Phonology::Rules | a module for defining and applying
Lingua::Phonology::Segment | a module to represent a segment as a bundle
Lingua::Phonology::Segment::Boundary
Lingua::Phonology::Segment::Rules
Lingua::Phonology::Segment::Tier
Lingua::Phonology::Syllable
Lingua::Phonology::Symbols | a module for associating symbols with
Lingua::Phonology::Word
Humor & Nonsense
Acme::Lingua::NIGERIAN | WRITE PERL CODE IN NIGERIAN SPAM
Acme::Lingua::Pirate::Perl | be writin' thy Perl like a swarthy sea-dog
Acme::Lingua::Strine::Perl | make Perl more like Damian
Acme::Scurvy::Whoreson::BilgeRat | multi-lingual insult generator
Lingua::Atinlay::Igpay
Lingua::Bork | Perl extension for Bork Bork Bork (Assignment-The Enchefalizer)(muppets)
Lingua::En::Victory | Perl extension for egotistically expressing victory.
Lingua::Klingon::Collate | Sort words in Klingon sort order
Lingua::Klingon::Recode | Convert Klingon words between different encodings
Lingua::Klingon::Segment | Segment Klingon words into syllables and letters
Lingua::Pangram | Is this string a pangram
Lingua::Rhyme | MySQL-based rhyme-lookups.
Lingua::Rhyme::FindScheme | find rhyme schemes in text.
Lingua::Romana::Perligata | Perl in Latin
Lingua::Shakespeare | Perl in a Shakespeare play
Lingua::Shakespeare::Character
Lingua::Shakespeare::Play
// searched 19 Oct 2004
// results from http://cpan.uwinnipeg.ca/search?query=Lingua%3A%3A&mode=module
// 200 found.