
Re: Perl and Linguistics

by graff (Chancellor)
on May 26, 2002 at 03:28 UTC ( #169330=note )

in reply to Perl and Linguistics

... extending to more general linguistic modelling, ideally from a non-language-specific basis that can be adapted to different languages.

That's ambitious... but worth pursuing. The first thing that comes to my mind is (Hidden) Markov modelling, which has been demonstrated to do a decent job of drawing plausible "morphological" boundaries in a stream of text data in any given language. There appear to be Markov modules on CPAN, but whether they are suitable for the task of language analysis is more than I know at present.
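To give a rough feel for the idea, here is a minimal sketch, written in Python rather than Perl and not based on any CPAN module. It is a plain character-bigram Markov chain, not a true hidden Markov model: it counts which character follows which in some training words, then guesses a boundary wherever the next character is unusually unlikely given the previous one. The function names and the `threshold` parameter are my own assumptions for illustration, and a serious attempt would train on a large corpus and use a proper HMM.

```python
from collections import Counter, defaultdict


def train_bigrams(words):
    """Count character-to-character transitions over a list of words."""
    counts = defaultdict(Counter)
    for word in words:
        for prev, nxt in zip(word, word[1:]):
            counts[prev][nxt] += 1
    return counts


def transition_prob(counts, prev, nxt):
    """Maximum-likelihood estimate of P(nxt | prev); 0.0 if prev was never seen."""
    row = counts.get(prev, {})
    total = sum(row.values())
    return row.get(nxt, 0) / total if total else 0.0


def segment(word, counts, threshold=0.2):
    """Insert '|' wherever the transition probability drops below threshold.

    The threshold is a hypothetical tuning knob, not a value taken from
    any published method.
    """
    if not word:
        return word
    pieces = [word[0]]
    for prev, nxt in zip(word, word[1:]):
        if transition_prob(counts, prev, nxt) < threshold:
            pieces.append("|")
        pieces.append(nxt)
    return "".join(pieces)
```

Calling `segment(word, train_bigrams(corpus_words))` marks the low-probability transitions in `word`. How plausible the resulting boundaries look depends heavily on the training data and the threshold, so treat this as a starting point for experiments only.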

I do know that Perl is quite useful for handling a lot of "infrastructure" work relating to the management and handling of language data; e.g. developing and searching a lexicon, locating and displaying/highlighting tokens in a text stream, mapping across character encodings, etc. Of course, a lot of useful tools have already been developed (some in Perl, some in C(++)) -- check the archives at (and/or join) the CORPORA mailing list:

I'm sorry I can't give you any more detailed pointers or advice, but I hope this helps a little.
