PerlMonks  

Re: What are the monks doing with Perl and Linguistics?

by Willard B. Trophy (Hermit)
on Oct 07, 2003 at 20:40 UTC (#297389)


in reply to What are the monks doing with Perl and Linguistics?

Collins Dictionaries were doing a lot of corpus linguistics using Perl when I left, back in 2002. They look after the Collins/Birmingham University Bank of English, which is a great big huge corpus. There are also a variety of monitor corpora, which are used to gauge changes in usage over time.

Corpus data collection got a whole lot easier with the web ... ☺ -- Sitescooper is particularly handy for large-scale text collection (with permission, of course).

--
bowling trophy thieves, die!


Re: Re: What are the monks doing with Perl and Linguistics?
by Anonymous Monk on Nov 11, 2003 at 23:22 UTC
    I'm currently researching cross-lingual digital libraries and I use Perl, although I'm fairly new to the language. I have just finished writing a light stemmer, some n-gram code, and some n-gram comparison code, and basically I'm at that 'generating stats' stage. I'm looking for similarities between documents, and differences too, and then I'll look at language, context, and so on. The idea is to make documents searchable in many different languages. I did a master's where I used Java and built a system that could retrieve a similar English document in French and German... it kinda worked ;) I'm always interested in hearing what others are up to in that area; maybe we can swap some tools and share some ideas!! Ceejay
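
    For anyone wondering what "n-gram comparison" might look like in practice, here is a minimal Python sketch. It is not Ceejay's code (none was posted); it assumes character trigrams and the Dice coefficient as the similarity measure, and the function names `char_ngrams` and `dice_similarity` are made up for illustration.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return a Counter of overlapping character n-grams in text.

    Assumption: text is lowercased and runs of whitespace are collapsed
    to a single space before the n-grams are extracted.
    """
    normalized = " ".join(text.lower().split())
    return Counter(
        normalized[i:i + n] for i in range(len(normalized) - n + 1)
    )


def dice_similarity(a, b):
    """Dice coefficient between two n-gram Counters, from 0.0 to 1.0."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0  # both profiles are empty
    shared = sum((a & b).values())  # multiset intersection
    return 2 * shared / total


if __name__ == "__main__":
    english = char_ngrams("The library is open every day")
    french = char_ngrams("La bibliothèque est ouverte tous les jours")
    print(dice_similarity(english, french))
```

    Character n-grams are a common choice for this kind of work because they need no tokenizer or dictionary for each language, though a stemmed word-level profile may suit some collections better.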
