PerlMonks  

Re: Re: Re: Perl and Linguistics

by mattr (Curate)
on May 28, 2002 at 09:10 UTC


in reply to Re: Re: Perl and Linguistics
in thread Perl and Linguistics

Sorry, I do use kakasi as the main tool in my search engine now. But I sometimes use chasen for individual documents, since I am under the impression that it is slower but more flexible and more sophisticated. I only mentioned chasen because I remembered Nara and clustering, and that brought chasen to mind.

For those who are not familiar with either tool, they are morphological analyzers of Japanese text. They are similar, and are generally used to split a chunk of text into individual words (Japanese words are not usually separated by spaces) and to get the phonetic reading of those words (usually in the roman alphabet).
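To make "split into words and get the reading" concrete, here is a toy sketch. It is not how kakasi or chasen work internally and it does not call either of them: it is a plain greedy longest-match lookup against a three-entry dictionary I made up for illustration (`%DICT` and `segment` are hypothetical names). It assumes Perl 5.8 or later for the Unicode handling.

```perl
use strict;
use warnings;
use utf8;    # this source file contains literal Japanese characters

# Hypothetical toy dictionary: surface form => romanized reading.
my %DICT = (
    '東京' => 'toukyou',
    'に'   => 'ni',
    '行く' => 'iku',
);

# Greedy longest-match segmentation: at each position take the longest
# dictionary entry that matches, falling back to a single character.
# Returns a list of [word, reading] pairs; reading is undef if unknown.
sub segment {
    my ($text, $dict) = @_;

    # Length of the longest dictionary entry (at least 1).
    my $max = 1;
    for my $w (keys %$dict) {
        $max = length($w) if length($w) > $max;
    }

    my @out;
    my $pos = 0;
    while ($pos < length($text)) {
        my $len = $max;
        $len-- while $len > 1 && !exists $dict->{ substr($text, $pos, $len) };
        my $word = substr($text, $pos, $len);
        push @out, [ $word, $dict->{$word} ];
        $pos += length($word);
    }
    return @out;
}

binmode STDOUT, ':encoding(UTF-8)';
print join(' ',
    map { $_->[0] . '/' . (defined $_->[1] ? $_->[1] : '?') }
        segment('東京に行く', \%DICT)), "\n";
# prints: 東京/toukyou に/ni 行く/iku
```

A real analyzer uses a large dictionary plus grammatical information to choose among competing splits, which is exactly what a greedy lookup like this cannot do.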

Obviously this is enabling technology. The name Kakasi is in fact a kind of wordplay: read backwards phonetically, it gives the name of a popular front end processor, which takes roman alphabet input and interactively picks the correct characters based on that phonetic reading and the context.

I believe Kakasi is focused more on workaday speed and usability, while chasen might be more flexible. In particular, I seem to remember some interesting use of chasen in document clustering work done in Nara and elsewhere. I couldn't find the exact page, but Google will help you look at the field. Personally, where I use these tools is in the custom search engines I build, usually either completely in Perl or with plugins from projects like the above. It seems they are mainly useful for building an inverted index to search a lot of text quickly, but I have a small (a few megabytes) Japanese database that works fine with just (Japanese) regexes.
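For anyone who has not built one, an inverted index is just a map from each word to the documents that contain it. Below is a minimal sketch, not my actual search engine code. It assumes the documents have already been split into tokens (the job a tool like kakasi would do), and the names `build_index` and `search` and the romanized sample data are made up for illustration.

```perl
use strict;
use warnings;

# Build an inverted index: token => { doc_id => 1 }.
# Takes a list of doc_id => [tokens] pairs, already segmented.
sub build_index {
    my (%docs) = @_;
    my %index;
    for my $id (keys %docs) {
        $index{$_}{$id} = 1 for @{ $docs{$id} };
    }
    return \%index;
}

# Return the sorted ids of documents containing every query token (AND search).
sub search {
    my ($index, @query) = @_;
    return () unless @query;
    my %hits;
    for my $tok (@query) {
        # "|| {}" avoids autovivifying index entries for unknown tokens.
        $hits{$_}++ for keys %{ $index->{$tok} || {} };
    }
    my @found = sort grep { $hits{$_} == @query } keys %hits;
    return @found;
}

my $index = build_index(
    doc1 => [qw(tokyo ni iku)],
    doc2 => [qw(nara ni iku)],
    doc3 => [qw(tokyo no hito)],
);

print join(',', search($index, 'tokyo')), "\n";    # prints: doc1,doc3
print join(',', search($index, qw(ni iku))), "\n"; # prints: doc1,doc2
```

Each lookup touches only the hash entries for the query words rather than scanning all the text, which is why this pays off on large collections, while a regex scan stays perfectly adequate for a few megabytes.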

I think it would be very interesting if Perl programmers could easily use state-of-the-art computational linguistics or "A.I." algorithms (besides, I guess, what is already in Perl) to make Perl even more intelligent and perhaps automate some of the programming task. For example, someone just gave me three nasty scripts to refactor together and update for 5.6.1. Maybe Perl could learn to tell me "Yep, those are real nasty scripts, better rewrite from scratch," or perhaps give me other insights into the code.

I am not a computational linguist, just interested. There is an awful lot of science there, so if anybody has insights about it, please share them with the rest of us.

