One of my pet projects is a word segmenter, meaning a
program which takes sentences and spits out words. It is easy to
get good results for English, but pretty hard
for languages like Japanese, where you won't find any spaces
between words. A pretty tough task, and one where academic and commercial
research isn't that advanced.
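
To make the problem concrete, here is a minimal sketch of one common baseline, greedy longest-match against a dictionary. It is not necessarily how my segmenter works; the tiny `%dict`, the `$max_len` limit and the `segment` function are all made up for illustration, and a real lexicon would be far larger.

```perl
use strict;
use warnings;
use utf8;    # the source below contains literal Japanese characters

# Hypothetical toy dictionary; a real segmenter needs a large lexicon.
my %dict = map { $_ => 1 } qw(私 は 学生 です 東京 大学 東京大学);
my $max_len = 4;    # longest dictionary entry, in characters

# Greedy longest-match: at each position take the longest dictionary
# word; fall back to a single character when nothing matches.
sub segment {
    my ($text) = @_;
    my @words;
    my $pos = 0;
    while ( $pos < length $text ) {
        my $rest = length($text) - $pos;
        my $len  = $max_len < $rest ? $max_len : $rest;
        $len-- while $len > 1 && !$dict{ substr( $text, $pos, $len ) };
        push @words, substr( $text, $pos, $len );
        $pos += $len;
    }
    return @words;
}

binmode STDOUT, ':encoding(UTF-8)';
print join( ' ', segment('私は東京大学の学生です') ), "\n";
```

With this toy dictionary the sentence comes out as 私 は 東京大学 の 学生 です. Note that の is not in the dictionary and only survives through the single-character fallback, which is exactly where a greedy approach starts to go wrong on real text.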
I have to say that for this task Perl is the perfect language. I
don't care about speed, only about results. If I am satisfied
with my project I may implement it in C, or I may not. Call it
prototyping, call it research, call it whatever; I won't
try it with Prolog or Lisp.