I'm interested in getting ideas on how to write a program that takes two lists of words and tries to match morphemes. One list would be in English; the other would be in a language that is only known at runtime. The bound morphemes would be predictable (plural, tense, aspect...), but the number of "roots" would not be known until the program has gone through the lists. For example:
baSimda,in my head
Would return something like:
-imda,in my -
(Or, instead of 'in my -', it would return a description.) These observations may not hold for the language as a whole, but they hold for the data that we have. When rules contradict each other, the program might look more closely at the data to see whether the rule is more complex, or it might decide that a rule occurring only once out of x times is an exception, or that two rules occurring 50% of the time each are both acceptable. The word lists would generally be around 100-200 entries... I'll try to get a bigger sample to play with tomorrow. I read the article in TPJ #17, and while it was interesting, I still don't know where to start...
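One possible starting point, sketched in Python purely for brevity. This is a naive sketch under several assumptions, not a real morphological analyser: it only looks at suffixes, it treats any word ending shared by at least two entries as a candidate morpheme, and it builds the English side by keeping the gloss words that almost all of those entries have in common. The function names (`candidate_suffixes`, `gloss_template`, `label`), the thresholds, and every word pair other than `baSimda,in my head` are made up for illustration and are not real language data.

```python
from collections import Counter, defaultdict

def candidate_suffixes(pairs, min_root=2, min_support=2):
    """Map each word ending shared by >= min_support entries to those entries."""
    by_suffix = defaultdict(list)
    for foreign, gloss in pairs:
        # Leave at least min_root letters for the (unknown) root.
        for i in range(min_root, len(foreign)):
            by_suffix[foreign[i:]].append((foreign, gloss))
    shared = {s: e for s, e in by_suffix.items() if len(e) >= min_support}
    # Drop a short ending when a longer one covers exactly as many entries.
    return {
        s: e for s, e in shared.items()
        if not any(t != s and t.endswith(s) and len(shared[t]) == len(e)
                   for t in shared)
    }

def gloss_template(entries, threshold=0.9):
    """Keep gloss words seen in >= threshold of the entries; others become '-'."""
    seen = Counter(w for _, gloss in entries for w in set(gloss.split()))
    out = []
    for w in entries[0][1].split():
        token = w if seen[w] / len(entries) >= threshold else "-"
        # Collapse runs of unknown words into a single '-'.
        if token != "-" or not out or out[-1] != "-":
            out.append(token)
    return " ".join(out)

def label(count, total, exception_below=0.2, rule_from=0.9):
    """Label a rule by how often it holds (thresholds are arbitrary assumptions)."""
    share = count / total
    if share >= rule_from:
        return "rule"
    if share < exception_below:
        return "exception"
    return "alternative"

# Only the first pair comes from the post; the rest are invented placeholders.
pairs = [
    ("baSimda", "in my head"),
    ("kolimda", "in my arm"),
    ("tasimda", "in my stone"),
    ("suyu", "water"),
]

for suffix, entries in candidate_suffixes(pairs).items():
    print(f"-{suffix},{gloss_template(entries)}")
```

On this toy data the loop prints `-imda,in my -`. The `label` helper is one crude way to handle contradicting rules: a rule seen 1 time out of 10 comes back as `"exception"`, and two rules at 5 out of 10 each both come back as `"alternative"`. Prefixes, infixes, and sound changes such as vowel harmony are all ignored here and would need a more careful alignment step.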