Aaah my lovely favourite subfield of interest...

First off, you can differentiate between knowledge-based stemming algorithms and probabilistic stemming. And of course there is a bunch of heuristic mixtures of these two approaches spread all over the literature and the web. If you want something "not so good, but good enough and not expensive", you could use the next generation of the old stemmer: see Snowball. Snowball is quite OK, especially because there are descriptions for several languages. However, you will never reach 100% accuracy with this approach, as only a dictionary of the given language together with morphological knowledge will give you the best (but still ambiguous) results.

But this requires heavy-duty hardware on which the heavy-duty software can run...
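
To make the trade-off concrete, here is a rough sketch in Python. It is only a toy: the suffix list and the tiny lexicon are made up for illustration, and the rule part is NOT the Snowball algorithm, just a crude suffix stripper in the same spirit. The names `SUFFIXES`, `LEXICON`, `rule_stem` and `dict_stem` are all hypothetical.

```python
# Toy illustration only: this is NOT the Snowball algorithm.
# Hypothetical rule list, checked in order (longest suffixes first).
SUFFIXES = ("ing", "ed", "es", "s")

# Hypothetical miniature dictionary mapping inflected forms to base forms.
# A real one would cover the whole language plus its morphology.
LEXICON = {"ran": "run", "running": "run", "mice": "mouse"}


def rule_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def dict_stem(word):
    """Look the word up; fall back to the cheap rules when it is unknown."""
    return LEXICON.get(word, rule_stem(word))


if __name__ == "__main__":
    for w in ("walked", "running", "mice", "ran"):
        print(w, rule_stem(w), dict_stem(w))
```

The rules are cheap but stumble on irregular forms ("mice", "ran") and leave an odd stem for "running", while the lookup gets those right only because someone put them in the dictionary. That dictionary is exactly the expensive part.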


In reply to Re: Natural Language Index Stemming by PetaMem
in thread Natural Language Index Stemming by rob_au
