moritz,
This is not an indexing issue. In fact, this should be compatible with almost any search indexing system. MySQL supports the Asian languages well enough to satisfy me. The difficulty here is more of a Perl problem.
The issue is reformatting the search from a few search words into a MySQL "SELECT * FROM MyTable WHERE ..." type query.
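Here is a minimal sketch of that reformatting step. It assumes the DBI convention of `?` placeholders and a hypothetical searchable column named `body`, since the post does not name one. Each search word becomes a `LIKE` condition joined with `AND`. The words are passed as bind values rather than interpolated into the SQL, and LIKE wildcards in user input are escaped so they match literally.

```perl
use strict;
use warnings;

# Hypothetical helper: turn search words into a parameterized query.
# Assumes the searchable text is in a column named `body` (not from the post).
sub build_query {
    my @words = @_;
    my (@clauses, @binds);
    for my $word (@words) {
        # Escape \, % and _ so LIKE treats them literally (backslash is
        # MySQL's default LIKE escape character).
        (my $escaped = $word) =~ s/([\\%_])/\\$1/g;
        push @clauses, 'body LIKE ?';
        push @binds,   "%$escaped%";
    }
    my $sql = 'SELECT * FROM MyTable';
    $sql .= ' WHERE ' . join(' AND ', @clauses) if @clauses;
    return ($sql, @binds);
}

my ($sql, @binds) = build_query('perl', '100%');
print "$sql\n";                   # SELECT * FROM MyTable WHERE body LIKE ? AND body LIKE ?
print join(', ', @binds), "\n";   # %perl%, %100\%%
```

You could then pass the returned `$sql` and `@binds` to DBI, for example with `$dbh->selectall_arrayref($sql, undef, @binds)`.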
The core of the Perl issue seems to revolve around word boundaries. The Asian languages run all words together, so that a sentence appears to be one word (i.e., there is no white space to delimit words). The \w, \b, \d, etc. are supposed to be compatible with any language, but in actual practice they have shortcomings when dealing with word boundaries between double-byte characters. I have had to replace \w in my code with \p{...} type expressions.
KinoSearch lists its language compatibilities under "Features" as:
* Full support for 12 Indo-European languages.
My first efforts at making this program work with Chinese also failed miserably. I was disappointed that the Perl regex did not work the way the documentation I had found said it should. (I had used \w in the beginning.)
So it would not surprise me at all if KinoSearch had the same flaw. Most programmers do not deliberately avoid the common regex tokens just to be certain their code will work with any language.
Who knows? Maybe I'm not reinventing this wheel after all.
Blessings,
~ Polyglot ~