Re^4: Database Search Format Engine

by creamygoodness (Curate)
on Jun 18, 2009 at 04:21 UTC (#772631)


in reply to Re^3: Database Search Format Engine
in thread Database Search Format Engine

The stable branch of KinoSearch (0.165) doesn't handle UTF-8 properly. You need the dev branch for that (0.20_01 and above). For Asian languages, you absolutely need UTF-8, or support for native encodings like Shift-JIS.

Tokenizing is also quite a challenge for Asian languages, particularly Japanese, and KinoSearch doesn't have a dedicated CJK tokenizer class or anything like that. It's on the todo list, but not very high -- I'm more concerned with making sure that the framework will allow others to write high-performance KSx extensions than with writing everything myself.
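
To give a feel for the problem, here is a minimal sketch in Python of a crude fallback that is commonly used when no dictionary-based segmenter is available: overlapping character bigrams. This is not KinoSearch code and does not use its API. The names `is_cjk`, `char_class`, and `tokenize` are hypothetical, and the code-point ranges are an assumption that covers only the main kana, CJK ideograph, and Hangul blocks.

```python
from itertools import groupby


def is_cjk(ch):
    """Rough check: kana, common CJK ideographs, or Hangul syllables (assumed ranges)."""
    cp = ord(ch)
    return (
        0x3040 <= cp <= 0x30FF      # Hiragana and Katakana
        or 0x4E00 <= cp <= 0x9FFF   # CJK Unified Ideographs
        or 0xAC00 <= cp <= 0xD7A3   # Hangul syllables
    )


def char_class(ch):
    """Classify a character so that adjacent characters of one class form a run."""
    if is_cjk(ch):          # must come first: CJK characters are also alphanumeric
        return "cjk"
    if ch.isalnum():
        return "word"
    return "other"


def tokenize(text):
    """Lowercased words for non-CJK runs, overlapping bigrams for CJK runs."""
    tokens = []
    for cls, group in groupby(text, key=char_class):
        run = "".join(group)
        if cls == "word":
            tokens.append(run.lower())
        elif cls == "cjk":
            if len(run) == 1:
                tokens.append(run)
            else:
                tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
        # runs of whitespace and punctuation are dropped
    return tokens


if __name__ == "__main__":
    print(tokenize("Perl で全文検索"))
```

Note that the input has to be decoded to Unicode characters before any of this works; Shift-JIS bytes, for example, would first need something like `raw.decode("shift_jis")`. Bigrams sidestep word segmentation entirely, at the cost of a larger index and some false matches, which is why a proper Japanese tokenizer is a much bigger job.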

