PerlMonks  

Re^4: Database Search Format Engine

by creamygoodness (Curate)
on Jun 18, 2009 at 04:21 UTC (#772631)


in reply to Re^3: Database Search Format Engine
in thread Database Search Format Engine

The stable branch of KinoSearch (0.165) doesn't handle UTF-8 properly. You need the dev branch for that (0.20_01 and above). For Asian languages, you absolutely need UTF-8, or support for native encodings like Shift-JIS.
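
To illustrate the encoding point, here is a minimal sketch of normalizing Shift-JIS input to UTF-8 before handing it to an indexer. This is not KinoSearch code: it is written in Python for illustration, the helper name `to_utf8` is hypothetical, and it assumes you already know the source encoding of each document.

```python
def to_utf8(raw: bytes, source_encoding: str = "shift_jis") -> bytes:
    """Decode bytes from a known legacy encoding and re-encode as UTF-8.

    Hypothetical helper: the source encoding must be known in advance,
    since it is not detected here.
    """
    # Decoding raises UnicodeDecodeError on invalid input instead of
    # silently producing mangled characters.
    text = raw.decode(source_encoding)
    return text.encode("utf-8")


sjis_bytes = "日本語".encode("shift_jis")
utf8_bytes = to_utf8(sjis_bytes)

# The same three characters occupy a different number of bytes in each
# encoding, so byte-oriented processing cannot treat them interchangeably.
print(len(sjis_bytes), len(utf8_bytes))
```

The point of the sketch is only that the same text has different byte representations, so an indexer that is not encoding-aware can split or mismatch multi-byte characters.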

Tokenizing is also quite a challenge for Asian languages, particularly Japanese, and KinoSearch doesn't have a dedicated CJK tokenizer class or anything like that. It's on the todo list, but not very high -- I'm more concerned with making sure that the framework will allow others to write high-performance KSx extensions than with writing everything myself.
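
For a sense of what a simple CJK tokenizer might do, here is a rough sketch of overlapping character bigrams, a common fallback when no dictionary-based segmenter is available. It is not part of KinoSearch and not the KSx extension interface; the function name `cjk_bigrams` and the character ranges chosen (hiragana, katakana, and the main CJK Unified Ideographs block) are assumptions made for illustration.

```python
import re

# Runs of hiragana, katakana, or CJK Unified Ideographs.
# Assumption: these ranges are enough for a demo; real text needs more
# (extension blocks, half-width katakana, and so on).
CJK_RUN = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]+")


def cjk_bigrams(text: str) -> list[str]:
    """Split each run of CJK characters into overlapping two-character tokens."""
    tokens = []
    for run in CJK_RUN.findall(text):
        if len(run) == 1:
            # A lone character has no bigram, so keep it as its own token.
            tokens.append(run)
        else:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


print(cjk_bigrams("東京都"))  # ['東京', '京都']
```

Bigrams sidestep the word-boundary problem at the cost of a larger index and some false matches, which is part of why a proper Japanese tokenizer is a bigger job than this.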

