Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

Re^4: Database Search Format Engine

by creamygoodness (Curate)
on Jun 18, 2009 at 04:21 UTC ( #772631=note: print w/replies, xml ) Need Help??


in reply to Re^3: Database Search Format Engine
in thread Database Search Format Engine

The stable branch of KinoSearch (0.165) doesn't handle UTF-8 properly. You need the dev branch for that (0.20_01 and above). For Asian languages, you absolutely need UTF-8, or support for native encodings like Shift-JIS.

Tokenizing is also quite a challenge for Asian languages, particularly Japanese, and KinoSearch doesn't have a dedicated CJK tokenizer class or anything like that. It's on the todo list, but not very high -- I'm more concerned with making sure that the framework will allow others to write high-performance KSx extensions than with writing everything myself.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://772631]
help
Chatterbox?
[Corion]: Whoa! I haven't invoked Crunchy today! But thanks to Crunchy, it's Friday! ;)

How do I use this? | Other CB clients
Other Users?
Others imbibing at the Monastery: (5)
As of 2017-01-20 07:47 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Do you watch meteor showers?




    Results (173 votes). Check out past polls.