Using the unicode61 tokenizer in DBD::SQLite

by elef (Friar)
on Jan 05, 2014 at 19:04 UTC (#1069409=perlquestion)
elef has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I'm trying to get the unicode61 tokenizer working in DBD::SQLite. The purpose is to get correct Unicode case folding, i.e. for SQLite to know that is the upper-case version of and treat them as such (return case-insensitive hits from an FTS4 table on MATCH queries).
According to
"The "unicode61" tokenizer is available beginning with SQLite version 3.7.13. Unicode61 works very much like "simple" except that it does full unicode case folding according to rules in Unicode Version 6.1 and it recognizes unicode space and punctuation characters and uses those to separate tokens. The simple tokenizer only does case folding of ASCII characters and only recognizes ASCII space and punctuation characters as token separators."
I just updated DBD::SQLite to 1.40 and made sure I have SQLite version 3.7.17.
Yet, when I try to run a CREATE VIRTUAL TABLE mytable USING fts4 (tokenize=unicode61) I get: "DBD::SQLite::db do failed: unknown tokenizer: unicode61".
Was SQLite compiled without enabling the unicode61 tokenizer? (Some sources mention compiling sqlite with SQLITE_ENABLE_FTS4_UNICODE61 in order to get this functionality.) Do I have any options here?
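One way to narrow this down is to ask the bundled library directly rather than guess at how it was built. The following is only a minimal sketch, assuming DBD::SQLite is installed and using an in-memory database; the sub name has_unicode61 and the table name probe_unicode61 are made up for illustration. It prints the module and library versions, lists whatever PRAGMA compile_options reports, and then simply tries to create an FTS4 table with the tokenizer. The probe is the part to trust, since not every build flag necessarily shows up in the options list.

```perl
use strict;
use warnings;
use DBI;
use DBD::SQLite;

# Returns 1 if this SQLite build accepts tokenize=unicode61 for an FTS4
# table, 0 otherwise. The probe table name is hypothetical.
sub has_unicode61 {
    my ($dbh) = @_;
    my $ok = eval {
        $dbh->do('CREATE VIRTUAL TABLE probe_unicode61 USING fts4(tokenize=unicode61)');
        $dbh->do('DROP TABLE probe_unicode61');
        1;
    };
    return $ok ? 1 : 0;
}

# In-memory database, so nothing is left behind on disk.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, PrintError => 0 });

printf "DBD::SQLite %s, SQLite %s\n",
    DBD::SQLite->VERSION, $dbh->{sqlite_version};

# List the compile-time options the library itself reports.
my $opts = $dbh->selectcol_arrayref('PRAGMA compile_options');
print "  $_\n" for @$opts;

print 'unicode61: ', (has_unicode61($dbh) ? 'available' : 'not available'), "\n";
```

If the probe reports "not available", the tokenizer was most likely left out at build time, and no setting in the calling code will switch it on.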

Replies are listed 'Best First'.
Re: Using the unicode61 tokenizer in DBD::SQLite
by Anonymous Monk on Jan 06, 2014 at 02:28 UTC

    Do I have any options here?

    I think you need to recompile DBD::SQLite with perl Makefile.PL DEFINE=-DSQLITE_ENABLE_FTS4_UNICODE61 and give it a shot.

    Update: I just recompiled, and all tests passed except some in t/51_table_column_metadata.t. Same result with cpanp -z ISHIGAKI/DBD-SQLite-1.41_03.tar.gz


    Seems to work

    $ perl -Mblib -S dbish dbi:SQLite:testgonernow
    DBI::Shell 11.95 using DBI 1.628

    WARNING: The DBI::Shell interface and functionality are
    =======  very likely to change in subsequent versions!

    Connecting to 'dbi:SQLite:testgonernow' as ''...
    @dbi:SQLite:testgonernow> /table_info
    TABLE_CAT,TABLE_SCHEM,TABLE_NAME,TABLE_TYPE,REMARKS,sqlite_sql
    undef,'main','sqlite_master','SYSTEM TABLE',undef,undef
    undef,'temp','sqlite_temp_master','SYSTEM TABLE',undef,undef
    [2 rows of 6 fields returned]
    @dbi:SQLite:testgonernow> CREATE VIRTUAL TABLE mytable USING fts4 (tokenize=unicode61);
    [0E0 rows affected]
    @dbi:SQLite:testgonernow> /table_info
    TABLE_CAT,TABLE_SCHEM,TABLE_NAME,TABLE_TYPE,REMARKS,sqlite_sql
    undef,'main','mytable_segdir','INDEX',undef,undef
    undef,'main','sqlite_master','SYSTEM TABLE',undef,undef
    undef,'temp','sqlite_temp_master','SYSTEM TABLE',undef,undef
    undef,'main','mytable','TABLE',undef,'CREATE VIRTUAL TABLE mytable USING fts4 (tokenize=unicode61)'
    undef,'main','mytable_content','TABLE',undef,'CREATE TABLE \'mytable_content\'(docid INTEGER PRIMARY KEY, \'c0content\')'
    undef,'main','mytable_docsize','TABLE',undef,'CREATE TABLE \'mytable_docsize\'(docid INTEGER PRIMARY KEY, size BLOB)'
    undef,'main','mytable_segdir','TABLE',undef,'CREATE TABLE \'mytable_segdir\'(level INTEGER,idx INTEGER,start_block INTEGER,leaves_end_block INTEGER,end_block INTEGER,root BLOB,PRIMARY KEY(level, idx))'
    undef,'main','mytable_segments','TABLE',undef,'CREATE TABLE \'mytable_segments\'(blockid INTEGER PRIMARY KEY, block BLOB)'
    undef,'main','mytable_stat','TABLE',undef,'CREATE TABLE \'mytable_stat\'(id INTEGER PRIMARY KEY, value BLOB)'
    [9 rows of 6 fields returned]
    @dbi:SQLite:testgonernow> /exit
    Disconnecting from dbi:SQLite:testgonernow.
    $ rm testgonernow
      Thanks, Anonymonk.
      I have no idea about any of the steps required for recompiling sqlite for DBD::SQLite. I don't even know where the executable or the source code is stored. Can you give me some pointers? I'm on Windows BTW.
        Um, I'm on Windows too... If you don't recognize cpanp -z ... or perl Makefile.PL ..., you probably want to read A Guide To Installing Modules, among others.
Re: Using the unicode61 tokenizer in DBD::SQLite
by ww (Archbishop) on Jan 05, 2014 at 20:02 UTC

    Option 1 (a WAG): re-download and re-install SQLite because your error message conflicts with the tokenizer note you cite. Why: The conflict suggests the possibility that your sqlite install is borked.

    Option 2: Inquire on one of the SQLite fora/mailing lists/whatever, if you don't get a better answer from another Monk; one who uses SQLite at a more sophisticated level than I.

    Come, let us reason together: Spirit of the Monastery
      I really don't think there's anything wrong with my sqlite install. I can create dbs, import and query data just fine. The unicode61 tokenizer is the only thing that's failing. I suspect that whoever compiled sqlite for DBD::SQLite did so without enabling the unicode tokenizer, which would be a major pain for me. I'm hoping that I'm just doing it wrong or the feature needs to be enabled somehow in my code or whatever.
      I guess I will email the maintainer of DBD::SQLite if the fellow monks don't have any insight. We'll see.
Re: Using the unicode61 tokenizer in DBD::SQLite
by elef (Friar) on Jan 14, 2015 at 20:01 UTC
    Note for posterity: I updated DBD::SQLite to 1.46 and the problem went away. The unicode tokenizer works now.
    It might be due to a change in SQLite itself; the changelog at says: The unicode61 tokenizer is now included in FTS4 by default. (as of 3.8.6)
    The previous DBD::SQLite version I used included v3.7.x of SQLite, the update got me up to
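
The behaviour described in this follow-up can be checked directly once the tokenizer is available. This is a rough sketch, assuming a DBD::SQLite build that accepts unicode61 and a connection opened with sqlite_unicode => 1; the sub name count_matches, the table name docs and the sample text are invented for illustration. It indexes an upper-case non-ASCII word under each tokenizer and searches for the lower-case form.

```perl
use strict;
use warnings;
use utf8;    # the string literals below contain non-ASCII characters
use DBI;

# sqlite_unicode => 1 makes DBD::SQLite pass Perl character strings to
# SQLite as UTF-8 and decode results on the way back.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, PrintError => 0, sqlite_unicode => 1 });

# Builds a throwaway FTS4 table with the given tokenizer, indexes one
# upper-case document and counts the rows matching $term.
# $tokenizer is interpolated into SQL, so pass only trusted literals.
sub count_matches {
    my ($dbh, $tokenizer, $term) = @_;
    $dbh->do("CREATE VIRTUAL TABLE docs USING fts4(body, tokenize=$tokenizer)");
    $dbh->do('INSERT INTO docs (body) VALUES (?)', undef, 'ÉCOLE');
    my ($n) = $dbh->selectrow_array(
        'SELECT count(*) FROM docs WHERE body MATCH ?', undef, $term);
    $dbh->do('DROP TABLE docs');
    return $n;
}

for my $tokenizer (qw(simple unicode61)) {
    printf "%-9s matches for lower-case query: %d\n",
        $tokenizer, count_matches($dbh, $tokenizer, 'école');
}
```

On a build with the tokenizer enabled, simple should report 0 matches, because it folds only ASCII letters, while unicode61 should report 1. If the second CREATE VIRTUAL TABLE dies with "unknown tokenizer: unicode61", the library in use still lacks it.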
