http://www.perlmonks.org?node_id=491787

nothingmuch has asked for the wisdom of the Perl Monks concerning the following question:

Hola...

DBD::SQLite (with version 3 of SQLite) can store UTF-8 data, but when it's extracted, Perl doesn't recognize it as UTF-8; it sees a plain string of bytes instead.

For example, if I say (CDBI layer assumed)

$foo->name("????"); # apparently PerlMonks isn't using [gaal]'s patch ;-)
# then later
$foo->name =~ /\p{Hebrew}/;

then the match fails.
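A database-free sketch of the symptom, assuming the driver hands back exactly the UTF-8 bytes that were stored. Here Encode::encode_utf8 stands in for the database round trip (an assumption, not DBD::SQLite itself), and the Hebrew word is written with \x{...} escapes so the example stays ASCII-only.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

# A Hebrew word ("shalom") as a Perl character string: 4 characters.
my $chars = "\x{05E9}\x{05DC}\x{05D5}\x{05DD}";

# Stand-in for the database round trip: the same text as raw UTF-8
# bytes with Perl's UTF-8 flag off (8 bytes, not 4 characters).
my $from_db = encode_utf8($chars);

my $raw_matches     = $from_db =~ /\p{Hebrew}/ ? 1 : 0;   # bytes: no match
my $decoded         = decode_utf8($from_db);
my $decoded_matches = $decoded =~ /\p{Hebrew}/ ? 1 : 0;   # characters: match

printf "raw: length %d, matches %d\n",     length($from_db), $raw_matches;
printf "decoded: length %d, matches %d\n", length($decoded), $decoded_matches;
# prints:
#   raw: length 8, matches 0
#   decoded: length 4, matches 1
```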

Update: to make it clear, the actual conversion is not the problem; the question is at what level the conversion should be done.

This is fairly trivial to work around; all I need is one of a million snippets like:

$string = Encode::decode_utf8($string);
# or
utf8::decode($string) if !utf8::is_utf8($string) and utf8::valid($string);
...
for every string coming from the database. The problem is that I don't know how to do this other than with a rather crude method: using Class::DBI's inflation mechanism to do the work.
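One way to keep such a snippet in a single place is to apply it once per fetched row. A minimal sketch, assuming rows arrive as hash references (for example from DBI's fetchrow_hashref); decode_row is a hypothetical helper name, and the utf8::is_utf8 guard mirrors the snippet above so already-flagged values are left alone.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Hypothetical helper: decode every defined, non-reference column value
# of a row in place, skipping values that already carry the UTF-8 flag.
sub decode_row {
    my ($row) = @_;
    for my $value (values %$row) {
        next if !defined $value or ref $value or utf8::is_utf8($value);
        $value = decode_utf8($value);   # values() aliases, so this edits $row
    }
    return $row;
}

# Usage, assuming a DBI statement handle $sth:
#   while (my $row = $sth->fetchrow_hashref) { decode_row($row); ... }
```

This still runs per row at the application level, so it is a convenience over the per-string snippet rather than the DBI-level switch asked for.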

Ideally I would like to tell DBD::SQLite that all the strings in the DB are UTF-8 and should be interpreted as such. gaal made a patch against DBD::mysql to support similar functionality, but I'm not using MySQL.

-nuffin
zz zZ Z Z #!perl

Re: UTF8 vs SQLite
by Corion (Patriarch) on Sep 14, 2005 at 08:19 UTC

    From how I interpret the comments regarding the SQLite changes between 2.8 and 3.x, SQLite doesn't care what kind of data you hand it. I'm not sure whether DBI provides any hooks to automatically tag all (outgoing) strings as UTF-8, though. If you're using Class::DBI, you can put the conversion/tagging into a trigger, but if DBI doesn't provide such a hook, I guess you'll have to add the UTF-8 tag yourself using Scalar::Util. Or maybe bart's tagging trick via pack or unpack works:

    my $utf8 = unpack "C0", $value; # but bart will likely correct my incorrect usage
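The pack/unpack trick is not reproduced here, since its exact form is uncertain. For comparison, a sketch of three core ways to turn UTF-8 bytes into a flagged character string. All three calls exist in core Perl/Encode; the choice between them is a judgment call: utf8::decode validates and works in place, Encode::decode_utf8 returns a copy, and Encode::_utf8_on (the underscore marks it as internal) only flips the flag without checking the bytes.

```perl
use strict;
use warnings;
use Encode ();

my $bytes = "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D";   # UTF-8 bytes of a Hebrew word

# 1. utf8::decode converts in place; it returns false and leaves the
#    string unchanged if the bytes are not valid UTF-8.
my $via_utf8 = $bytes;
utf8::decode($via_utf8) or warn "not valid UTF-8\n";

# 2. Encode::decode_utf8 returns a new character string; by default,
#    malformed sequences are replaced with U+FFFD.
my $via_encode = Encode::decode_utf8($bytes);

# 3. Encode::_utf8_on only sets the internal flag; it does not validate.
my $via_flag = $bytes;
Encode::_utf8_on($via_flag);

print join(" ", map { length($_) } $via_utf8, $via_encode, $via_flag), "\n";
# prints: 4 4 4
```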
Re: UTF8 vs SQLite
by tphyahoo (Vicar) on Sep 14, 2005 at 09:15 UTC
    I'm struggling through some UTF-8 issues myself, and the best explanation I found (since I found the Perl documentation overwhelming) was at Unicode-processing issues in Perl and how to cope with it.

    The general strategy this page suggests for dealing with Unicode in Perl 5.8 is:

    $_ = Encode::decode_utf8( $_ );
    This was for a CGI context, but it seems to me it should apply to input from any source. I suspect, but am not certain, that this does the same thing as, or something similar to, what Corion suggested.
      The actual conversion is not what I'm trying to get done. What I want is for the conversion to be done automatically at a level much lower than I care to fiddle with.

      Corion's Class::DBI trigger idea is probably the best, but I'd like to do it at the DBI level if possible, since the DBH should really be relaying the data I send to and from it losslessly, and Class::DBI shouldn't have to touch that.

      -nuffin
Re: UTF8 vs SQLite
by bsb (Priest) on Sep 14, 2005 at 09:54 UTC
    You could also do the conversion at the Class::Accessor level by overriding "get" or the individual accessors to return UTF-8 strings.
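A sketch of the "override get" idea. Class::Accessor's generated accessors read values through $self->get($field), so a subclass can decode there. To keep the example self-contained, My::Base is a hypothetical stand-in for Class::Accessor (not its real code), and My::UTF8Record is a hypothetical subclass name.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Class::Accessor: accessors read through get().
package My::Base;
sub new { my ($class, %args) = @_; return bless { %args }, $class }
sub get {
    my ($self, @fields) = @_;
    return @fields == 1 ? $self->{ $fields[0] } : @{$self}{@fields};
}
sub name { my $self = shift; return $self->get('name') }

# Hypothetical subclass: decode UTF-8 bytes on every read through get().
package My::UTF8Record;
our @ISA = ('My::Base');
sub get {
    my $self   = shift;
    my @values = $self->SUPER::get(@_);
    for my $value (@values) {
        utf8::decode($value) if defined $value and not utf8::is_utf8($value);
    }
    return wantarray ? @values : $values[-1];
}

package main;
my $rec = My::UTF8Record->new(name => "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D");
print length($rec->name), "\n";   # prints 4 (characters, not 8 bytes)
```

Overriding get covers every accessor at once; overriding individual accessors instead limits the decoding to columns known to hold text.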

    Cees gave me some good tips during a similar discussion at Date conversion with Class::DBI, although it was regarding date formatting rather than encoding.

Re: UTF8 vs SQLite
by nothingmuch (Priest) on Sep 24, 2005 at 22:31 UTC
    Eventually I chose this solution:

    use Data::Structure::Util qw/_utf8_on/;
    $some_cdbi_class->add_trigger(select => sub { _utf8_on($_[0]) });
    -nuffin
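Since the select trigger is handed the whole object, the fix-up has to walk the data structure. For readers without Data::Structure::Util, a core-only sketch of that recursive walk, assuming the data lives in plain or blessed hashes and arrays with no reference cycles. decode_deep and _decode_slot are hypothetical names, and unlike _utf8_on this version validates the bytes via utf8::decode instead of setting the flag blindly.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

# Hypothetical helper: recursively decode UTF-8 bytes to characters in
# every string reachable through hashes and arrays, including blessed
# objects. No cycle detection: assumes the structure is a tree.
sub decode_deep {
    my ($data) = @_;
    my $type = reftype($data) // '';
    if    ($type eq 'HASH')  { _decode_slot($_) for values %$data }
    elsif ($type eq 'ARRAY') { _decode_slot($_) for @$data }
    return $data;
}

# $_[0] aliases the caller's hash value or array element, so the
# decoding happens in place.
sub _decode_slot {
    if (ref $_[0]) {
        decode_deep($_[0]);
    }
    elsif (defined $_[0] and not utf8::is_utf8($_[0])) {
        utf8::decode($_[0]);
    }
    return;
}
```

It could take the place of _utf8_on in the same trigger: $some_cdbi_class->add_trigger(select => sub { decode_deep($_[0]) });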

      Late in the game, I know.

      This is working for me too, and it may be the solution I keep until DBD::mysql is fixed. I think the recursive sub is probably not needed, right? Just the regular utf8_on instead of the _utf8_on in your sample.

        IIRC (I haven't played with this code in a while and don't have convenient access to it over the weekend), I used the recursive sub because the trigger is handed an object rather than a plain string.

        WRT MySQL: gaal submitted a patch, so use it. I heard it won't be applied because the authors would like more complete support, but that's not likely to happen.

        -nuffin
Re: UTF8 vs SQLite
by qq (Hermit) on Jan 15, 2006 at 17:31 UTC

    I've only skimmed this thread, but I didn't see this mentioned. There have been some changes to DBD::SQLite as of version 1.10. From the current POD:

    unicode

    If set to a true value, DBD::SQLite will turn the UTF-8 flag on for all text strings coming out of the database. For more details on the UTF-8 flag see perlunicode. The default is for the UTF-8 flag to be turned off.
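A sketch of switching this on at connect time, which is the DBI-level setting the original question asked for. The 1.10 POD quoted here calls the attribute unicode; later DBD::SQLite releases renamed it sqlite_unicode, which is what this sketch assumes, so check the POD of your installed version. open_unicode_db is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: open an SQLite database so that DBD::SQLite turns
# the UTF-8 flag on for text coming out of the database. Assumes a
# DBD::SQLite release that uses the sqlite_unicode attribute name
# (older releases, such as the 1.10 quoted above, call it "unicode").
sub open_unicode_db {
    my ($file) = @_;
    require DBI;   # loaded at run time; "use DBI;" at the top works too
    return DBI->connect("dbi:SQLite:dbname=$file", "", "",
        { RaiseError => 1, AutoCommit => 1, sqlite_unicode => 1 });
}
```

Usage: my $dbh = open_unicode_db('app.db'); strings fetched through $dbh then match /\p{Hebrew}/ without any per-row decoding.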

    The Changes file says:

    1.10 - Fix Unicode support (DOMQ)

    So with luck, all you need is an upgrade.