http://www.perlmonks.org?node_id=863809


in reply to SQLite and Storable: is "Text" okay?

If you're using SQLite 3, the TEXT type stores data UTF encoded, while the BLOB type stores it raw. BLOB is more geared toward binary data storage.

If you're really concerned about problems due to UTF encoding, use a BLOB. It looks like Storable deals with UTF8 fine, but I don't know how much it likes UTF16. Since Storable generates a string, I would probably store it as a text column.

If you're using SQLite 2, all the datatypes are really strings. You could declare the column as an INTEGER and it wouldn't care. I would still pick TEXT, because it describes the data the best.

Edit: I misread the Storable docs. See below for details.

  • Comment on Re: SQLite and Storable: is "Text" okay?

Replies are listed 'Best First'.
Re^2: SQLite and Storable: is "Text" okay?
by ikegami (Patriarch) on Oct 06, 2010 at 15:53 UTC

    If you're using SQLite 3, the TEXT type stores data UTF encoded

    I don't see SQLite performing any encoding conversion, so that's surely false. It stores whatever bytes it received, whether it's encoded using UTF-8 or not.

    The difference will surface when someone tries to perform text operations (e.g. sorting) on the field.

    BLOB is definitely more appropriate here, seeing as the data is binary, but it's my understanding that the type of a column doesn't matter in SQLite if all you do with the column is store and fetch.

    If it does do encoding conversions, you need to stay away from TEXT for Storable data. You'll potentially waste space and time.

    I would still pick TEXT, because it describes the data the best.

    Storable returns binary data. BLOB should be picked because it describes the data best.

      The way that I read the Storable doc, it seemed to indicate that it stored data in UTF8. Looking at it again, it only indicates that strings are stored that way. In light of that fact, BLOB would definitely be the right choice.

      It looks like SQLite has started to actually make different data types matter recently. I started using it in version 2, when it said "everything is stored as string anyway". The version 3 docs don't make any such claims, but the main differences do appear to be with non store-fetch operations.

        The way I read it, SQLite values are like Perl scalars. Stored in a convenient format, and coerced appropriately when required (by comparisons). The storage format is based on the value itself, whereas the coercion is based on the column type.

        it only indicates that strings are stored that way.

        Whatever it indicates, it's doesn't do that. It stores the string as it appears in the scalar.

        $ perl -MStorable=freeze -e' utf8::upgrade( $_="abc\x{80}def" ); print freeze(\$_); ' | od -c 0000000 004 \a 004 1 2 3 4 004 004 004 \b 027 \b a b +c 0000020 302 200 d e f 0000025 $ perl -MStorable=freeze -e' utf8::downgrade( $_="abc\x{80}def" ); print freeze(\$_); ' | od -c 0000000 004 \a 004 1 2 3 4 004 004 004 \b \n \a a b +c 0000020 200 d e f 0000024