Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?

SQLite and Storable: is "Text" okay?

by sundialsvc4 (Abbot)
on Oct 06, 2010 at 14:56 UTC ( #863802=perlquestion: print w/replies, xml ) Need Help??
sundialsvc4 has asked for the wisdom of the Perl Monks concerning the following question:

I am using Storable to put information into an SQLite database.   As it happens, I am freezeing the structure into a string, then inserting that into a Text-type column.   (Of course, I am binding to a query parameter, thus allowing me to insert the arbitrary string.)   It all seems to be working fine.

But then, a colleague’s question got me to wondering ... should I be using the Blob field-type for this?   I don’t think so, but, “when in doubt, humbly petiton the Monks ...”

This arrangement is perfectly satisfactory for my needs... I’m not transporting the data anywhere, don’t have to keep it for a long time, don’t need the “freeze to a file” features of Storable, and don’t really need to “get jiggy wid it” (if you know what I mean).   I simply need to store a Perl data-structure that took some time to create, so that I don’t have to create it a second time.   But if there might be some column-data-type related conversion issue that I am not aware of, I’d just like to know about it sooner rather than later.   (The target file-format will always be SQLite.)

Replies are listed 'Best First'.
Re: SQLite and Storable: is "Text" okay?
by assemble (Friar) on Oct 06, 2010 at 15:12 UTC

    If you're using SQLite 3, the TEXT type stores data UTF encoded, while the BLOB type stores it raw. BLOB is more geared toward binary data storage.

    If you're really concerned about problems due to UTF encoding, use a BLOB. It looks like Storable deals with UTF8 fine, but I don't know how much it likes UTF16. Since Storable generates a string, I would probably store it as a text column.

    If you're using SQLite 2, all the datatypes are really strings. You could declare the column as an INTEGER and it wouldn't care. I would still pick TEXT, because it describes the data the best.

    Edit: I misread the Storable docs. See below for details.

      If you're using SQLite 3, the TEXT type stores data UTF encoded

      I don't see SQLite performing any encoding conversion, so that's surely false. It stores whatever bytes it received, whether it's encoded using UTF-8 or not.

      The difference will surface when someone tries to perform text operations (e.g. sorting) on the field.

      BLOB is definitely more appropriate here, seeing as the data is binary, but it's my understanding that the type of a column doesn't matter in SQLite if all you do with the column is store and fetch.

      If it does do encoding conversions, you need to stay away from TEXT for Storable data. You'll potentially waste space and time.

      I would still pick TEXT, because it describes the data the best.

      Storable returns binary data. BLOB should be picked because it describes the data best.

        The way that I read the Storable doc, it seemed to indicate that it stored data in UTF8. Looking at it again, it only indicates that strings are stored that way. In light of that fact, BLOB would definitely be the right choice.

        It looks like SQLite has started to actually make different data types matter recently. I started using it in version 2, when it said "everything is stored as string anyway". The version 3 docs don't make any such claims, but the main differences do appear to be with non store-fetch operations.

Re: SQLite and Storable: is "Text" okay?
by sundialsvc4 (Abbot) on Oct 06, 2010 at 15:23 UTC

    It is SQLite 3.6.1.   As long as the UTF-encoding is compatible with what Storable is doing, and “in all cases,” then everything is just fine as it is right now.   I just wasn’t 100%-sure what storage-representation format Storable is actually using ... if that somehow needed to be “binary” in order to be correctly decoded for any-and-all data.

    “The data” is just a perfectly-boring hash of strings and numbers (that does not contain references).   As it happens, the text is UTF8.

    P.S. I just “stumbled upon” the MLDBM module ...

      Each value stored in an SQLite database (or manipulated by the database engine) has one of the following storage classes:

      NULL. The value is a NULL value.

      INTEGER. The value is a signed integer, stored in 1, 2, 3, 4, 6, or 8 bytes depending on the magnitude of the value.

      REAL. The value is a floating point value, stored as an 8-byte IEEE floating point number.

      TEXT. The value is a text string, stored using the database encoding (UTF-8, UTF-16BE or UTF-16LE).

      BLOB. The value is a blob of data, stored exactly as it was input.

      That implies that SQLite performs some manipulation (encoding and/or decoding) upon fields designated as TEXT.

      Storable stores integers in binary, which means that some value, or some subset of the bytes making up that value might look like some unicode entity that is illegal or has some special meaning (like Byte Order Mark: 0xFFFE or 0xFEFF), that would cause the round trip into and out of the DB to change the content some how.

      I'd be very much inclined to using BLOB for Storable data.

      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
Re: SQLite and Storable: is "Text" okay?
by sundialsvc4 (Abbot) on Oct 06, 2010 at 17:59 UTC

    Okay, it sounds like the developing consensus here, based in particular on what BrowserUK has observed, is that BLOB would be the most appropriate data type.

    I, too, have read that “the storage format is based on the value itself,” and that notion has always given me pause...

    Well, in any case, I surmise that the fundamental data-representation used by Storable is “binary data,” not “a string.”   (This in direct contrast to other choices like YAML and JSON, which are designed with the HTTP protocol in mind.)   Consequently, BLOB would presumably give SQLite the most-direct advice that “what you should expect to find in this column is... ‘arbitrary binary data.’”   Hence, BLOB, and not TEXT, becomes the most appropriate choice.

    “Going once ... going twice ...”

      JSON::XS is as fast as Storable and is also human readable/editable. This can help with debugging; faster, easier, less error prone than writing new code to do the introspection/tweaking. Unless you're doing something special that needs more than JSON provides, I'd recommend it over Storable.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://863802]
Approved by Corion
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others imbibing at the Monastery: (4)
As of 2016-09-26 03:08 GMT
Find Nodes?
    Voting Booth?
    Extraterrestrials haven't visited the Earth yet because:

    Results (475 votes). Check out past polls.