http://www.perlmonks.org?node_id=11160571

jmClifford has asked for the wisdom of the Perl Monks concerning the following question:

Hi. I have an SQLite DB loaded from the web. I can read this with SQLite Expert Personal (5.4). On viewing a cell within the DB, I have the following observations;

Column Name; inits_imperial Value (SQLite Expert); '°F' Value as Perl has on console; '°F'

The above is with an original cell value that was downloaded (from the web). However, I can with the same Perl program create another row (with an Insert operation, $insert->execute('998', '°C', '°C', '°F'); ), where I get the expect behavior as below;

Column Name; inits_imperial Value (SQLite Expert); '°F' Value as Perl has on console; '°F'

The extra character from the former (and original cell) is Â, which I confirm is UTF-8 hex C3 82 (with Notepad++ hex presentation, after cut and paste).

It appears the only way to rid my SQLite DB of these characters in the Perl console presentation (IDE is Eclipse) is to delete and insert the whole row again. Editing with SQLite Expect does not work. Can anyone shed light on this situation ??

Regards JC......

Replies are listed 'Best First'.
Re: Different presentation by SQLite
by Corion (Patriarch) on Jul 13, 2024 at 06:40 UTC

    You don't show us what Perl code you used to insert the degree sign into the database. Most likely this brings us back to your original thread, in which some questions to your code are still open.

    My personal recommendation is to avoid Unicode / non-ascii in your source code and use the \N{} notation:

    use charnames ':full'; my $fahrenheit = "\N{DEGREE SIGN}F"; # ... use $fahrenheit where you otherwise would type °F

    The difference between SQLite Expert, Perl and the console output still comes out to that the strings you display with  are undecoded UTF-8 strings. This can have multiple causes, as discussed in SQLite returns funny characters.

    If you have mixed string in your table, you will have to run an UPDATE SQL command to fix your database. But before you do that, make sure that your code is consistent.

Re: Different presentation by SQLite
by LanX (Saint) on Jul 13, 2024 at 11:30 UTC
    That's a repost/continuation of SQLite returns funny characters?

    Update

    When dealing with UTF8 various interfaces can fail.

    In your case I'm suspecting the console not to be configured to display UTF8.

    You should narrow down the problem.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    see Wikisyntax for the Monastery

      LanX is right. In your original post, I suggested (if you are on Windows-you did not tell us your platform) that you change the code page to UTF for the output to display properly, as follows: chcp 65001

      "It's not how hard you work, it's how much you get done."

      Hi. Thanks for the replies. The original question is a little out of order. I am suggesting that the SQLite DB from the web has the  character, which I have difficulty removing with the SQLite Expert interface. The only way I get success is with writing/inserting a new row (maybe an update when I get to test this as well).

      Thanks for the Character Name convention. I did not know this.

      I am quite sure the Eclipse console is UTF-8. My way of knowing this is the °C is displayed as C2B043 with Notepad++ (Hex mode after a cut and paste), and conforms with what I have now read about UTF-8.

      Regards JC......

        Rather than printing it, which really doesn't tell you much (looking correct might even mean that it's wrong!), try dumping the character numbers. Also check what perl thinks is the "length".
        my $x= "\x{B0}F"; say map sprintf("%02X ",ord), split //, $x;

        One problem is that the degree sign 0xB0 is within the lower 0xFF of unicode, so perl can represent it in both ascii form and in utf-8, and this can make it extra confusing to track down the problem.

        Some pointers that might help the debugging:

        • According to docs, SQLite *always* uses unicode, so it shouldn't be possible to have the raw \xB0 byte stored in a column. You can rule that out.
        • If perl's database interface were configured incorrectly, reading the unicode character \xB0 (which sqlite should encode as \C2\B0) would arrive in perl as two characters. You can find out if this is the case using "length", or by hex-dumping the characters as shown above.
        • It's perfectly possible for someone to take utf-8 bytes and tell SQLite its a string of unicode characters, and end up with \xC2 and \xB0 stored as two characters (encoded as 4 utf-8 bytes). I would refer to this situation as being "double-encoded".
        • You can repair double-encoded data using perl's utf8::decode($x). Note, that decodes the string in-place, rather than returning the decoded value. It is *almost* always safe to call this on a string whenever you're in doubt. It is unlikely that any real text would contain two characters that could be mistaken for a utf-8 sequence. This is my go-to whenever I have partly corrupted data after an encoding mistake was deployed to production and polluted the database with some double-encoded data.
        • You can only trust "print" to show you encoding problems if perl's STDOUT has the :utf8 layer applied and if your terminal is strictly UTF-8. If perl does not have the encoding layer, there's a chance it will emit valid UTF-8 anyway, and the terminal won't see anything wrong. I emphasize chance here, because \xB0 is within the single-byte range, and perl may or may not have used an internal UTF-8 encoding for the string. There's also the chance that a terminal has "helpful" support for programs that emit bytes, and silently upgrades it to unicode; I don't know anything specifically about Eclipse's terminal, but I would be cautious about trusting it to reveal encoding errors.