PerlMonks  

Re^2: Different presentation by SQLite

by jmClifford (Acolyte)
on Jul 13, 2024 at 14:24 UTC ( [id://11160582]=note )


in reply to Re: Different presentation by SQLite
in thread Different presentation by SQLite

Hi. Thanks for the replies. My original question was a little out of order. I am suggesting that the SQLite DB from the web contains the  character, which I have difficulty removing with the SQLite Expert interface. The only approach that has worked so far is inserting a new row (an update may work too, once I get to test it).

Thanks for the Character Name convention. I did not know this.

I am quite sure the Eclipse console is UTF-8. I know this because °C shows up as C2B043 in Notepad++ (hex mode, after a cut and paste), which matches what I have now read about UTF-8.
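As a quick cross-check of those bytes, here is a minimal Perl sketch. It assumes only the core Encode module, and the variable names are purely illustrative. It encodes the two characters "°C" to UTF-8 and hex-dumps the result:

```perl
use strict;
use warnings;
use Encode qw(encode);

# "\x{B0}" is U+00B0 DEGREE SIGN; "C" is U+0043.
my $chars = "\x{B0}C";

# Encode the character string to UTF-8 bytes and dump them as hex.
my $bytes = encode('UTF-8', $chars);
my $hex   = uc unpack 'H*', $bytes;

print "$hex\n";    # C2B043
```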

Regards JC......

Re^3: Different presentation by SQLite
by NERDVANA (Curate) on Jul 13, 2024 at 20:12 UTC
    Rather than printing it, which really doesn't tell you much (looking correct might even mean that it's wrong!), try dumping the character numbers. Also check what perl thinks is the "length".
    use feature 'say';
    my $x = "\x{B0}F"; say map sprintf("%02X ", ord), split //, $x;   # prints "B0 46 "

    One problem is that the degree sign 0xB0 falls within the first 256 code points of Unicode (0x00 to 0xFF), so perl can store it internally either as a single byte (Latin-1 style) or as UTF-8, and this can make it extra confusing to track down the problem.
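To illustrate the point about two internal representations, here is a small sketch using the built-in utf8:: functions (always available, no `use utf8` needed). The variable names are made up for the example. Both copies hold the same single character; only the internal storage differs, which is why length and ord are more reliable than how the string looks when printed.

```perl
use strict;
use warnings;

# Two copies of the same one-character string: U+00B0 DEGREE SIGN.
my $compact = "\x{B0}";
my $wide    = "\x{B0}";

# Force different internal storage. Neither call changes the characters.
utf8::downgrade($compact);   # stored internally as one byte: B0
utf8::upgrade($wide);        # stored internally as two bytes: C2 B0

printf "compact: length=%d ord=%02X utf8-flag=%d\n",
    length($compact), ord($compact), (utf8::is_utf8($compact) ? 1 : 0);
printf "wide:    length=%d ord=%02X utf8-flag=%d\n",
    length($wide), ord($wide), (utf8::is_utf8($wide) ? 1 : 0);

# Perl still treats them as the same string.
print "equal\n" if $compact eq $wide;
```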

    Some pointers that might help the debugging:

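    • According to the docs, SQLite treats TEXT as unicode (UTF-8 by default), so a raw \xB0 byte should not normally end up stored in a column. As far as I know SQLite does not validate the bytes it is handed, so treat this as unlikely rather than strictly impossible.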
    • If perl's database interface were configured incorrectly, reading the unicode character \xB0 (which sqlite should encode as the bytes \xC2 \xB0) would arrive in perl as two characters. You can find out if this is the case using "length", or by hex-dumping the characters as shown above.
    • It's perfectly possible for someone to take utf-8 bytes and tell SQLite it's a string of unicode characters, and end up with \xC2 and \xB0 stored as two characters (encoded as 4 utf-8 bytes). I would refer to this situation as being "double-encoded".
    • You can repair double-encoded data using perl's utf8::decode($x). Note that it decodes the string in place, rather than returning the decoded value; the return value is false if the string was not valid UTF-8. It is *almost* always safe to call this on a string whenever you're in doubt, because it is unlikely that any real text would contain two characters that could be mistaken for a utf-8 sequence. This is my go-to whenever I have partly corrupted data after an encoding mistake was deployed to production and polluted the database with some double-encoded data.
    • You can only trust "print" to show you encoding problems if perl's STDOUT has the :utf8 layer applied and if your terminal is strictly UTF-8. If perl does not have the encoding layer, there's a chance it will emit valid UTF-8 anyway, and the terminal won't see anything wrong. I emphasize chance here, because \xB0 is within the single-byte range, and perl may or may not have used an internal UTF-8 encoding for the string. There's also the chance that a terminal has "helpful" support for programs that emit bytes, and silently upgrades it to unicode; I don't know anything specifically about Eclipse's terminal, but I would be cautious about trusting it to reveal encoding errors.
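The double-encoding case and its repair can be sketched as follows. This is only an illustration using the core Encode module and the built-in utf8::decode; it does not touch a real database, and dump_chars and the variable names are hypothetical helpers introduced for the example. It prints hex code points rather than the characters themselves, so the result does not depend on the terminal.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: hex-dump the code points of a string.
sub dump_chars { join ' ', map { sprintf '%02X', ord } split //, $_[0] }

my $good  = "\x{B0}C";                # what we want: 2 characters
my $bytes = encode('UTF-8', $good);   # correct storage: C2 B0 43

# Simulate the mistake: the UTF-8 bytes are treated as characters and
# encoded a second time, so the stored bytes become C3 82 C2 B0 43.
my $stored_twice = encode('UTF-8', $bytes);

# A correctly configured reader decodes once and gets three characters.
# (A reader that never decodes correctly stored data sees the same thing.)
my $double = decode('UTF-8', $stored_twice);

print "good:   ", dump_chars($good),   " length=", length($good),   "\n";
print "double: ", dump_chars($double), " length=", length($double), "\n";

# Repair in place; utf8::decode returns false if the string is not valid UTF-8.
my $fixed = $double;
utf8::decode($fixed) or warn "string was not valid UTF-8\n";

print "fixed:  ", dump_chars($fixed),  " length=", length($fixed),  "\n";
```

With these inputs the "double" line shows C2 B0 43 with length 3, and the "fixed" line shows B0 43 with length 2, matching the "good" line.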
