PerlMonks  

Rather than printing it, which really doesn't tell you much (looking correct might even mean that it's wrong!), try dumping the character numbers. Also check what perl thinks is the "length".
use feature 'say'; my $x = "\x{B0}F"; say map sprintf("%02X ", ord), split //, $x;

One problem is that the degree sign 0xB0 falls within the lowest 0xFF code points of Unicode, so perl can hold it internally either as a single byte or as a UTF-8 sequence, and this can make it extra confusing to track down the problem.
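To make that ambiguity concrete, here is a minimal sketch using only the core utf8:: functions. The helper name describe() and the variable names are made up for illustration. It forces the same two-character string into each internal form and shows that length and character numbers agree while only the internal flag differs.

```perl
use strict;
use warnings;

# Hypothetical helper: report what perl thinks about a string.
sub describe {
    my ($s) = @_;
    return sprintf "length=%d chars=[%s] utf8_flag=%d",
        length($s),
        join(" ", map { sprintf "%02X", ord } split //, $s),
        utf8::is_utf8($s) ? 1 : 0;
}

my $bytes_form = "\x{B0}F";
utf8::downgrade($bytes_form);   # force single-byte internal storage

my $utf8_form = "\x{B0}F";
utf8::upgrade($utf8_form);      # force UTF-8 internal storage

print describe($bytes_form), "\n";   # length=2 chars=[B0 46] utf8_flag=0
print describe($utf8_form),  "\n";   # length=2 chars=[B0 46] utf8_flag=1
print "the two strings compare equal\n" if $bytes_form eq $utf8_form;
```

Because both forms are the same string as far as perl code is concerned, the flag by itself says nothing about whether the data is right; the length and character numbers are what to check.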

Some pointers that might help the debugging:

  • According to the docs, SQLite stores text in a Unicode encoding (UTF-8 or UTF-16), so a raw \xB0 byte should not normally end up in a text column. You can most likely rule that out.
  • If perl's database interface were configured incorrectly, the unicode character \xB0 (which sqlite should encode as the two bytes \xC2 \xB0) would arrive in perl as two characters instead of one. You can find out if this is the case using "length", or by hex-dumping the characters as shown above.
  • It's perfectly possible for someone to take utf-8 bytes and tell SQLite it's a string of unicode characters, and end up with \xC2 and \xB0 stored as two characters (encoded as 4 utf-8 bytes). I would refer to this situation as being "double-encoded".
  • You can repair double-encoded data using perl's utf8::decode($x). Note that it decodes the string in place rather than returning the decoded value. It is *almost* always safe to call this on a string whenever you're in doubt, because it is unlikely that any real text would contain a run of characters that could be mistaken for a utf-8 sequence. This is my go-to whenever I have partly corrupted data after an encoding mistake was deployed to production and polluted the database with some double-encoded data.
  • You can only trust "print" to show you encoding problems if perl's STDOUT has the :utf8 layer applied and if your terminal is strictly UTF-8. If perl does not have the encoding layer, there's a chance it will emit valid UTF-8 anyway, and the terminal won't show anything wrong. I emphasize chance here, because \xB0 is within the single-byte range, and perl may or may not have used an internal UTF-8 encoding for the string. There's also the chance that a terminal has "helpful" support for programs that emit raw bytes, and silently upgrades them to unicode; I don't know anything specifically about Eclipse's terminal, but I would be cautious about trusting it to reveal encoding errors.
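The double-encoding case and its repair can be sketched in plain perl without touching SQLite at all. This is only an illustration under assumptions: the helper names char_dump() and repair_double_encoded() are hypothetical, and the "broken" string is built by hand to mimic what a misconfigured read would hand back.

```perl
use strict;
use warnings;

# Hypothetical helper: hex-dump the character numbers of a string.
sub char_dump {
    my ($s) = @_;
    return join " ", map { sprintf "%02X", ord } split //, $s;
}

# Hypothetical helper: undo one layer of accidental UTF-8 encoding.
# utf8::decode works in place on our private copy; if the copy is not
# a valid UTF-8 byte sequence it is left unchanged.
sub repair_double_encoded {
    my ($s) = @_;
    utf8::decode($s);
    return $s;
}

my $good = "\x{B0}F";            # degree sign followed by "F": 2 characters

# Simulate the mistake: the UTF-8 bytes treated as if they were characters.
my $broken = $good;
utf8::encode($broken);           # now the 3 characters C2 B0 46

printf "good:     length=%d chars=%s\n", length($good),   char_dump($good);
printf "broken:   length=%d chars=%s\n", length($broken), char_dump($broken);

my $fixed = repair_double_encoded($broken);
printf "repaired: length=%d chars=%s\n", length($fixed),  char_dump($fixed);

# Calling it on already-correct data leaves it alone, because a lone
# B0 byte is not a valid UTF-8 sequence.
my $untouched = repair_double_encoded($good);
printf "no-op:    length=%d chars=%s\n", length($untouched), char_dump($untouched);
```

Comparing the "broken" and "repaired" lines is the same check as the length and hex-dump test suggested above: three characters starting with C2 B0 means the bytes were never decoded, two characters starting with B0 means the data is fine.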

In reply to Re^3: Different presentation by SQLite by NERDVANA
in thread Different presentation by SQLite by jmClifford
