That means that you are sending UTF-8 to a browser that is expecting Latin-1. That is probably the most common Unicode problem and the "funny 'A' plus a garbage character" in place of some international letter is dead typical.
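To see where the funny "A" comes from, here is a minimal sketch using the core Encode module. The sample character (U+00E9, "e" with an acute accent) is just an illustration, not taken from your data.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# One character: LATIN SMALL LETTER E WITH ACUTE (U+00E9).
my $char = "\x{E9}";

# What the server sends: the two-byte UTF-8 encoding of that character.
my $utf8_bytes = encode('UTF-8', $char);

# What a browser expecting Latin-1 makes of those bytes:
# each byte is treated as a character of its own.
my $misread = decode('ISO-8859-1', $utf8_bytes);

printf "bytes sent:   %s\n",
    join ' ', map { sprintf '%02X', ord } split //, $utf8_bytes;
printf "browser sees: %s\n",
    join ' ', map { sprintf 'U+%04X', ord } split //, $misread;

# bytes sent:   C3 A9
# browser sees: U+00C3 U+00A9
```

U+00C3 is a capital "A" with a tilde and U+00A9 is the copyright sign, which is exactly the "funny 'A' plus a garbage character" pattern. If the page really is meant to be UTF-8, telling the browser so (for example with `charset=utf-8` in the Content-Type header) is the other half of the fix.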

If some layer in your DB connection, or Perl itself, were "helpfully" converting to UTF-8 for you, then you should get a warning when you write that UTF-8 to your HTTP daemon, because you haven't declared that the output (I/O) handle understands UTF-8. Since you mentioned no warning, one possibility is that you have already declared the CGI output (I/O) handle as expecting UTF-8.
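A small sketch of that warning and of what "declaring" a handle means. It writes to in-memory handles rather than STDOUT so the result can be inspected; the sample string is made up. Note that Perl only emits "Wide character" for code points above 0xFF, so a string containing nothing beyond Latin-1 would go out without complaint.

```perl
use strict;
use warnings;

# Decoded characters, including one above 0xFF (U+263A).
my $text = "caf\x{E9} \x{263A}";

# Collect warnings instead of printing them.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# Handle with no encoding layer: Perl warns "Wide character in print".
open my $raw, '>', \my $raw_buf or die $!;
print {$raw} $text;
close $raw;

# Handle declared as UTF-8: no warning, characters are encoded on output.
open my $declared, '>:encoding(UTF-8)', \my $enc_buf or die $!;
print {$declared} $text;
close $declared;

print "warnings seen: ", scalar(@warnings), "\n";
```

In a CGI script the equivalent declaration would be `binmode(STDOUT, ':encoding(UTF-8)')` before printing any output.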

More likely, your DB is giving UTF-8 strings to your Perl and nobody bothered to inform your Perl of this detail. Perl then doesn't know that its string of bytes is actually UTF-8-encoded characters, so it can't warn you. It simply writes out those bytes as they are (as opposed to knowing that it is writing out UTF-8 characters by writing out the bytes that they are made of).
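Here is a rough illustration of the difference between "bytes that happen to be UTF-8" and "characters Perl knows about". The `$from_db` value is a hypothetical stand-in for whatever your driver hands back; no database is involved.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in for a fetched column value: the raw UTF-8 bytes of "cafe"
# with an acute accent on the "e".
my $from_db = "caf\xC3\xA9";

# Perl has not been told these bytes are UTF-8, so it counts
# five one-byte characters.
my $byte_length = length $from_db;

# Informing Perl: decode the bytes into characters.
my $chars       = decode('UTF-8', $from_db);
my $char_length = length $chars;

print "before decode: $byte_length\n";   # before decode: 5
print "after decode:  $char_length\n";   # after decode:  4
```

With DBD::mysql, the `mysql_enable_utf8` connection attribute is meant to have the driver do this decoding for you. How well that works depends on the driver version, so treat it as something to test rather than a guaranteed fix.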

Unicode was designed by people who had gotten used to the utopia of "everything is a byte stream" and did not realize that their creation was going to destroy that utopia, so their plans were woefully inadequate. (I got to ride a small bit of the tail of the world before everybody just took for granted that everything was a byte stream.)

In this painful transition world (before we eventually arrive at the designed "everything is a Unicode stream with appropriate BOM or meta data regarding encoding" "utopia" (the term "my(t)opia" springs to mind, especially with regard to the prior paragraph)), one often must be quite careful at every layer to ensure that both sides of that layer agree on the expected encoding. And the layers can be quite numerous.

You have an advantage in this case in that Perl adds the "is this Unicode?" metadata to its strings and (mostly) to its streams, so the layer that is currently causing you problems is likely at the edge of Perl or just outside it, probably on the database side.

My first step would be to upgrade the DBD driver module and see if the problem just goes away. The most likely layers to cause problems are the ones where the authors on each side are the least well connected. Although the authors of a DBD module usually try pretty hard to stay well connected to both their database of choice and to Perl, you don't have to go very far back to find a version (of most DBDs) that isn't dealing with Unicode quite the way their database of choice currently does and/or isn't dealing with Unicode quite the way Perl currently does (Unicode support is still a relatively new concept that is still subject to significant "evolution").

- tye        

In reply to Re: OT? Character set issues with MySQL/CGI::Application (funny "A" + garbage) by tye
in thread OT? Character set issues with MySQL/CGI::Application by cLive ;-)
