
Re: Problem with join'ing utf8 and non-utf8 strings (bug?)

by jbert (Priest)
on Jun 18, 2008 at 14:26 UTC (#692716=note)

in reply to Problem with join'ing utf8 and non-utf8 strings (bug?)

In case it's not obvious from what other people have said above:
  • Perl is autoconverting your untagged string to utf8 for you. In doing so, it assumes the string is already in a particular encoding (ISO-8859-1, a.k.a. Latin-1). That assumption is what conflicts with your expectations: you're thinking of this data as a series of utf8 chars, whereas Perl treats it as a series of Latin-1 chars.
  • Everything should work out OK as long as you ensure that your program's inputs and outputs tag data appropriately. That is, look into binmode to set the :utf8 (or :encoding(UTF-8)) layer on a filehandle, and/or the 'open' pragma mentioned above, and perhaps the -Cio command-line option.
  • Other sources of data can be a pain, e.g. data pulled from a database. There are ways around this (see the mysql_enable_utf8 option in DBD::mysql, and the associated charset settings on the database server side).
  • The thing to remember is that you don't want a mix of utf8 tagged and non-tagged data loose in your code. The best way to achieve this is to ensure that all data is tagged at the entry points.
  • Some CPAN modules just don't seem to play nicely with correctly tagged utf8 data (e.g. Template::Toolkit requires that you stick a byte-order mark in your templates (ugh) rather than allowing you to tell it an encoding).
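To make the Latin-1 upgrade and the "tag at the entry points" advice concrete, here is a minimal sketch. It assumes the untagged data is UTF-8 bytes (e.g. read from a file opened without an encoding layer); the variable names are made up for illustration, and Encode's decode() stands in for whatever input layer your real program would use.

```perl
use strict;
use warnings;
use Encode qw(decode);

# A character string; \x{263a} (a smiley, > 255) forces the utf8 flag on.
my $tagged = "caf\x{e9} \x{263a}";

# The word "café" as raw UTF-8 *bytes*, as you would get from a handle
# with no :encoding layer. Perl has no idea these bytes are UTF-8.
my $bytes = "caf\xc3\xa9";

# Mixing them: Perl upgrades $bytes as if it were Latin-1, so the two
# bytes \xc3 \xa9 become two characters (U+00C3, U+00A9), i.e. "cafÃ©".
my $mixed = join ' ', $tagged, $bytes;

# The fix: decode byte data where it enters the program, so that
# everything inside the program is characters.
my $decoded = decode('UTF-8', $bytes);
my $clean   = join ' ', $tagged, $decoded;

printf "mixed: %d chars, clean: %d chars\n", length $mixed, length $clean;

# On the way out, let an output layer do the encoding.
binmode(STDOUT, ':encoding(UTF-8)');
print "$clean\n";
```

The mixed string comes out one character longer than the clean one because the two UTF-8 bytes of "é" were upgraded as two separate Latin-1 characters. Instead of calling binmode on each handle, `use open qw(:std :encoding(UTF-8));` sets the same layer on the standard handles and on filehandles opened in its lexical scope.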
