PerlMonks  

Re: Problem with join'ing utf8 and non-utf8 strings (bug?)

by jbert (Priest)
on Jun 18, 2008 at 14:26 UTC (#692716)


in reply to Problem with join'ing utf8 and non-utf8 strings (bug?)

In case it's not obvious from what other people have said above:

  • Perl is auto-converting (upgrading) your non-tagged string to utf8 for you. In doing so, it assumes the string is already in a particular encoding: iso-latin-1. That assumption is what is at odds with your expectations (you're thinking of this data as a series of utf8 chars, while Perl treats it as a series of latin-1 chars).
  • Everything should work out OK as long as you ensure that the inputs and outputs of your program tag data appropriately. That is, look into 'binmode' to set the :utf8 layer on a filehandle, and/or the 'open' pragma mentioned above, and perhaps the -Cio command-line option.
  • Other sources of data can be a pain, e.g. stuff pulled from a db. There are ways around this (see mysql_enable_utf8 in DBD::mysql, and the associated charset settings on the db server side).
  • The thing to remember is that you don't want a mix of utf8 tagged and non-tagged data loose in your code. The best way to achieve this is to ensure that all data is tagged at the entry points.
  • Some CPAN modules just don't seem to play nicely with correctly tagged utf8 data (e.g. Template::Toolkit requires that you stick a byte-order mark in your templates (ugh) rather than allowing you to tell it an encoding).
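
To make the first and fourth points concrete, here is a minimal sketch of what the mixing problem looks like and how decoding at the entry point avoids it. The sample strings and variable names are made up for illustration, and it assumes the untagged data really is UTF-8 bytes, which may not match the original poster's data.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw UTF-8 bytes for "é" (U+00E9), as they might arrive from an untagged
# filehandle or database driver. Perl sees two separate characters here.
my $raw_bytes = "\xC3\xA9";

# A character string that Perl has to store as utf8 (the "tagged" kind).
my $tagged = "\x{263A}";    # WHITE SMILING FACE

# Joining the two upgrades $raw_bytes as if it were latin-1:
# the result holds U+00C3 and U+00A9, not U+00E9.
my $mixed = join '', $raw_bytes, $tagged;

# Decoding at the entry point turns the bytes into the one intended character.
my $decoded = decode('UTF-8', $raw_bytes);
my $clean   = join '', $decoded, $tagged;

# Encode explicitly on the way out (or put an :encoding(UTF-8) layer on the
# output handle with binmode and print $clean directly).
my $out_bytes = encode('UTF-8', $clean);

printf "mixed: %d chars, clean: %d chars, output: %d bytes\n",
    length($mixed), length($clean), length($out_bytes);
```

The mixed string comes out one character longer than the clean one, because the two bytes of "é" were each treated as a latin-1 character. With an input layer set via binmode or the 'open' pragma, the explicit decode step should not be needed for data read from that handle.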

