In reply to: Problem with join'ing utf8 and non-utf8 strings (bug?)
In case it's not obvious from what other people have said above:
- Perl is auto-converting your untagged (non-UTF8-flagged) string to UTF-8 for you. In doing so, it assumes each byte of that string is one character in ISO-8859-1 (Latin-1). That assumption is what clashes with your expectations: you're thinking of this data as a sequence of UTF-8-encoded characters, not as a sequence of Latin-1 characters.
- Everything should work out OK as long as you make sure your program's inputs and outputs tag data appropriately. That is, look into 'binmode' to push a :utf8 (or :encoding(UTF-8)) layer onto a filehandle, and/or the 'open' pragma mentioned above, and perhaps the -Cio command-line option (see perlrun).
- Other sources of data can be a pain, e.g. data pulled from a database. There are ways around this (see mysql_enable_utf8 in DBD::mysql, and the associated charset settings on the database server side).
- The thing to remember is that you don't want a mix of UTF8-tagged and untagged data loose in your code. The best way to achieve this is to make sure all data is tagged (decoded) at the entry points.
- Some CPAN modules just don't seem to play nicely with correctly tagged UTF-8 data. For example, Template Toolkit requires that you put a byte-order mark in your templates (ugh) rather than letting you tell it an encoding.
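The Latin-1 upgrade described in the first point can be reproduced in a few lines. This is a minimal sketch, not the original poster's code: the strings are made up for illustration, and it uses the core Encode module to decode the raw bytes before joining.

```perl
use strict;
use warnings;
use Encode qw(decode);

# A character string: "caf" plus U+00E9 (e-acute), 4 characters.
my $chars = "caf\x{e9}";
utf8::upgrade($chars);    # turn on the internal UTF8 flag, as decoded input would have

# The same word as raw UTF-8 bytes with no UTF8 flag, as you would get
# from a filehandle with no encoding layer: "caf" then bytes 0xC3 0xA9.
my $bytes = "caf\xc3\xa9";

# join() has to combine them, so Perl upgrades $bytes by treating each
# byte as one Latin-1 character: 0xC3 0xA9 become two separate characters.
my $mixed = join ' ', $chars, $bytes;
printf "mixed: %d chars\n", length $mixed;    # prints "mixed: 10 chars"

# Fix: decode the bytes where they enter the program, then join.
my $decoded = decode('UTF-8', $bytes);
my $clean   = join ' ', $chars, $decoded;
printf "clean: %d chars\n", length $clean;    # prints "clean: 9 chars"
```

The mixed string ends with two junk characters where the e-acute should be; once the bytes are decoded first, both halves hold the same single character.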
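For the advice about tagging data at the entry points, here is a minimal sketch that uses a throwaway temp file (core File::Temp) in place of real input. It writes a short non-ASCII string as UTF-8, then reads it back once with no layer and once through an :encoding(UTF-8) layer.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write "naive" (with i-diaeresis, U+00EF) and U+263A (smiley) as UTF-8.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh, ':encoding(UTF-8)';
print {$fh} "na\x{ef}ve \x{263a}\n";
close $fh or die "close: $!";

# No layer: you get raw bytes, one "character" per byte (untagged data).
open my $raw, '<', $path or die "open: $!";
my $raw_line = <$raw>;
close $raw;
chomp $raw_line;

# :encoding(UTF-8) layer: bytes are decoded into characters on the way in.
open my $in, '<:encoding(UTF-8)', $path or die "open: $!";
my $char_line = <$in>;
close $in;
chomp $char_line;

printf "raw:     %d\n", length $raw_line;     # prints "raw:     10" (U+00EF is 2 bytes, U+263A is 3)
printf "decoded: %d\n", length $char_line;    # prints "decoded: 7"
```

If you'd rather not binmode every handle, `use open qw(:std :encoding(UTF-8));` sets the default layers for handles opened in that lexical scope and for STDIN/STDOUT/STDERR. For input, :encoding(UTF-8) is generally preferred over plain :utf8 because it checks that the bytes really are well-formed UTF-8.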