
Re: Problem with join'ing utf8 and non-utf8 strings (bug?)

by jbert (Priest)
on Jun 18, 2008 at 14:26 UTC (#692716)

in reply to Problem with join'ing utf8 and non-utf8 strings (bug?)

In case it's not obvious from what other people have said above:
  • Perl is auto-upgrading your non-tagged string to utf8 for you. In doing so, it assumes the bytes are already in a particular encoding (iso-latin-1). This assumption is what is at odds with your expectations: you're thinking of this data as a series of utf8-encoded chars, whereas Perl treats it as a series of latin-1 chars.
  • Everything should work out OK as long as you ensure the inputs and outputs of your program tag data appropriately. That is, look into 'binmode' to set the :utf8 layer on a filehandle, and/or the 'open' pragma mentioned above, and perhaps the -Cio command-line switch.
  • Other sources of data can be a pain, e.g. stuff pulled from a db. There are ways around this (see mysql_enable_utf8 in DBD::mysql, and the associated charset settings on the db server side).
  • The thing to remember is that you don't want a mix of utf8 tagged and non-tagged data loose in your code. The best way to achieve this is to ensure that all data is tagged at the entry points.
  • Some CPAN modules just don't seem to play nicely with correctly-tagged utf8 data. (e.g. Template::Toolkit requires that you stick a byte-order-mark in your templates (ugh) rather than allowing you to tell it an encoding).
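
To make the first and fourth points concrete, here is a minimal sketch using the core Encode module. The variable names and sample data are made up for illustration; the idea is just to show the implicit latin-1 upgrade on join, and how decoding at the entry point avoids it.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Encode once, on the way out.
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical input: the UTF-8 bytes for "é" (0xC3 0xA9), as they might
# arrive from an untagged source such as a raw filehandle or a db driver.
my $bytes = "\xC3\xA9";

# A string Perl already treats as characters (utf8 flag on).
my $chars = "\x{263A}";    # WHITE SMILING FACE

# Mixing the two: Perl upgrades $bytes as if it were latin-1, so the two
# bytes become two separate characters rather than a single é.
my $mixed = join '', $chars, $bytes;
printf "mixed length:   %d\n", length $mixed;    # 3

# Decoding at the entry point turns the two bytes into one character first.
my $decoded = decode('UTF-8', $bytes);
my $clean   = join '', $chars, $decoded;
printf "decoded length: %d\n", length $clean;    # 2

print "$clean\n";
```

For filehandles you would normally let a layer do the decode for you (binmode or the open pragma, as above) rather than calling decode by hand; the explicit call here is only to make the step visible.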