http://www.perlmonks.org?node_id=916365


in reply to Re^2: JSON, UTF-8 and Filehandles
in thread JSON, UTF-8 and Filehandles

I wrote the first reply in about 10 minutes and had to leave. Let me be clearer about the is_utf8 flag check: don't use is_utf8. It only checks whether a string is internally encoded in utf8, deep inside the angry bowels of perl. It does not tell you whether the string has been decoded or whether it holds valid text. Using is_utf8 is fraught with peril, which is unfortunate for such a seemingly simple function. It doesn't do what you think it does.
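To make that concrete, here is a minimal sketch (the variable names are mine, not from the thread) of two strings that hold the same character and compare equal while is_utf8 gives different answers for them. It assumes an ordinary ASCII-based platform.

```perl
use strict;
use warnings;

# Two strings holding the same single character, U+00E9.
my $native   = "\xE9";       # stored internally as one byte, flag off
my $upgraded = "\xE9";
utf8::upgrade($upgraded);    # same character, now stored as UTF-8 internally

# As far as Perl-level string semantics go, the two are identical...
my $same = $native eq $upgraded ? 1 : 0;

# ...but the internal flag differs, so it tells you nothing about the content.
my $flag_native   = utf8::is_utf8($native)   ? 1 : 0;
my $flag_upgraded = utf8::is_utf8($upgraded) ? 1 : 0;

print "equal=$same flag_native=$flag_native flag_upgraded=$flag_upgraded\n";
# prints: equal=1 flag_native=0 flag_upgraded=1
```

The flag describes storage, not meaning, which is why branching on it tends to go wrong.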

You didn't read the Unicode docs, did you? Here is a great link: http://perldoc.perl.org/perlunifaq.html#What-is-%22the-UTF8-flag%22%3f. There are also perluniintro, perlunicode, utf8, etc. Feel free to continue screwing yourself by not reading these. Don't forget to not read the link I gave in my first reply.

Now I have time to reply to your bullets:

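In response to your latest reply: Stop worrying about the utf8 flag and just worry about encoding once and decoding once. Don't have JSON encode to UTF-8 (its utf8 option) if you are also encoding to UTF-8 when writing to the file, for example through an :encoding(UTF-8) layer on the filehandle. The same goes for decoding when you read: decode in JSON or in the filehandle layer, not both. That's all you need to worry about. Remember, this also applies to STDOUT.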
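A rough sketch of "encode once", under a few assumptions of mine: it uses JSON::PP (in core since perl 5.14, same interface as the JSON module for these calls), in-memory filehandles in place of a real file, and made-up sample data. Options A and B each encode exactly once and produce the same bytes; option C encodes twice and produces mojibake.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical sample data: a character string containing a wide character.
my $data = { name => "caf\x{E9} \x{263A}" };

# Option A: JSON does the UTF-8 encoding, the filehandle just writes bytes.
my $bytes = JSON::PP->new->utf8->canonical->encode($data);
open my $fh_a, '>:raw', \my $buf_a or die "open: $!";
print {$fh_a} $bytes;
close $fh_a;

# Option B: JSON returns characters, the filehandle layer does the encoding.
my $chars = JSON::PP->new->canonical->encode($data);
open my $fh_b, '>:encoding(UTF-8)', \my $buf_b or die "open: $!";
print {$fh_b} $chars;
close $fh_b;

# Option C (the mistake): already-encoded bytes go through an encoding layer.
open my $fh_c, '>:encoding(UTF-8)', \my $buf_c or die "open: $!";
print {$fh_c} $bytes;
close $fh_c;

print "A and B identical: ", ($buf_a eq $buf_b ? "yes" : "no"), "\n";
print "C is double-encoded (longer): ",
    (length($buf_c) > length($buf_a) ? "yes" : "no"), "\n";
```

Reading works the same way in reverse: either read raw bytes and decode with ->utf8, or read through :encoding(UTF-8) and decode without ->utf8.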

The wide characters are probably mangled because you are using a utf8 string constant.
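One possible reading of that, and it is only my guess at what happened in your script: without `use utf8`, a non-ASCII literal in a UTF-8 source file reaches Perl as raw bytes, one character per byte, and encoding it again on output mangles it. The sketch below simulates such a literal with explicit byte escapes so it does not depend on how this snippet itself is saved.

```perl
use strict;
use warnings;

# What perl sees for a literal e-acute in a UTF-8 source file when
# "use utf8" is NOT in effect: two separate one-byte characters.
my $literal = "\xC3\xA9";
my $len_as_bytes = length $literal;      # 2

# Decoding the bytes (which is what "use utf8" does for literals in the
# source) yields the single intended character, U+00E9.
my $decoded = $literal;
utf8::decode($decoded);
my $len_as_chars = length $decoded;      # 1

# Encoding the undecoded literal again is the double-encoding trap.
my $mangled = $literal;
utf8::encode($mangled);
my $len_mangled = length $mangled;       # 4

print "bytes=$len_as_bytes chars=$len_as_chars mangled=$len_mangled\n";
# prints: bytes=2 chars=1 mangled=4
```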