PerlMonks
Re^2: HTML::Entities and Unicode quotes
by tod222 (Pilgrim) on Aug 22, 2011 at 06:23 UTC ( [id://921582]=note )
Thank you for this excellent response; I found it quite illuminating. Before posting I'd spent about 30 minutes reading perlunifaq and searching here on PerlMonks without things getting much clearer. In fact, some of what I read here was a bit disconcerting: the complaints that Perl no longer 'just worked' seemed apropos.

One source of my original confusion was a file that 'od -t x1 foo2' showed to contain the byte sequences \xe2\x80\x9c and \xe2\x80\x9d, yet which displayed correctly on Ubuntu with 'cat' in gterm. Since the Unicode table I linked showed that these sequences were valid representations of “ and ”, I wondered why HTML::Entities wasn't handling them correctly, particularly when cat could.

Thanks for pointing out Encode::is_utf8($str), as I'd been wondering if there was something like this.

A couple of things are still puzzling me, though. One is that the \xe2\x80\x9d sequence is in some encoding. What is it called? The other is that I'd like Perl to 'just work' to whatever extent possible. Is there something that can be set at the start of a script to have all Perl IO default to ":encoding(UTF-8)"?
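For anyone landing here with the same two questions, here is a minimal sketch rather than a definitive answer. It assumes the bytes in the file are UTF-8 (\xe2\x80\x9c is the UTF-8 encoding of U+201C, and \xe2\x80\x9d of U+201D) and uses the core `open` pragma to set a default ":encoding(UTF-8)" layer for the standard handles and for open() calls in the same lexical scope. The temporary file and the variable names are my own, purely for illustration.

```perl
use strict;
use warnings;

# Default I/O layer for STDIN/STDOUT/STDERR (:std) and for any open()
# in this lexical scope that does not name its own layers.
use open qw(:std :encoding(UTF-8));

use Encode qw(decode is_utf8);
use File::Temp qw(tempfile);

# The six bytes that od displayed: two curly quotes, three bytes each.
my $bytes = "\xe2\x80\x9c\xe2\x80\x9d";

# Decoding turns the byte string into a two-character string.
my $chars = decode('UTF-8', $bytes);

my $code_points = join ' ', map { sprintf 'U+%04X', ord($_) } split //, $chars;
printf "bytes: %d, characters: %d (%s)\n",
    length($bytes), length($chars), $code_points;

# is_utf8 reports Perl's internal flag, not whether the data "is UTF-8".
printf "is_utf8 before decode: %s, after decode: %s\n",
    (is_utf8($bytes) ? 'yes' : 'no'), (is_utf8($chars) ? 'yes' : 'no');

# Hypothetical scratch file, only used to show the default layer at work.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# No layer given here, so the pragma's :encoding(UTF-8) default applies.
open my $out, '>', $path or die "Cannot write $path: $!";
print {$out} $chars;
close $out or die "Cannot close $path: $!";

# Read the file back as raw bytes to see what actually hit the disk.
open my $in, '<:raw', $path or die "Cannot read $path: $!";
my $on_disk = do { local $/; <$in> };
close $in;

print "round trip matches original bytes: ",
    ($on_disk eq $bytes ? 'yes' : 'no'), "\n";
```

Note that the pragma is lexically scoped, so it only affects handles opened in the file or block where it appears; modules that open their own files are not changed by it.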
In Section: Seekers of Perl Wisdom