Help needed understanding unicode in perl
by Anonymous Monk on Dec 11, 2009 at 12:58 UTC
Anonymous Monk has asked for the
wisdom of the Perl Monks concerning the following question:
I understand that a Unicode string is a sequence of symbols and that UTF-8 is a way of saving these symbols as a sequence of bytes, i.e. a way of encoding Unicode. I also understand that "use utf8;" tells perl to interpret the Perl source file being read as encoded in UTF-8 and containing Unicode symbols.
I'm using the HTML::Strip module to extract text from a website. I then wish to print this text in my terminal window, which is currently the console window in Eclipse on Mac OS X.
First question. If I call a function, how do I know whether it returns a Unicode string or not? How can I get HTML::Strip to return a Unicode string? How is a Unicode string encoded internally (UTF-8, UTF-16, etc.)?
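To make the first question concrete, here is a minimal sketch of the bytes-versus-characters distinction using the core Encode module. The `$raw` literal is a hypothetical stand-in for whatever a text-extraction function might return, and the assumption that it holds UTF-8 bytes is mine; it is not a statement about what HTML::Strip actually returns.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for data returned by some function.
# Assumption: these are raw UTF-8 bytes ("caf" followed by e-acute).
my $raw = "caf\xC3\xA9";

# decode() interprets the bytes as UTF-8 and returns a character string.
my $text = decode('UTF-8', $raw);

# length() counts bytes for the raw data and characters for the decoded text.
printf "bytes: %d, characters: %d\n", length($raw), length($text);
```

If the two lengths differ, the original data was an encoded byte string rather than a decoded character string.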
Second question. If I have a Unicode string, how do I output it to my console window so that it appears correctly? Am I right in saying that my console window has its own encoding and that I must probably convert from UTF-8 (if that's how perl stores Unicode) to my console window's encoding? How do I know what encoding my console window uses? If my console uses Latin-1, then I will need to re-encode the Unicode string from UTF-8 to Latin-1, where all symbols outside those defined for Latin-1 become a "?", right?
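The conversion described in the second question can be sketched with the core Encode module. This is only an illustration under stated assumptions: the sample string is made up, and the choice of UTF-8 for STDOUT assumes a console that expects UTF-8, which may not be true for the Eclipse console.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Made-up sample character string: "caf", e-acute, space, euro sign.
my $text = "caf\x{E9} \x{20AC}";

# Encode the characters into bytes for two possible console encodings.
my $utf8   = encode('UTF-8', $text);
# With the default check mode, characters that ISO-8859-1 cannot
# represent (here the euro sign) are replaced by "?".
my $latin1 = encode('ISO-8859-1', $text);

printf "UTF-8: %d bytes, Latin-1: %d bytes\n", length($utf8), length($latin1);

# Alternative to encoding by hand: attach an encoding layer to STDOUT
# and print the character string directly.
# Assumption: the console expects UTF-8.
binmode(STDOUT, ':encoding(UTF-8)');
print "$text\n";
```

Using an encoding layer on the filehandle keeps the program working with character strings internally and leaves the conversion to bytes to the output step.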
Thanks for your advice!