<?xml version="1.0" encoding="windows-1252"?>
<node id="812368" title="Help needed understanding unicode in perl" created="2009-12-11 07:58:48" updated="2009-12-11 07:58:48">
<type id="115">
perlquestion</type>
<author id="961">
Anonymous Monk</author>
<data>
<field name="doctext">
Hi&lt;br&gt;
I understand that a Unicode string is a sequence of characters and that UTF-8 is one way of storing those characters as a sequence of bytes, i.e. one encoding of Unicode. I also understand that "use utf8;" tells perl that the source file being read is encoded in UTF-8 and may contain Unicode characters.&lt;br&gt;&lt;br&gt;

I'm using the HTML::Strip module to extract text from a website, and I then want to print this text in my terminal window. The terminal is currently the console window in Eclipse on Mac OS X.&lt;br&gt;&lt;br&gt;

First question: if I call a function, how do I know whether it returns a Unicode string or not? How can I get HTML::Strip to return a Unicode string? And how is a Unicode string encoded internally (UTF-8, UTF-16, etc.)?&lt;br&gt;&lt;br&gt;

Second question: if I have a Unicode string, how do I output it to my console window so that it appears correctly? Am I right in saying that my console window has its own encoding, and that I probably have to convert from UTF-8 (if that's how perl stores Unicode) to my console window's encoding? How do I find out which encoding my console window uses? If my console uses Latin-1, then I will need to re-encode the Unicode string from UTF-8 to Latin-1, with every character that Latin-1 does not define becoming a "?", right?&lt;br&gt;&lt;br&gt;
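To make the second question concrete, here is a minimal sketch of what I think the decode/encode steps look like, using the core Encode module. The sample bytes are made up for illustration, and the assumption that the console expects UTF-8 (or Latin-1 in the last step) is only a guess on my part, so please correct me if this is not the right approach.&lt;br&gt;&lt;br&gt;

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: the UTF-8 bytes for "cafe" with an accented e,
# as they might arrive from a web page.
my $bytes = "caf\xC3\xA9";

# Decode the bytes into a character string (what I have been calling a "unicode string").
my $text = decode('UTF-8', $bytes);
printf "%d bytes, %d characters\n", length($bytes), length($text);

# Encode on output; this layer assumes the console expects UTF-8.
binmode(STDOUT, ':encoding(UTF-8)');
print "$text\n";

# Alternative for a Latin-1 console: encode explicitly to bytes.
# U+263A (WHITE SMILING FACE) has no Latin-1 equivalent, so Encode
# has to substitute something for it.
my $latin1 = encode('ISO-8859-1', "$text \x{263A}");
printf "%d Latin-1 bytes\n", length($latin1);
```

If I have understood correctly, the decoded string has fewer characters than the input has bytes, and the Latin-1 version contains a substitution character in place of the smiley.&lt;br&gt;&lt;br&gt;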
Thanks for your advice!

 </field>
<field name="reputation">
11</field>
</data>
</node>
