note
ikegami
<p>You're confusing the internal representation (as reported by <c>is_utf8</c>) with the external one.
<c>
+-----------------------------------------------------------------+
|                                                                 |
|                          Decoded Text                           |
|                                                                 |
|                                                                 |
|  +--------------------+    downgrade    +--------------------+  |
|  | Internally encoded | --------------> | Internally encoded |  |
|  |      as UTF-8      |                 |   as iso-8859-1    |  |
|  |   (is_utf8 = 1)    | <-------------- |   (is_utf8 = 0)    |  |
|  +--------------------+     upgrade     +--------------------+  |
|                                                                 |
+-----------------------------------------------------------------+
             |                                      ^
             |                                      |
      encode |                                      | decode
             |                                      |
             v                                      |
+-----------------------------------------------------------------+
|                                                                 |
|                            Bytes or                             |
|                          Encoded Text                           |
|                                                                 |
|                                                                 |
|  +--------------------+    downgrade    +--------------------+  |
|  | Internally encoded | --------------> | Internally encoded |  |
|  |      as UTF-8      |                 |   as iso-8859-1    |  |
|  |   (is_utf8 = 1)    | <-------------- |   (is_utf8 = 0)    |  |
|  +--------------------+     upgrade     +--------------------+  |
|                                                                 |
+-----------------------------------------------------------------+
</c>
<p>
<ul>
<li><c>upgrade</c> refers to <c>utf8::upgrade</c> or an implicit upgrade.
<li><c>downgrade</c> refers to <c>utf8::downgrade</c>.
<li><c>decode</c> refers to <c>Encode::decode</c>, <c>utf8::decode</c>, <c>:encoding</c> on an input stream, etc.
<li><c>encode</c> refers to <c>Encode::encode</c>, <c>utf8::encode</c>, <c>:encoding</c> on an output stream, etc.
<li><c>is_utf8</c> refers to <c>Encode::is_utf8</c> or <c>utf8::is_utf8</c> (which return the value of the <c>UTF8</c> flag).
</ul>
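<p>A minimal sketch of the horizontal arrows in the diagram, assuming a sample string of my own choosing (<c>"caf\xE9"</c>, not from your post). It is meant to show that <c>utf8::upgrade</c> and <c>utf8::downgrade</c> only change the internal storage: the flag flips, but the string compares equal and has the same length.
<c>
use strict;
use warnings;

# Hypothetical sample text: four characters, the last one is U+00E9.
my $down = "caf\xE9";
utf8::downgrade($down);     # make sure it is stored as iso-8859-1 internally

my $up = $down;
utf8::upgrade($up);         # same text, now stored as UTF-8 internally

printf "down: is_utf8=%d length=%d\n", utf8::is_utf8($down) ? 1 : 0, length $down;
printf "up:   is_utf8=%d length=%d\n", utf8::is_utf8($up)   ? 1 : 0, length $up;
print  "equal: ", ($down eq $up ? "yes" : "no"), "\n";
</c>
<p>Both strings sit in the same box of the diagram. Neither call moves a string between "decoded text" and "bytes".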
<p>
<ul>
<li><c>utf8::upgrade</c> is safe to call on strings that are already upgraded.
<li><c>utf8::downgrade</c> is safe to call on strings that are already downgraded.
<li>It is a bug to encode a string that's already encoded.
<li>It is a bug to decode a string that's already decoded.
</ul>
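<p>To make the four rules above concrete, here is a small sketch. UTF-8 is used as the target encoding purely for illustration. Upgrading twice leaves the string unchanged, while encoding twice produces different (and wrong) bytes.
<c>
use strict;
use warnings;
use Encode qw(encode);

my $text = "\x{201c}";                 # decoded text: one character

# Upgrading an already upgraded string is harmless.
my $up = $text;
utf8::upgrade($up);
utf8::upgrade($up);
print "after two upgrades: ", ($up eq $text ? "unchanged" : "changed"), "\n";

# Encoding once gives the UTF-8 bytes for U+201C.
my $once = encode('UTF-8', $text);
# Encoding again treats each of those bytes as a character and encodes it.
my $twice = encode('UTF-8', $once);

printf "encoded once:  %s\n", unpack 'H*', $once;    # e2809c
printf "encoded twice: %s\n", unpack 'H*', $twice;   # c3a2c280c29c
</c>
<p>The second result is the familiar "mojibake" pattern: three bytes have turned into six.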
<hr>
<p>Your code should be
<c>
binmode STDOUT, ':encoding(iso-8859-1)';
my $str = "This's a \x{201c}test\x{201d}";  # Decoded text (characters, not bytes).
print "$str\n";                             # Encoded on output by the :encoding layer.
</c>
or
<c>
use Encode qw(encode);
my $str = "This's a \x{201c}test\x{201d}";  # Decoded text (characters, not bytes).
print encode('iso-8859-1', "$str\n");       # Encoded explicitly before printing.
</c>
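<p>The two approaches should give the same bytes. Here is a sketch that checks this, under two assumptions of mine: it writes to an in-memory filehandle instead of <c>STDOUT</c> so the output can be compared, and it uses a sample string (<c>"caf\xE9\n"</c>) whose characters all exist in iso-8859-1.
<c>
use strict;
use warnings;
use Encode qw(encode);

my $str = "caf\xE9\n";                  # decoded text (hypothetical sample)

# Approach 1: let an :encoding layer encode on output.
my $via_layer = '';
open my $fh, '>:encoding(iso-8859-1)', \$via_layer or die "open: $!";
print {$fh} $str;
close $fh or die "close: $!";

# Approach 2: encode explicitly.
my $via_encode = encode('iso-8859-1', $str);

print "same bytes: ", ($via_layer eq $via_encode ? "yes" : "no"), "\n";
printf "bytes: %s\n", unpack 'H*', $via_encode;     # 636166e90a
</c>
<p>Use one approach or the other on a given handle. Doing both encodes the text twice.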
<blockquote><p><i>Why, perl say that it's an utf8 but can't decode it?</i></blockquote>
<p>Perl said the <em>internal</em> encoding is UTF8. You shouldn't have to care what the internal encoding is. (Unfortunately, you still need to know in some circumstances. This isn't one of them.)
<blockquote><p><i>if \x{201c} is not an utf8 character</i></blockquote>
<p>There's no such thing as a "utf8 character" or "UTF-8 character", since utf8 and UTF-8 are encodings, not character sets. <c>\x{201c}</c> produces a <em>Unicode</em> character (U+201C, LEFT DOUBLE QUOTATION MARK), which can be encoded using UTF-8.
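<p>A short sketch of that distinction: one character, several possible byte sequences. UTF-16BE is included only as a second encoding for comparison. The last line relies on <c>Encode::encode</c> being called without a CHECK argument, in which case a character the target encoding cannot represent is replaced by a substitution character.
<c>
use strict;
use warnings;
use Encode qw(encode);

my $char = "\x{201c}";                          # one Unicode character

printf "code point: U+%04X\n", ord $char;       # U+201C
printf "characters: %d\n", length $char;        # 1

# The same character under two different encodings.
my $utf8  = encode('UTF-8',    $char);
my $utf16 = encode('UTF-16BE', $char);
printf "UTF-8:      %s\n", unpack 'H*', $utf8;  # e2809c
printf "UTF-16BE:   %s\n", unpack 'H*', $utf16; # 201c

# iso-8859-1 only covers U+0000..U+00FF, so U+201C has no mapping there.
my $latin1 = encode('iso-8859-1', $char);
printf "iso-8859-1: %s\n", $latin1;             # ?
</c>
<p>So if the curly quotes have to survive on output, the output encoding must be one that contains them, such as UTF-8.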
755156