note
g00n
<p>Reading through the pod source files (like one does when developing) I came across this in <a href="http://www.perldoc.com/perl5.8.0/pod/perlpodspec.html#Notes-on-Implementing-Pod-Processors"><b>perlpodspec.pod</b></a>.
I've included the text verbatim from the link because (I think) it gives some insight into the problem. It reads ...
<p>
<ul>
<em>
<p>
Since Perl recognizes a Unicode Byte Order Mark at the start of files
as signaling that the file is Unicode encoded as in UTF-16 (whether
big-endian or little-endian) or UTF-8, Pod parsers should do the
same.</p> <p>Otherwise, the character encoding should be understood as
being UTF-8 if the first highbit byte sequence in the file seems
valid as a UTF-8 sequence, or otherwise as Latin-1 ...</p>
<p>... A naive but sufficient heuristic for testing the first highbit
byte-sequence in a BOM-less file (whether in code or in Pod!), to see
whether that sequence is valid as UTF-8 (RFC 2279) is to check whether
that the first byte in the sequence is in the range 0xC0 - 0xFD
and whether the next byte is in the range
0x80 - 0xBF. If so, the parser may conclude that this file is in
UTF-8, and all highbit sequences in the file should be assumed to
be UTF-8. </p><p> Otherwise the parser should treat the file as being
in Latin-1. In the unlikely circumstance that the first highbit
sequence in a truly non-UTF-8 file happens to appear to be UTF-8, one
can cater to our heuristic (as well as any more intelligent heuristic)
by prefacing that line with a comment line containing a highbit
sequence that is clearly not valid as UTF-8.</p><p> A line consisting
of simply "#", an e-acute, and any non-highbit byte,
is sufficient to establish this file's encoding.
</p></em>
</ul>
<p>From this you should be able to work out whether a file is UTF-8 or Latin-1.</p>
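<p>To make the quoted heuristic concrete, here is a rough sketch of it in Python (used purely for illustration; it is not taken from any Pod parser). The function name <code>guess_encoding</code> and the returned labels are my own assumptions, and returning "ascii" for a file with no highbit bytes is also an assumption, since the quoted text does not cover that case.</p>

```python
def guess_encoding(data: bytes) -> str:
    """Guess an encoding using the naive heuristic quoted from perlpodspec.

    Hypothetical helper: the name and return labels are illustrative only.
    """
    # Step 1: a Byte Order Mark at the start of the file decides immediately.
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"

    # Step 2: look only at the FIRST highbit byte (>= 0x80) in the file.
    for i, byte in enumerate(data):
        if byte >= 0x80:
            nxt = data[i + 1] if i + 1 < len(data) else None
            # Lead byte in 0xC0-0xFD followed by a byte in 0x80-0xBF
            # "seems valid" as UTF-8 under the quoted rule.
            if 0xC0 <= byte <= 0xFD and nxt is not None and 0x80 <= nxt <= 0xBF:
                return "utf-8"
            # Anything else is treated as Latin-1.
            return "latin-1"

    # Assumption: no highbit bytes at all, so the question never arises.
    return "ascii"
```

<p>With this sketch, a file whose first highbit sequence is the line "#", an e-acute (byte 0xE9), and a newline comes out as Latin-1, which is exactly the escape hatch the spec describes.</p>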
322988
322988