This looks to me like a fundamental misunderstanding of what encoding is, and what encodings exist, and maybe more on this topic as well.

An encoding is just a way to map numbers (whether stored in one byte or more) to characters, or loosely speaking glyphs, such as mapping the number 97 to the glyph "a".
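To make the 97-to-"a" mapping concrete, here is a minimal sketch in Python (chosen only for brevity; the variable names are purely illustrative). It looks up the number behind "a", goes back the other way, and shows the single byte that ISO-8859-1 stores for it.

```python
# The number behind the character "a".
number = ord("a")

# Going the other way: from the number back to the character.
char = chr(97)

# Encoding turns characters into bytes; in ISO-8859-1, "a" is one byte.
encoded = list("a".encode("iso-8859-1"))

print(number, char, encoded)
```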

Different encodings have different mappings. Not counting the Unicode encodings (UTF-8, UTF-16, UTF-32, etc., and, yes, there are more), some glyphs appear in more than one encoding, some glyphs appear in different places in different encodings, some glyphs occur in the same place in some encodings (but in different places in others), some glyphs occur in the same place in every encoding they appear in, and some glyphs appear in every encoding, in the same place each time.

And some glyphs appear in the same place in all encodings and also in the same place in the Unicode encodings (possibly with the exception of UTF-7). And that is likely where we are right here.
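As a rough illustration of "same place" versus "different place", the sketch below encodes two characters under three encodings. The helper `byte_values` and the choice of encodings (ISO-8859-1, CP437, UTF-8) are my own picks for the example, not something from the original question.

```python
def byte_values(char, encodings):
    """Return the byte values that each encoding uses for one character."""
    return {name: list(char.encode(name)) for name in encodings}

# Hypothetical selection: two single-byte encodings plus UTF-8.
ENCODINGS = ["iso-8859-1", "cp437", "utf-8"]

# "a" sits at 97 in all three.
plain = byte_values("a", ENCODINGS)

# "é" exists in all three, but in a different place in each.
accented = byte_values("\u00e9", ENCODINGS)

print(plain)
print(accented)
```

The first dictionary holds the same value three times; the second holds three different byte sequences for the same glyph.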

If you compare the glyphs and their code points for all ordinals under 128 in ISO-8859-1 against those same code points in UTF-8, you will find that they are bit-for-bit identical. That is, there is no way to tell whether a file that uses only code points under 128 is UTF-8 or ISO-8859-1. Whether you treat it as ISO-8859-1 or as UTF-8, it doesn't change anything.

So, when you convert such a file from one to the other, you can do so with the "copy" ("cp") command.
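A small sketch of why a plain copy is enough, again in Python and with hypothetical helper names (`is_ascii_only`, `latin1_to_utf8`) and made-up sample data: converting bytes that are all under 128 from ISO-8859-1 to UTF-8 gives back exactly the same bytes, while a byte of 128 or above does change.

```python
def is_ascii_only(data: bytes) -> bool:
    """True if every byte is under 128."""
    return all(b < 128 for b in data)

def latin1_to_utf8(data: bytes) -> bytes:
    """Reinterpret ISO-8859-1 bytes as text, then write that text as UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")

# Made-up sample: an HTML fragment using only characters under 128.
sample = b"<html><body>Hello, world</body></html>"
converted = latin1_to_utf8(sample)
print(is_ascii_only(sample), converted == sample)

# A byte above 127 (0xE9, e-acute in ISO-8859-1) is where the two diverge.
accented = b"caf\xe9"
print(is_ascii_only(accented), latin1_to_utf8(accented))
```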

(See the conversation in one of my recent threads for another example along the same confusion.)

Your starting file already is UTF-8. If the "file" command can't tell them apart, that's because there is no telling them apart. However, since this is HTML, the file command may also use extra heuristics, such as looking for meta tags. So when you change the meta tags, you change the output of file. I don't know whether anything would complain if the meta tag disagreed with the actual encoding, other than your users.

In reply to Re: Why won't Perl convert (Latin1 | ISO-8859-1) to (UTF-8 | utf8)? by Tanktalus
in thread Why won't Perl convert (Latin1 | ISO-8859-1) to (UTF-8 | utf8)? by taint
