
Again, I agree and don't agree... The assumption that all strings are in one of the storage formats, unless explicitly specified otherwise, is a source of great confusion.

No idea what that means.

Perl's source code (without "use utf8")? Output of readdir? Contents of @ARGV?

Don't know. Don't care. Doesn't matter how they are stored, as those are internal details that aren't relevant.

What does matter is whether they return decoded text or something else (such as encoded bytes). That has nothing to do with the internal storage format.
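Here is a minimal sketch of why the storage format is irrelevant. It uses the core `utf8::downgrade`, `utf8::upgrade` and `utf8::is_utf8` functions, which are always available without `use utf8`. The same one-character string is stored both ways. The internal flag differs, but at the Perl level the two strings are identical. The variable names are illustrative only.

```perl
use strict;
use warnings;

# One character, U+00E9, built with chr() so the source file's encoding is irrelevant.
my $down = chr(0xE9);
my $up   = chr(0xE9);

utf8::downgrade($down);   # internal storage: one byte per character
utf8::upgrade($up);       # internal storage: UTF-8 (sets the UTF8 flag)

# The internal storage formats differ...
printf "UTF8 flag: %d vs %d\n",
    (utf8::is_utf8($down) ? 1 : 0), (utf8::is_utf8($up) ? 1 : 0);   # UTF8 flag: 0 vs 1

# ...but the strings are the same as far as Perl code can tell.
printf "eq: %s\n", ($down eq $up ? 'yes' : 'no');                  # eq: yes
printf "length: %d vs %d\n", length($down), length($up);           # length: 1 vs 1
printf "ord: U+%04X vs U+%04X\n", ord($down), ord($up);            # ord: U+00E9 vs U+00E9
```

Whether a string holds decoded text or encoded bytes depends on where it came from. The flag does not record that.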

To me, 'Perl thinks everything is in Latin-1, unless told otherwise' seems like a more useful, understandable explanation.

It's completely false (nothing in Perl accepts or produces Latin-1), and it has nothing to do with anything discussed so far.

If I actually do have Latin-1 (more realistically, ASCII) then it's not 'wrong', is that what you want to say?

You were complaining that Perl let you concatenate decoded text and UTF-8 bytes. (Well, you called it something different, but this is the underlying issue.) Perl has no idea that one of the strings you are concatenating contains text and that the other contains UTF-8 bytes, so it can't warn you that you are doing something wrong.

For example,

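my $x = chr(0x2660);            # U+2660 is above 0xFF, so this can only be text
my $y = chr(0xC3) . chr(0xA9);  # the text "Ã©"? Or the UTF-8 bytes of "é"?
my $z = $x . $y;                # no error: Perl has no way to know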

This is all the information Perl currently has. Is that an error? You can't tell. Perl can't tell. Strings coming from a file handle with a decoding layer should be flagged "I'm decoded text!". Those coming from a file handle without a decoding layer should be flagged "I'm bytes!". Concatenating the two should be an error. These flags do not currently exist.
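Perl has no such flags, but the idea can be emulated in user code. The following is a hypothetical sketch. `tagged()` and `tagged_cat()` are made-up helpers and are not part of Perl or any module. The sketch reads the two bytes from the example through an in-memory file handle twice: once with a `:encoding(UTF-8)` layer and once raw. It tags each result accordingly and then refuses to concatenate text with bytes.

```perl
use strict;
use warnings;

# Hypothetical: Perl has no "text"/"bytes" flags, so emulate them with a small hash.
sub tagged { my ($kind, $str) = @_; return { kind => $kind, str => $str } }

# Hypothetical: concatenate two tagged strings, refusing to mix decoded text and bytes.
sub tagged_cat {
    my ($l, $r) = @_;
    die "can't concatenate $l->{kind} with $r->{kind}\n"
        if $l->{kind} ne $r->{kind};
    return tagged($l->{kind}, $l->{str} . $r->{str});
}

my $file = chr(0xC3) . chr(0xA9);    # in-memory "file": the UTF-8 encoding of U+00E9

# Read the same bytes through a decoding layer and through a raw handle.
open my $dec_fh, '<:encoding(UTF-8)', \$file or die "open: $!";
open my $raw_fh, '<:raw',             \$file or die "open: $!";
my $decoded = tagged(text  => scalar <$dec_fh>);   # str is chr(0xE9)
my $raw     = tagged(bytes => scalar <$raw_fh>);   # str is chr(0xC3) . chr(0xA9)
close $_ for $dec_fh, $raw_fh;

my $spade = tagged(text => chr(0x2660));

my $ok = tagged_cat($spade, $decoded);             # text . text: allowed
printf "ok: %d chars\n", length $ok->{str};        # ok: 2 chars

my $mixed = eval { tagged_cat($spade, $raw) };     # text . bytes: refused
print "error: $@" unless defined $mixed;           # error: can't concatenate text with bytes
```

This only covers concatenation. Every other string operation would need the same treatment for the tags to be meaningful.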

In reply to Re^6: Database vs XML output representation of two-byte UTF-8 character by ikegami
in thread Database vs XML output representation of two-byte UTF-8 character by jkeenan1
