Again, I agree and don't agree... The assumption that all strings are in one of the storage formats, unless explicitly specified otherwise, is a source of great confusion.
No idea what that means.
Perl's source code (without "use utf8")? Output of readdir? Contents of @ARGV?
Don't know. Don't care. Doesn't matter how they are stored, as those are internal details that aren't relevant.
What does matter is whether they returned decoded text or something else. That has nothing to do with the internal storage format.
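To illustrate that distinction, here is a small sketch. It assumes the bytes in question are UTF-8, which is only an assumption made for the example. The same data is two elements before decoding and one character after, and it is this difference that matters, not how Perl stores either string internally.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption for this sketch: the source hands back UTF-8 encoded bytes.
my $bytes = "\xC3\xA9";                # undecoded: two bytes
my $text  = decode('UTF-8', $bytes);   # decoded text: the single character U+00E9

printf "bytes: %d element(s), text: %d character(s)\n",
    length($bytes), length($text);
```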
To me, 'Perl thinks everything is in Latin-1, unless told otherwise' seems like a more useful, understandable explanation.
It's completely false (nothing in Perl accepts or produces Latin-1), and it has nothing to do with anything discussed so far.
If I actually do have Latin-1 (more realistically, ASCII) then it's not 'wrong', is that what you want to say?
You were complaining that Perl let you concatenate decoded text and UTF-8 bytes. (Well, you called it something different, but this is the underlying issue.) It has no idea that one of the strings you are concatenating contains text and that the other contains UTF-8 bytes, so it can't let you know that you are doing something wrong.
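my $x = chr(0x2660);             # one character: U+2660
my $y = chr(0xC3).chr(0xA9);     # two characters, or the UTF-8 bytes of U+00E9?
my $z = $x . $y;                 # Perl concatenates without complaint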
This is all the information Perl currently has. Is that an error? You can't tell. Perl can't tell. Strings coming from a file handle with a decoding layer should be flagged "I'm decoded text!". Those coming from a file handle without a decoding layer should be flagged "I'm bytes!". Concatenating the two should be an error. These flags do not currently exist.
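To make that last point concrete, here is a minimal sketch of what such flags might look like if they were bolted on by hand. Perl provides nothing of the sort. The `TaggedString` package and its `new`, `kind`, `value` and `concat` methods are invented purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical illustration only: Perl has no such flags. The package
# TaggedString and all of its methods are invented for this sketch.
package TaggedString;

sub new {
    my ($class, $kind, $value) = @_;
    die "kind must be 'text' or 'bytes'\n"
        unless $kind eq 'text' || $kind eq 'bytes';
    return bless { kind => $kind, value => $value }, $class;
}

sub kind  { $_[0]{kind} }
sub value { $_[0]{value} }

# Concatenate two tagged strings, refusing to mix text with bytes.
sub concat {
    my ($self, $other) = @_;
    die "cannot concatenate $self->{kind} with $other->{kind}\n"
        if $self->{kind} ne $other->{kind};
    return TaggedString->new($self->{kind}, $self->{value} . $other->{value});
}

package main;

my $x = TaggedString->new(text  => chr(0x2660));            # decoded text
my $y = TaggedString->new(bytes => chr(0xC3) . chr(0xA9));  # UTF-8 bytes

# With the tags in place, the mistake can finally be detected.
my $ok = eval { $x->concat($y); 1 };
print "concat refused: $@" unless $ok;
```

In this sketch the tag would have to be set by whatever produced the string, for example a file handle with or without a decoding layer. That is exactly the information plain Perl strings do not carry.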