Wow, this is completely wrong.
No, not completely. More importantly, it's a useful way to think about the problem.
Awful names, and they have nothing to do with binary or Unicode.
They are, respectively, strings of 8-bit chars and strings of 72-bit chars.
Why, 'Unicode' is not an awful name. It's irrelevant that Perl's UTF-8 allows bigger codepoints than the Unicode Consortium defines. 'Binary' is maybe an awful name, but what's more awful is silent conversion from '8-bit chars' to UTF-8, or back.
No, the problem is that you told it to encode text that was already encoded. It has nothing to do with the internal string formats.
No, the problem is that Mr. Keenan, who is an experienced Perl programmer with quite a few modules on CPAN (pardon me if I got that wrong), appears to be confused about Perl's behaviour. It has everything to do with the way Perl works.
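For anyone following along, the "encode text that was already encoded" case can be sketched like this. It is only an illustration, not the code from the original question; the variable names are mine, and it assumes the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text  = "\x{E9}";                # decoded text: one character, U+00E9
my $once  = encode('UTF-8', $text);  # two bytes: C3 A9
my $twice = encode('UTF-8', $once);  # each byte is treated as a character again

# Show both results as hex bytes.
print join(' ', map { sprintf '%02X', ord } split //, $once),  "\n";  # C3 A9
print join(' ', map { sprintf '%02X', ord } split //, $twice), "\n";  # C3 83 C2 A9
```

Note that the second encode call raises no warning: Perl cannot tell that $once already holds encoded bytes.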
No, you created garbage by concatenating UTF-8 and text. It has nothing to do with the internal string formats.
No, perl the computer program created garbage, because of the way it works. What does 'concatenating UTF-8 and text' even mean? Why doesn't that actually work? (You know why.) Why can't Perl warn me that I'm doing something stupid? (You know why.)
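To make the 'concatenating UTF-8 and text' case concrete, here is a minimal sketch. The strings are made up for illustration, and it assumes the core Encode module; it is not the code from the original question.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text  = "\x{263A}";                 # decoded text: one character, U+263A
my $bytes = encode('UTF-8', "\x{E9}");  # encoded e-acute: two bytes, C3 A9

# Perl treats each byte as a character in the range 0..255, so this
# concatenation succeeds silently, even under "use warnings".
my $mixed = $bytes . $text;

# Encoding for output now encodes the already-encoded bytes a second time.
my $out = encode('UTF-8', $mixed);

print length($mixed), "\n";   # 3 "characters": U+00C3, U+00A9, U+263A
print join(' ', map { sprintf '%02X', ord } split //, $out), "\n";
# C3 83 C2 A9 E2 98 BA
```

The smiley survives, but the e-acute comes out as the familiar two-character mojibake, and nothing along the way produces a warning.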
When you do $number + $letters, Perl doesn't mangle anything; you did.
When you do $text + $utf8, Perl doesn't mangle anything; you did.
But when I did that unreasonable thing Perl didn't try to help me (like it tries to help when I do something like "1 + 'x'" ("argument isn't numeric...")). Yet here we have no warnings, no nothing. So it's not an error in Perl to do something stupid like $text + $utf8, IT'S SUPPOSED TO WORK LIKE THAT. And you know it. So yes, I can say that Perl mangled the strings, because this is the way it's intended to work.
Just like you wouldn't insert text into SQL without conversion, insert text into HTML without conversion, or insert text into a command line without conversion; all you have to do is not insert text into UTF-8 (or vice-versa) without conversion.
You know, Ikegami, it's true and not true. I actually know how to use Perl. But Perl provides absolutely no guidance towards that. And...
Decode inputs. Encode outputs.
Yes, yes. And how many Perl programs in the wild (or even on CPAN) actually do that? I'd say very few. Do you disagree? I'd even say most Perl programmers actually rarely need to do any encoding/decoding. Do you disagree?
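For reference, the "decode inputs, encode outputs" discipline looks roughly like the sketch below. The input bytes are hard-coded to stand in for something read from a file or socket, and I am assuming the data is UTF-8; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Stand-in for bytes read from a file or socket: "naive" with a
# diaeresis on the i, encoded as UTF-8 (6 bytes).
my $input_bytes = "na\xC3\xAFve";

my $text  = decode('UTF-8', $input_bytes);  # 5 characters of text
my $upper = uc $text;                       # work on characters, not bytes

my $output_bytes = encode('UTF-8', $upper); # back to bytes for output

print length($text), "\n";          # 5
print length($output_bytes), "\n";  # 6
```

In real code the same thing is usually done with I/O layers, for example open with ':encoding(UTF-8)', rather than explicit decode and encode calls.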
It doesn't take an American
Perl works just fine when all you have is ASCII (or Latin-1). If you don't have ASCII/Latin-1... are names of files and directories binary or Unicode? (Call it what you will.) What about command-line parameters? Do I have to decode them? (Yes.) OK, why does "...or die $!;" produce garbage? Oh right, strerror returned something that is not ASCII/Latin-1 (and I heard some of the porters want to make Perl speak only English, arguing that English is better than mojibake). I'd say it's pretty confusing for your average Perl programmer. Let's keep things in perspective: Perl was never supposed to be something hardcore like C++.
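As an example of the command-line case, here is a rough sketch of decoding arguments by hand. decode_args is a hypothetical helper of mine, and the code assumes the shell hands the program UTF-8 bytes, which is not guaranteed on every system.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode raw argument bytes into text,
# ASSUMING the environment passes arguments as UTF-8.
sub decode_args {
    my @raw = @_;   # copy, so the caller's array is left untouched
    return map { decode('UTF-8', $_) } @raw;
}

my @args = decode_args(@ARGV);
printf "%d argument(s)\n", scalar @args;
```

Running perl with the -CA switch asks Perl to treat @ARGV as UTF-8 without any of this, but either way it is something the programmer has to know to ask for.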