Beefy Boxes and Bandwidth Generously Provided by pair Networks
"be consistent"
 
PerlMonks  

Re: Pragma to handle unicode characters

by almut (Canon)
on Dec 22, 2008 at 01:15 UTC ( [id://731948]=note: print w/replies, xml ) Need Help??


in reply to Pragma to handle unicode characters

(...) what makes so hard to implement real unicode pragma?

I don't have an answer to your question, but I'd just like to point out another issue that you haven't even touched on: the handling of file names (such as sämple.txt)...

Replies are listed 'Best First'.
Re^2: Pragma to handle unicode characters
by monarch (Priest) on Dec 22, 2008 at 03:37 UTC

    How simple do you want encoding/decoding? Would you like Perl to "automagically" encode/decode JSON? ASN.1? Why, specifically, do you demand it of UTF-8?

    The truth is that UTF-8 is a variable-length character encoding method. It's probably a good thing that you have to explicitly decode inputs and encode outputs.. it forces you to know what you are doing.

    Update: an example: e-mail - yes, you can send e-mails as UTF-8! But were you aware that MIME headers must be in a 7-bit encoding? In this case blindly opening a socket and telling it to encode all UTF-8 output will severely break your application. It is much better to know specifically when and where encoding is appropriate and permissible..

      Would you like Perl to "automagically" encode/decode JSON? ASN.1? Why, specifically, do you demand it of UTF-8?

      It's a matter of convenience, primarily — and in some cases, transparency (such as having a single point of configuration where the encoding can be switched, rather than requiring every piece of code to take care of it on its own).

      The comparison to JSON or ASN.1 seems somewhat far-fetched to me. Unicode is envisaged - and I think widely accepted - to eventually become the successor of legacy character encodings such as Latin-1, with their well known limits. And, among the Unicode encodings, UTF-8 would presumably be a good choice to be used as the default (because it was specifically designed with backwards compatibility in mind). In contrast, JSON / ASN.1 are rather special purpose (and typically not used as character encodings), so I don't currently see any need to have similar built-in support for them in Perl.

      The truth is that UTF-8 is a variable-length character encoding method. It's probably a good thing that you have to explicitly decode inputs and encode outputs.. it forces you to know what you are doing.

      Equally (with a hypothetical pure ASCII mind set in place) you could say: "The truth is that Latin-1 is a (specific) 8-bit character encoding method. It's probably a good thing that you have to explicitly decode inputs and encode outputs.. it forces you to know what you are doing." — Still, we do have Latin-1 semantics by default in Perl...

      Just because UTF-8 is variable length doesn't mean it wouldn't be a sensible choice in environments that otherwise make use of it, in particular when the programmer explicitly requests that very functionality using a pragma.

      (...) MIME headers must be in a 7-bit encoding

      The current 8-bit default for IO could cause just as much potential breakage as UTF-8 would in this case. I don't think that particular limits which apply to certain content (or parts thereof) is a good argument against generally providing a way to conveniently say "I want UTF-8 to be used as default for all strings/content" (which is what I think the OP had in mind).  Special cases can be dealt with in the application code. As things are now, UTF-8 (or, more generally, anything non-Latin-1) is still too often the "special case", rather than a (configurable!) global default.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://731948]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others cooling their heels in the Monastery: (2)
As of 2024-04-26 03:35 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found