Re^6: any use of 'use locale'? (source encoding) by wanradt (Scribe)
on Nov 21, 2009 at 04:28 UTC
You can't output characters to STDOUT without instructing Perl how to convert those characters into bytes. I provided the fix for that at the bottom of my previous post.
I don't consider myself a master of this topic, and here in the monastery I have seen how deeply you know the Unicode area. So forgive me some inaccuracies, like mixing up Unicode and UTF-8, and maybe others. I'll try to give a picture that has grown over the years. Nowadays, most of my scripts contain at least a block like this:
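For illustration only, a minimal sketch of such a preamble, assuming the source file is saved as UTF-8 and all I/O should be UTF-8. It is an assumption about what a block like this typically contains, not the original lines; the variable `$word` is made up for the demonstration.

```perl
use strict;
use warnings;
use utf8;                            # string literals in this file are UTF-8
use open qw(:std :encoding(UTF-8));  # UTF-8 layers for STDIN/STDOUT/STDERR and newly opened files

# Hypothetical sample data: Estonian for "apple", 3 characters, 4 bytes in UTF-8.
my $word = "õun";
printf "%s has %d characters\n", $word, length $word;
```

Without `use utf8` the literal would be seen as four bytes, and without the output layer Perl would have to guess how to turn the characters back into bytes.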
While testing how to cover every possible hole, I ended up with something like this:
Truth is, I am not sure which of the lines above are duplicates ;) and I think there are still areas they don't cover. Are %ENV, @ARGV and file names still unprotected?
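As far as I understand, %ENV, @ARGV and file names reach Perl as plain bytes, and I/O layers do not touch them. A rough sketch of decoding them by hand follows, assuming the command line, the environment and the file system all use UTF-8. The helper names `from_os` and `to_os` and the variable `$ENCODING` are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: command line, environment and file system all use UTF-8.
my $ENCODING = 'UTF-8';

# Bytes from the outside world -> Perl character string.
sub from_os {
    my ($bytes) = @_;
    return decode($ENCODING, $bytes);
}

# Perl character string -> bytes for the outside world (e.g. a file name).
sub to_os {
    my ($chars) = @_;
    return encode($ENCODING, $chars);
}

# Decoded copies; the originals are left untouched.
my @args = map { from_os($_) } @ARGV;
my %env  = map { $_ => from_os($ENV{$_}) } keys %ENV;

# File names have to be encoded by hand before they are passed to open, e.g.:
# open my $fh, '<', to_os($name) or die "Cannot open file: $!";
```

For @ARGV and the standard streams, the -C switch documented in perlrun (for example `perl -CSDA`, or `-CSDAL` to make it depend on the locale environment variables) covers part of this without any code.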
What am I looking for? Instead of reading tons of documentation to put together this whole (ugly) picture, I'd like perluniintro to say something like this:
Or another way, which would be logical to me: use locale;
I understand there are pieces I don't see. But how could "use locale" break code that does not use it? And if code does use it, what does it mean by "use locale" if it doesn't really want to use the locale?
Your locale is not used to determine the encoding of the source file.
I answer: how sad! Why then do I define the locale in my system and ask Perl to use it?
You can't output characters to STDOUT without instructing Perl how to convert those characters into bytes.
I answer: but I did! If I have a properly defined system locale and I ask Perl to use it, then Perl should know how to convert characters. Or what am I missing here?
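To make concrete what "let the locale decide" could look like, here is a sketch that asks the system locale for its codeset and sets the output layer from it. It assumes a platform where I18N::Langinfo supports CODESET; the function name `locale_encoding_name` is hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);
use I18N::Langinfo qw(langinfo CODESET);
use Encode qw(find_encoding);

# Returns the Encode name for the locale's codeset, or undef if it is unknown.
sub locale_encoding_name {
    setlocale(LC_CTYPE, '');            # adopt the locale from the environment
    my $codeset = langinfo(CODESET);    # e.g. "UTF-8" or "ISO-8859-15"
    return unless defined $codeset && length $codeset;
    my $enc = find_encoding($codeset);  # undef if Encode does not know the name
    return $enc ? $enc->name : undef;
}

# Only install the layer when the locale's encoding could be resolved.
my $name = locale_encoding_name();
binmode(STDOUT, ":encoding($name)") if defined $name;
```

The open pragma offers a similar shortcut, `use open qw(:std :locale);`, but as far as I can tell that is a separate thing from `use locale`, which is exactly the part I find surprising.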
I'd like to be able to easily define a scope where everything is treated as UTF-8. If I say "use locale", I mean: apply my locale to my code, whatever that locale is. So any data in this scope is treated as the locale requires. People who need \d == [0-9] and \w == [a-zA-Z0-9_] don't have to use the locale pragma, or a locale that defines them otherwise. And IMHO, where a different approach is needed, it would be easy to adapt.
I understand it is a wider problem. But for now developers have nothing to rely on. For example, for CGI I have to say explicitly that I need UTF-8. For DBI, the same. And so on. Why? Because there is no standard place where they could look it up automagically, AFAIU. If there were one big "use utf8_everywhere" that hoisted a big flag, every module author could rely on it. Or not?
Such is the naive picture I have. I'd like to see the weak spots in it. I hope I answered most of the questions raised, but to be clear:
Why should Perl only recognize Unicode?
Not only. But if I ask it to use Unicode, it should. Simply and everywhere.
What about locales? You yourself want it to recognize locales.
Yes, I do. And I don't see a contradiction. Whatever encoding the locale uses, "use locale" should use it too within its scope.
What about POSIX?
Sorry, that is over my head.
What about backwards compatibility?
This is my weak point: I don't see how it could break anything. I just don't see it, though I understand it may. That is why I "made" the new pragma "utf8_everywhere".
What about 99% of the people who use \d and \w to mean /[0-9]/ and /[a-zA-Z0-9_]/?
This seems simple to me: they either a) don't "use locale" or b) "use locale" with a proper system locale. Other uses seem buggy to me.
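To see what is at stake with \d, here is a small sketch comparing \d with an explicit [0-9] on a character string, with no locale pragma involved. The variable names are made up for the example.

```perl
use strict;
use warnings;

my $ascii_three  = "3";
my $arabic_three = "\x{0663}";   # ARABIC-INDIC DIGIT THREE

# Report, for each sample, whether \d and [0-9] accept it.
for my $s ($ascii_three, $arabic_three) {
    printf "U+%04X  \\d: %s  [0-9]: %s\n",
        ord $s,
        ($s =~ /^\d$/    ? 'yes' : 'no'),
        ($s =~ /^[0-9]$/ ? 'yes' : 'no');
}
```

On character strings \d accepts the Arabic-Indic digit while [0-9] does not, so code that really means ASCII digits is safest writing the bracketed class explicitly.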