in reply to Re^5: any use of 'use locale'? (source encoding)
in thread any use of 'use locale'?
You can't output characters to STDOUT without instructing Perl how to convert those characters into bytes. I provided the fix for that at the bottom of my previous post.
I don't consider myself a master of this topic, and here in the Monastery I have seen how deeply you handle the Unicode area. So forgive me some inaccuracies, like mixing up Unicode and UTF-8, and maybe others. I will try to give a picture that has grown over the years. Nowadays, most of my scripts contain at least this block:
    use utf8;
    use open ':std' => ':encoding(UTF-8)';
    use locale;
While trying to cover every possible hole, I ended up with something like this:
    use utf8;
    use CGI qw( -utf8 );
    use DBI;
    use open IO => ':utf8';
    use open ':std' => ':encoding(UTF-8)';
    binmode STDIN, ":utf8";
    binmode STDOUT, ":utf8";
    my $dbh = DBI->connect(
        "dbi:mysql:dbname=$db;host=localhost",
        $user, $pwd,
        { mysql_enable_utf8 => 1 },
    );
Truth is, I am not sure which of the lines above are redundant ;) and I think there are still areas they do not cover. Are %ENV, @ARGV and file names still not handled?
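On the %ENV and @ARGV question: as far as I know, none of the pragmas above touch them, so their contents arrive as raw bytes. A minimal sketch of decoding them by hand, assuming the environment really is UTF-8 (the helper name decode_utf8_list is made up for illustration):

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn UTF-8 byte strings into character strings.
sub decode_utf8_list {
    # Copy each element first so the caller's data is never modified.
    return map { my $bytes = $_; decode('UTF-8', $bytes) } @_;
}

# Command-line arguments are bytes until decoded.
my @args = decode_utf8_list(@ARGV);

# Environment values are bytes too; build a decoded copy and leave %ENV alone.
my %env;
for my $name (keys %ENV) {
    my $bytes = $ENV{$name};
    $env{$name} = decode('UTF-8', $bytes);
}

printf "%d argument(s) and %d environment variable(s) decoded\n",
    scalar @args, scalar keys %env;
```

If I read perlrun correctly, the -C switch (for example perl -CSDA) covers the standard streams, default open layers and @ARGV without any code. File names are a separate matter: as far as I understand, Perl hands them to the system as bytes, so they need explicit encoding as well.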
What am I looking for? Instead of reading tons of documentation to put this whole (ugly) picture together, I would like perluniintro to say something like this:
If you have a properly set up UTF-8 system, you can just say
    use utf8_everywhere;   # because "use utf8" means something narrower
and relax. If your situation needs something more complex, continue reading...
Or the other way, which would seem logical to me:
    use locale;
I understand there are pieces I don't see. But how could "use locale" break code that does not use it? And if code does use it, what does its author mean by "use locale" if they don't really want to use the locale?
You say:
Your locale is not used to determine the encoding of the source file.
I answer: how sad! Why then do I define the locale in my system and ask Perl to use it?
You can't output characters to STDOUT without instructing Perl how to convert those characters into bytes.
I answer: but I did! If I have a properly defined system locale and I ask Perl to use it, then Perl should know how to convert characters. Or what am I missing here?
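For what it is worth, the two things can be connected by hand. As far as I understand, "use locale" affects things like case mapping, collation and character classes, but it does not set any I/O layers. A sketch, assuming a POSIX-like system where I18N::Langinfo is available (locale_layer is a made-up helper name):

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);
use I18N::Langinfo qw(langinfo CODESET);
use Encode qw(find_encoding);

# Hypothetical helper: build an I/O layer string from the system locale,
# or return undef when Encode does not recognise the locale's codeset.
sub locale_layer {
    # Adopt the locale from the environment (LANG, LC_ALL, ...).
    setlocale(LC_CTYPE, '');
    # Ask the C library for the codeset name, e.g. "UTF-8".
    my $codeset = langinfo(CODESET);
    my $enc = find_encoding($codeset) or return;
    return ':encoding(' . $enc->name . ')';
}

my $layer = locale_layer();
if (defined $layer) {
    # Only now does STDOUT know how to turn characters into bytes.
    binmode(STDOUT, $layer) or warn "binmode failed: $!";
}
```

The open pragma documents a ':locale' subpragma (use open ':locale';) that is meant to pick the default layers from the locale in a similar way, so the explicit instruction can at least be a short one.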
I would like an easy way to define a scope where everything is treated as UTF-8. If I say "use locale", I mean: apply my locale to my code, whatever that locale is. So any data in this scope is treated as the locale requires. People who need \w == /a-zA-Z0-9_/ don't have to use the locale pragma, or a locale that defines it otherwise. And IMHO, where a different approach is needed, it would be easy to adapt.
I understand it is a wider problem. But for now developers have nothing to rely on. For example, for CGI I have to say explicitly that I need UTF-8. For DBI the same. And so on. Why? Because there is no standard place where they could look for it automagically, AFAIU. If there were one big "use utf8_everywhere" that hoisted a big flag, every module author could rely on it. Or?
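To show what I imagine, here is a rough sketch of such a pragma. It is purely hypothetical: utf8_everywhere is only the name from this post, the hint key is my invention, and the sketch covers just the source encoding and the standard streams, not CGI, DBI, %ENV or @ARGV.

```perl
package utf8_everywhere;    # hypothetical pragma, named after the idea above
use strict;
use warnings;
use utf8 ();                # load utf8.pm without enabling it in this file

sub import {
    # Turn on "use utf8" for whichever scope is being compiled right now,
    # that is, the code that says "use utf8_everywhere".
    utf8->import;
    # Make the standard streams encode and decode UTF-8.
    binmode($_, ':encoding(UTF-8)') for \*STDIN, \*STDOUT, \*STDERR;
    # Leave a lexically scoped flag (assumed key name) that other code
    # could look up through the hints hash described in perlpragma.
    $^H{'utf8_everywhere/enabled'} = 1;
    return;
}

package main;

# Equivalent of "use utf8_everywhere;" when the package lives in the same file.
BEGIN { utf8_everywhere->import }

print "standard streams now encode as UTF-8\n";
```

Whether module authors would ever check such a flag is of course the open question; the sketch only shows that the hoisting part is small.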
Such is the naive picture I have. I would like to see the weak places in it. I hope I have answered most of the questions raised, but to be clear:
Why should Perl only recognize Unicode.
Not only. But if I ask it to use Unicode, it should. Simply and everywhere.
What about locales? You yourself want it to recognize locales.
Yes, I want that. And I don't see a contradiction. Whatever encoding the locale uses, "use locale" should use it too within its scope.
What about POSIX?
Sorry, that is over my head.
What about backwards compatibility?
This is my weak point: I don't see how it could break something. I just don't see it, though I understand it may. That is why I "made" the new pragma "utf8_everywhere".
What about 99% of the people who use \d and \w to mean /0-9/ and /a-zA-Z0-9_/?
This seems simple to me: they either a) don't "use locale" or b) "use locale" with a proper system locale. Other uses seem buggy to me.
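To make the \d question concrete for myself: on a character string, \d already matches digits outside ASCII, with no locale involved at all. I suppose this is the kind of surprise the backwards-compatibility worry is about. A small sketch (the variable names are mine):

```perl
use strict;
use warnings;

# U+0663 ARABIC-INDIC DIGIT THREE: a decimal digit, but not an ASCII one.
my $arabic_three = "\x{0663}";

# \d follows Unicode rules for this string; the explicit range does not.
my $matches_d     = $arabic_three =~ /^\d$/    ? 1 : 0;
my $matches_ascii = $arabic_three =~ /^[0-9]$/ ? 1 : 0;

print "\\d matched: $matches_d, [0-9] matched: $matches_ascii\n";
```

So code that really means /0-9/ is safest writing the range explicitly, whatever pragma is or is not in effect.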